# Maximizing expressive power in Evolvix

### Why does expressive power matter?

We develop the Evolvix model description language because we want to make it easy for biologists to describe the biological systems they study in a mathematically rigorous form.

The **expressive power of Evolvix** determines how many different types of models can be described in Evolvix with reasonable effort.

Generally, Evolvix can model systems that range from molecules in cells to individuals in ecosystems, as long as the underlying mathematical model used by Evolvix is an appropriate choice for the system being modeled. Whether the math matches the model a biologist aims to investigate depends on details that define the precise nature of the abstraction used by the model. Currently, Evolvix supports simulating continuous-time Markov chain models, under the condition that all Parts of these models can be written down before the start of a simulation.
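To make this restriction concrete, here is a minimal Python sketch (not Evolvix syntax) of how a continuous-time Markov chain can be simulated with Gillespie's direct method when every part and every reaction is listed before the simulation starts. The function name `simulate`, the birth-death example, and all rate values are illustrative assumptions, not part of Evolvix.

```python
import random

def simulate(state, reactions, t_end, seed=0):
    """Simulate a continuous-time Markov chain with Gillespie's direct method.

    state:     dict mapping part name -> count (all parts fixed up front)
    reactions: list of (rate_fn, changes) pairs, where rate_fn(state) returns
               the current propensity and changes maps part name -> +/- count
    """
    rng = random.Random(seed)
    state = dict(state)              # do not modify the caller's dict
    t = 0.0
    trajectory = [(t, dict(state))]
    while True:
        propensities = [rate_fn(state) for rate_fn, _ in reactions]
        total = sum(propensities)
        if total <= 0.0:             # nothing can happen any more
            break
        t += rng.expovariate(total)  # exponentially distributed waiting time
        if t > t_end:
            break
        # Pick one reaction with probability proportional to its propensity.
        chosen = rng.choices(range(len(reactions)), weights=propensities)[0]
        for part, delta in reactions[chosen][1].items():
            state[part] += delta
        trajectory.append((t, dict(state)))
    return trajectory

# Hypothetical birth-death model of a single part "X".
birth_death = [
    (lambda s: 2.0, {"X": +1}),           # production at a constant rate
    (lambda s: 0.1 * s["X"], {"X": -1}),  # degradation proportional to X
]

trajectory = simulate({"X": 0}, birth_death, t_end=10.0, seed=1)
print(len(trajectory) - 1, "events; final state:", trajectory[-1][1])
```

Note that `state` and `reactions` are complete before the loop begins. A model in which new parts appear during the run (for example, cells that divide) does not fit this scheme without further machinery.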

Many complex, cutting-edge simulation models in systems biology, ecology, and population genetics do not fall into this category and require more flexible simulation capabilities. Increasing the expressivity of Evolvix can enable the simulation of such models.

### How can we increase the expressive power of Evolvix?

Thorough analyses of various modeling scenarios over the last few months have provided ample evidence for the following conclusion: General programming capabilities are required for efficiently modeling a broad array of biological questions.

In computer science, 'general programming capabilities' are often associated with, and analyzed by, abstract models known as 'Turing machines'. Another way to describe what Turing machines enable is to say that they allow the construction and analysis of recursively enumerable sets, i.e. sets whose members a program can list one by one. Such programs can readily generate nested collections: sets of things that contain sets of other things with even more sets of still other things. This hierarchy of sets in computer science is easily mapped to various aspects of hierarchically organized systems in biology, where

- molecules are found in ... cells,
- cells live in ... tissues,
- tissues exist in ... individuals,
- individuals live in ... ecosystems.
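As a rough illustration of how such a biological hierarchy maps onto nested data, the following Python sketch uses one recursive type for every level and a generator that lists every element of the hierarchy one by one. The class name `Level` and the helper functions are hypothetical and do not reflect Evolvix data structures.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple

@dataclass
class Level:
    """One level of a hierarchy; its parts are again levels."""
    name: str
    parts: List["Level"] = field(default_factory=list)

def depth(level: Level) -> int:
    """Number of nested levels, counting this one."""
    return 1 + max((depth(p) for p in level.parts), default=0)

def paths(level: Level, prefix: Tuple[str, ...] = ()) -> Iterator[Tuple[str, ...]]:
    """Enumerate every element as the path leading to it from the top."""
    here = prefix + (level.name,)
    yield here
    for part in level.parts:
        yield from paths(part, here)

# Molecules in cells, cells in tissues, tissues in individuals,
# individuals in ecosystems.
ecosystem = Level("ecosystem", [
    Level("individual", [
        Level("tissue", [
            Level("cell", [Level("molecule")]),
        ]),
    ]),
])

print(depth(ecosystem))  # 5
for path in paths(ecosystem):
    print(" > ".join(path))
```

Because `Level` refers to itself, the same few lines describe hierarchies of any depth, which is the property a fixed, hand-written list of levels lacks.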

If Evolvix is ever to allow the analysis of such hierarchical systems, two fundamentally different approaches can be used to implement such capabilities:

**Description of fixed systems**: Extending the current design of Evolvix, we could fix some syntax and data structures that enable the description of selected details of selected hierarchical models, which Evolvix would then be able to simulate. This approach is very cumbersome for many reasons. For example, it requires drawing an arbitrary line between model details that are named and fixed by Evolvix itself and those that can be changed by users. Defining such model details as features of Evolvix would require a corresponding parser and simulator to be designed and implemented (currently in C/C++ and an LL(*) grammar in ANTLR). Drawing the arbitrary line is easy for a given modeling question, but hard to generalize. Adding *one* new Evolvix parser and simulator is cumbersome, but doable. However, the lack of generalization would require implementing a long series of such hierarchical models if we aim to cover reasonably large areas of biology.

This would require a prohibitive amount of programming and lead to a rather confusing set of partially overlapping models. Reducing the complexity of this network of models is hard; if it is not done upfront, the result is a random collection of models that are too specific for reuse in addressing broader modeling questions.

As a result, much of the simplicity of Evolvix would be lost.

**Generative systems**: Instead of describing some biological details of some levels from selected hierarchical systems, Evolvix could provide general capabilities for describing how to generate the recursively enumerable collections of biological details of interest for biological models. Adding this small amount of work (compared to the task of constructing the same number of models as described above) has huge advantages:

- There is no need to build generalizations about biology into the core Evolvix language, which avoids unnecessary complexity.
- Simplifying the construction of such models substantially increases the number of biologists who might engage in constructing them (and reduces the amount of work we would have to invest in constructing any particular model).
- The same system can also be used to manage the large number of simulation results necessary for analyzing models and estimating parameters (when designed from the start to include this purpose).
- The general programming capabilities necessary to enable the features above can also be used to automate the construction of systematically varying models and to compose arbitrary workflows for model analysis, two problems that were not addressed in the last design of Evolvix (version 0.2).
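The idea of generating models rather than writing each one by hand can be sketched in a few lines of Python. The example below builds a family of systematically varying models from a single template. The names `generate_model` and `generate_family`, the part naming scheme, and the parameter values are assumptions made for illustration; they are not Evolvix syntax or features.

```python
from itertools import product

def generate_model(n_cells, birth_rate, death_rate):
    """Generate one model: a 'protein' part per cell with two reactions each."""
    parts = {}
    reactions = []
    for i in range(n_cells):
        part = f"cell{i}.protein"
        parts[part] = 0  # initial count
        reactions.append(("produce", part, birth_rate))
        reactions.append(("degrade", part, death_rate))
    return {"parts": parts, "reactions": reactions}

def generate_family(cell_counts, birth_rates, death_rates):
    """Generate one model for every combination of the given parameter values."""
    return [
        generate_model(n, b, d)
        for n, b, d in product(cell_counts, birth_rates, death_rates)
    ]

family = generate_family(cell_counts=[1, 10], birth_rates=[1.0, 2.0], death_rates=[0.1])
print(len(family))                  # 4
print(len(family[2]["reactions"]))  # 20
```

Changing the template in `generate_model` changes every model in the family at once, whereas a collection of hand-written fixed models would have to be edited one by one.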

Thus we decided to add general programming language capabilities to Evolvix.

### What does it mean to turn Evolvix into a general programming language?

Most importantly, this does not complicate using the simple descriptive core of Evolvix as presented on this website. It merely means that, in the future, Evolvix will add language constructs that enable these general programming features, and that some of the current features can move into an "Evolvix Standard Library" that is always available out of the box.

Combining general programming language features with features that are dedicated to modeling questions makes it much easier to investigate the broadest possible range of biological questions than constructing such models one by one without this support. This special built-in support will help with

- modeling of systems in biochemistry, genetics and beyond,
- documentation of models,
- catching modeling errors,
- parameter estimation,
- distributed computing, and
- managing simulation results that can easily reach 'big data' proportions.

To keep all this accessible for biologists, we keep investing substantially in simplifying the overall design of Evolvix and in keeping the learning curve as shallow as possible.

### When will this happen?

We have developed syntax designs that show the potential to resolve key challenges for modeling in biology and are currently combining them into a redesigned foundation for Evolvix. This foundation will maximize the expressivity of Evolvix by selecting relevant semantics, syntax and internal data structures to prepare for a growing body of functionality. At the same time, we are expanding the modeling-relevant capabilities of Evolvix that enable observing TimeSeries-related features, help estimate parameters, and more. These will then be linked into the redesigned foundation.

We plan for this process to lead to new capabilities this summer and thus bring us much closer to our aim for Evolvix, namely, to make it easy for biologists to describe the biological systems they study in a mathematically rigorous form. We are designing Evolvix with the goal of supporting backwards compatibility starting with Evolvix Version 1.0. This will make Evolvix a suitable format for storing executable descriptions of simulation models as online supplements to journal articles that present the scientific results from such models.

Currently, we focus on the core design. In the interest of reducing the time until the new capabilities become available, we do not discuss every turn of our thoughts on the website.