
Evolvix System design overview

A very high-level overview of the design of the Evolvix System and how it links to Evolution@home.

Evolvix is two things:

  • a language specification and 
  • a system implementing that specification and the simulation capabilities it requires.  

The Evolvix System is being built to enable many different capabilities. Here is a very high-level view of the building blocks being integrated into it.

Evolvix: a unifying language

Evolvix aims to provide a unifying syntax for describing models in disparate disciplines and on different levels. To analyze a model in Evolvix, you need to identify

  • the various Parts of the model that you want to be simulated, along with all relevant Actions and their dynamics;
  • your queries about your model and the tasks you want to be performed in order to evaluate your model. 

All this defines a so-called Evolvix Quest and is written in the syntax of the Evolvix language (see the tutorials for instructions). The Evolvix System transforms a Quest into representations appropriate for the corresponding method of analysis, which then produces the requested results.
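
The two ingredients of a Quest, Parts with their Actions and the queries to answer, can be pictured as a small data structure. The sketch below is a hypothetical internal representation in Python, not actual Evolvix syntax (see the tutorials for that); all class and field names here are illustrative assumptions.

```python
from dataclasses import dataclass, field

# Hypothetical representation of a Quest's two ingredients;
# the real Evolvix syntax differs (see the tutorials).
@dataclass
class Part:
    name: str            # a model component, e.g. a population or molecule
    initial_amount: int  # its starting quantity

@dataclass
class Action:
    name: str     # an event that changes Parts, e.g. a birth or reaction
    rate: float   # its rate parameter
    changes: dict # Part name -> amount added or removed per occurrence

@dataclass
class Quest:
    parts: list    # what exists in the model
    actions: list  # how the Parts change over time
    queries: list = field(default_factory=list)  # what to compute

# A toy birth-death model expressed with the sketch above:
quest = Quest(
    parts=[Part("Individuals", 10)],
    actions=[
        Action("birth", rate=0.3, changes={"Individuals": +1}),
        Action("death", rate=0.1, changes={"Individuals": -1}),
    ],
    queries=["simulate time course until t = 100"],
)
```

The point of the separation is that the same Parts-and-Actions description can be paired with different queries, each of which the system maps to an appropriate analysis method.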

 

Highest level overview of Evolvix.

Evolvix and Evolution@home 

What happens before and after running the simulations described above? A single simulation often has limited value; in many cases a set of related simulations is needed to answer a question. Many biological questions require large numbers of simulations, which in turn may require distributed computation and the ability to:

  • schedule sets of simulations and set priorities;
  • distribute the tasks to available compute resources such that usage is optimal;
  • collect and store results such that unnecessary re-computation is avoided and high-level results are easy to produce;
  • summarize results such that they are easy to interpret appropriately.

How easy a simulation environment is to use is determined in large part by how many of the tasks above can be automated without unnecessarily restricting the options for analyzing a model. Experience with running Evolution@home has provided critical insights for designing this functionality.
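
The first of the tasks above, scheduling sets of simulations with priorities, can be sketched with a simple priority queue. This is a minimal illustration, not the actual scheduling mechanism; the class and method names are assumptions, and a real coordinator must also handle distribution, storage, and summarization.

```python
import heapq

# Minimal priority scheduler for sets of simulations
# (hypothetical names; distribution and storage are out of scope).
class SimulationScheduler:
    def __init__(self):
        self._queue = []   # heap of (priority, counter, task)
        self._counter = 0  # tie-breaker keeps insertion order stable

    def schedule(self, task, priority=0):
        # Lower numbers run first, mirroring "set priorities" above.
        heapq.heappush(self._queue, (priority, self._counter, task))
        self._counter += 1

    def next_task(self):
        # Return the highest-priority task, or None when nothing is left.
        if not self._queue:
            return None
        return heapq.heappop(self._queue)[2]

sched = SimulationScheduler()
sched.schedule("parameter sweep, replicate 1", priority=2)
sched.schedule("pilot run", priority=0)
print(sched.next_task())  # the pilot run is dispatched first
```

The counter in the heap entries matters: it breaks priority ties by submission order, so two simulation sets with equal priority are dispatched first-come, first-served.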

Evolution@home is a globally distributed computing system that enables anybody with an Internet-connected computer to donate computing time toward solving computationally hard problems in evolutionary biology. It has been running continuously since 2001, when it started with population genetics simulations of mutation accumulation in asexual populations; these produced a number of published analyses of "Muller's ratchet", a phenomenon that is important for understanding why asexual populations might go extinct (see here).

Evolution@home is currently being rewritten from scratch in order to integrate it with Evolvix. Experience from running Evolution@home is pivotal for the design decisions involved in integrating distributed computing capabilities into Evolvix. The picture below gives a high-level overview of the areas of functionality needed. We aim to find the right abstractions for the various tasks so that the system can run on a broad range of compute resources, from laptops to data centers.

High level overview of the redesigned Evolution@home global computing system and how it integrates with Evolvix.

Evolvix Portal: handles all user-facing tasks, such as:

  • upload, manage and edit all Evolvix source code;
  • set time, space and bandwidth limits for all available computing, storage and network resources;
  • set priorities if requested tasks cannot be completed within the given limits;
  • check progress of longer running computations and change scheduling priorities as needed;
  • present high-level visual overviews for quickly inspecting key results (and intermediate results for long-running computations);
  • allow the extraction of data from stored results of past computations;
  • manage stored simulation results (delete, duplicate, move, etc.) and check their integrity as necessary;
  • report statistics on how many compute resources were needed for a given project.

Resource Coordinator: coordinates the execution of all major tasks passed on from the Evolvix Portal. It combines a workflow solution with a distributed computing scheduling platform and a manager for large distributed storage resources.

Distributed computing capabilities are needed whenever a simulation problem is bigger than the combination of a researcher's patience and a laptop. Surges in computing demand can be very difficult to predict, as they are often triggered by seemingly minor changes in some parameters of a model. Biology somehow does not respect our computing time limitations. That is why mechanisms for managing the limits of compute resources are critically important.

Evolvix Workers implement the diverse analyses enabled by Evolvix in the picture above. Evolvix Workers can be distributed to whatever computing resources are available for the corresponding tasks. The rewritten Evolution@home system will be one of several computing environments supported by Evolvix. The aim is to include supercomputers for particularly demanding tasks that do not run well on smaller computers (e.g., those of Evolution@home volunteers). In turn, many smaller computers are needed to free the extremely limited supercomputer resources for projects that really need them.
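
One common pattern for distributing tasks to heterogeneous resources, from volunteer laptops to supercomputers, is a pull model: each Worker fetches a task when it has capacity, computes it, and reports the result back. The sketch below illustrates that pattern only; the function and variable names are hypothetical and this is not the actual Worker protocol.

```python
import queue

# Pull-model sketch (hypothetical names): a Worker drains tasks
# from a shared queue and records one result per task.
def worker_loop(tasks, results):
    while True:
        try:
            task = tasks.get_nowait()
        except queue.Empty:
            break  # no work left for this Worker
        # Stand-in for running the actual simulation:
        results.append((task, f"result of {task}"))

tasks = queue.Queue()
for t in ["sim-a", "sim-b"]:
    tasks.put(t)

results = []
worker_loop(tasks, results)
```

A pull model fits volunteer computing well because the coordinator never needs to know a Worker's capacity in advance: fast machines simply pull more tasks, and a Worker that disappears leaves its unfetched tasks for others.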

Evolvix Storage remembers results that took Workers a long time to produce, reducing the need for re-computation. The vision is to develop an interface that abstracts away the storage of result files so that storage backends become easy to exchange. This will make it easier to support a broad range of storage options that scale beyond local file systems. Most of this is hidden from users, but it enables something very important for biology: asking questions that require more space than a local file system can offer.
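
An exchangeable storage backend can be sketched as a small interface that hides where result bytes actually live. The sketch below is an assumption about what such an interface might look like, not the actual Evolvix Storage API; all names are hypothetical, and an in-memory backend stands in for a real local-disk or remote one.

```python
import hashlib
from abc import ABC, abstractmethod

# Hypothetical swappable storage interface: backends only need
# put/get/exists, so a local directory, an object store, or a
# database could all sit behind the same three methods.
class ResultStore(ABC):
    @abstractmethod
    def put(self, key: str, data: bytes): ...
    @abstractmethod
    def get(self, key: str) -> bytes: ...
    @abstractmethod
    def exists(self, key: str) -> bool: ...

class InMemoryStore(ResultStore):
    def __init__(self):
        self._blobs = {}
    def put(self, key, data):
        self._blobs[key] = data
    def get(self, key):
        return self._blobs[key]
    def exists(self, key):
        return key in self._blobs

def run_or_reuse(store, quest_text, simulate):
    # Key results by a hash of the Quest text, so an identical
    # request is answered from storage instead of re-computed.
    key = hashlib.sha256(quest_text.encode()).hexdigest()
    if not store.exists(key):
        store.put(key, simulate(quest_text))
    return store.get(key)

store = InMemoryStore()
calls = []
out1 = run_or_reuse(store, "toy quest", lambda q: calls.append(q) or b"42")
out2 = run_or_reuse(store, "toy quest", lambda q: calls.append(q) or b"42")
```

Keying by a hash of the request is one simple way to avoid unnecessary re-computation: in the example, the second call returns the stored bytes and the expensive simulation function runs only once.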

 

The vision laid out above will take time to implement. After implementing and developing the first prototype for some time, we collected and analyzed our experiences. The result was a redesign of our whole development process from the ground up, greatly improving efficiency. A brief overview of the new development process can be found here.