Why Version Control?

How repositories help control version chaos.

 

Much of what we do in our research can be seen as a struggle to improve older versions of files into newer versions, until we are satisfied with their quality and release them in their final form as research papers or as source code for a new release of our software. During this process it is often helpful to have access to older versions for comparison.

A "Version Control System" (VCS) is simply a way to keep track of files, and all the changes that have ever been made to them. Such systems were originally designed for use with source code files, and other related data from computer programming projects. They come from the software industry, and are used in most software projects of any non-trivial size. It turns out that they come with features that are very helpful for writing research papers as well. Have you ever been tempted to create a document called

Report-Final.doc

only to replace it with -Final2.doc, -ReallyFinal.doc, and -MyFinalFinal.doc? If you recognize the situation, a version control system could help you, provided it actually works for you.

 

How is Version Control handled in Evolvix?

Briefly, Evolvix uses Git for version control of (almost) everything that needs version control. Programmers who are not familiar with the idea of a Distributed Version Control System will need some time to learn about Git Concepts and how Git works at a conceptual level.

Git is a powerful Version Control System that has supported huge software projects with many developers, and it provides an easily accessible history of changes. It helps with management and bug tracking. It is also known as a "Distributed Version Control System" (DVCS) and is very flexible. This makes it ideal for open-source projects, which may have a large number of developers all over the world who contribute small amounts of code at sporadic times.

With Git's power comes great responsibility. Git expects its users to be 'grown-up', which translates into 'Git-magician' for the rest of us (a group that often includes professional programmers, in case this is any consolation). In fact, Git provides so much flexibility that non-experts can easily create complications that tie them up in knots, turning Git from the most powerful dream tool for keeping things in order into a generator of the worst nightmares (for which visiting a 'Git-magician' is the only known cure that does not end in the loss of data). Given these large oscillations and dangerous extremes, why is Git still on the list for inclusion in Evolvix, a language with a strong focus on maximizing stability and increasing user-friendliness?

Reason: While using Git for a number of years in very diverse contexts in the Loewe Lab at the Wisconsin Institute for Discovery, we discovered patterns and approaches that give us a few blueprints and strategies for how to use Git, and for how not to use it, in order to simplify work and minimize frustration. It is beyond our scope to go into more detail here, except to say that we pick the safe and useful parts while hiding the dangerous ones.

Strategy: We do this by constructing a simplified layer of access to the version control services provided by Git and optimizing it for biological modeling, data analysis, and other key bio use-cases. Our preliminary investigations have shown that non-computing biologists can in principle use much more of Git than they might initially expect, if they are given a model that illustrates how to use Git. When following such a model, we found that Git has much to offer to biologists, especially when handling data or managing models. The key is to highlight what Git does well, and to hide the detours and shortcuts that lead to danger. We are currently developing a specific user-friendly model for how to organize version control for modeling purposes in Git, and we are testing how user-friendly it is. Details will follow at a later point.

In the meantime, if you want someone else's take on the need for Git version control, see the talk "A brief introduction to Git & GitHub" by Prof. Karl Broman, who runs a data-intensive research group at the University of Wisconsin-Madison. If you are a programmer and want to contribute code to Evolvix, then read on. We will explain here how to get started with access to the relevant repositories.

 

Where is all the Versioned Data Stored?

Git receives all data to be versioned from its users as a sequence of incremental "commits" they wish to store. Users commit when they have made enough changes that they would like to keep as one unit. Conceptually, the full sequence of commits defines the most recent state of a folder: start from an empty folder and apply the commits one after another. If a project needs different types of variants that are best described by complicated branching patterns, Git can store such data very efficiently. The resulting data structures are highly compressed and represent a "commit-tree" that grows to become as complicated as needed while tracking the evolution of all stored variants produced over the lifetime of a software project.
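The idea of a growing commit-tree can be illustrated with a small toy model in Python. This is only a sketch for building intuition and does not reproduce how Git actually stores or compresses its data; the names Commit, commit, and history are hypothetical and exist only for this illustration.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Commit:
    """Toy commit: a snapshot of the folder plus a link to its parent commit."""
    message: str
    files: Dict[str, str]         # file name -> file content (toy snapshot)
    parent: Optional["Commit"]    # None for the very first commit

    @property
    def commit_id(self) -> str:
        # Toy identifier: a hash over the message, the snapshot, and the parent's id.
        parent_id = self.parent.commit_id if self.parent else ""
        payload = json.dumps([self.message, sorted(self.files.items()), parent_id])
        return hashlib.sha1(payload.encode("utf-8")).hexdigest()[:10]


def commit(parent: Optional[Commit], message: str,
           changes: Dict[str, Optional[str]]) -> Commit:
    """Create a new commit by applying `changes` on top of the parent's snapshot."""
    files = dict(parent.files) if parent else {}   # start from an empty folder
    for name, content in changes.items():
        if content is None:
            files.pop(name, None)                  # None marks a deleted file
        else:
            files[name] = content
    return Commit(message, files, parent)


def history(tip: Optional[Commit]) -> List[str]:
    """Walk from a commit back to the first one, newest message first."""
    messages = []
    while tip is not None:
        messages.append(tip.message)
        tip = tip.parent
    return messages


# Two variants that branch off the same starting point.
root = commit(None, "Start report", {"report.txt": "draft"})
main = commit(root, "Add results", {"report.txt": "draft + results"})
variant = commit(root, "Try alternative model", {"model.txt": "alternative"})

print(history(main))
print(history(variant))
print(root.files)   # the old version is still available, unchanged
```

Because every commit only points back to its parent, adding a new variant never disturbs the versions that were stored earlier.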

Git stores all this data for a project in a so-called information "repository" or, in brief, "repo". Each repo is organized around a given local folder in the file system on your hard drive. This folder can be thought of as the "RepositoryHomeFolder" (in brief: RHF), since it provides a fixed point of reference for the repository and acts as the stage from which Git captures the latest version when asked to store the latest changes. This folder also provides the stage to which Git will restore all the files of an old version requested by a Git user. And this is where we already meet our first complication: regularly capturing versioned snapshots of variants of a given project is great. The ability to restore any of them at will is even greater. However, users who do not always carefully commit their latest changes to Git for safe-keeping can easily lose data when their latest changes are overwritten. This can easily happen if they hastily switch to an earlier variant but forget to commit their latest changes first. The RepositoryHomeFolder is also special in another respect: it stores the invisible sub-folder named ".git". Here the leading "." (dot) tells most operating and file systems that the corresponding file or folder ought to be hidden from normal views.
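As a minimal sketch of these two points, the following Python code locates the RepositoryHomeFolder by searching upwards for the hidden ".git" entry, and asks Git whether anything is still uncommitted before switching to an earlier variant. The function names are hypothetical and chosen only for this illustration; the second function assumes that the git command is installed and uses the standard "git status --porcelain" command, which prints nothing when there are no pending changes.

```python
import subprocess
from pathlib import Path
from typing import Optional


def find_repository_home_folder(start: Path) -> Optional[Path]:
    """Walk upwards from `start` until a folder containing ".git" is found."""
    start = start.resolve()
    for candidate in [start, *start.parents]:
        if (candidate / ".git").exists():
            return candidate
    return None   # `start` is not inside any repository


def has_uncommitted_changes(repo_home: Path) -> bool:
    """Return True if Git reports modified or untracked files in the repo.

    Assumes the `git` command is available on this machine.
    """
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_home, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip() != ""
```

A cautious workflow would call has_uncommitted_changes on the folder returned by find_repository_home_folder and commit first whenever it returns True.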

Git has a command-line interface (CLI) ... The Git CLI was originally developed for power-users and continues to be extremely popular and efficient for those who can memorize the commands they need. However, newcomers often experience CLIs as very cryptic and unfriendly. If you feel comfortable with a CLI, you are welcome to use Git without a graphical user interface.

... which is how we use Git now: Since we need Git for our development work now, we cannot wait until we have the new interface that we are developing for Git. The links collection we provide below contains some elementary instructions for how to set up Git, gain access to our development repositories, and handle other related developer details. These instructions should be complete, but they are brief and meant for programmers who are already used to CLIs.

Git use through a graphical user interface (GUI). The use of Git is somewhat simplified by front-end programs that wrap Git's cryptic commands in a GUI, which merely requires users to remember where the button they need to push is located. Unfortunately, such eye-candy does not simplify the semantics of Git, and therefore does not protect users from the nightmares mentioned above. The links below contain instructions for setting up a program called "SourceTree", which is one of several excellent Git GUI options.

Git and its future role in Evolvix. Drawing on our experience and many painful lessons learned, we have been working towards developing a semantics for Git that provides an integrated distributed versioning development workflow. Our aim with this work is to make versioning accessible for experimental biologists without prior computing experience. This essentially requires that we make everything disappear that even remotely looks or feels like Git. We're not there yet, but stay tuned.

 

How to work with Git and Gitolite in the Context of Evolvix

The following links can help you to get started with version control, Git, and Gitolite, a nifty program that makes it possible to create authenticated groups of Git users. These links are incomplete, and the list evolves as we investigate how best to apply the power of Git in safer ways in the context of Evolvix. Git is remarkably platform independent and mostly works without problems on Linux, Mac, and Windows, as well as across these systems. We will point out the few platform-specific quirks that come up occasionally and how to work around them as needed. For random reasons, many of these texts have been written from the perspective of Mac OS X developers; we try to update them for other systems too, so let us know if something does not work as expected. Please bear in mind that this list is for developers and not for end-users:

 

Git links of broader interest for experts: