Version control

The scope of this article

Version management is an important aspect of product development. This article explains the different forms of version management, as well as the storage models used by the relevant tools.

It also covers some common mistakes, with a quick note on how to prevent them. Now let’s get cracking on the main article.

Definition

You have probably seen this many times over: software with a version number, for example WordPress 3.4, Linux 3.2.0 or MySQL 5.1. You might have wondered what that number means. Basically it all comes down to this:

A version is the final result of a development iteration or a snapshot from a given moment.

This definition has an implication: once a version has been labelled, nothing about that version can be changed. Sometimes that’s a burden, but there are workarounds, such as publishing an update as a new version.

Importance

Imagine the following. You’re in a product development team, and you have to use components from a piece of software named foo. Everybody gets a copy of foo, and when you’re well into the development phase, you suddenly notice that some things exist in your copy that do not exist in the copies of others, or vice versa. What’s the first thing you do?

If you answered that you’d check the version, you gave the correct answer. The most likely cause of the issues is that not everybody is using the same release of the product. But what if there is no way to be sure which release you’re using, and millions of euros have been invested?

Well, basically you’re screwed, and if the scary thought about millions of euros doesn’t bring the message home, I don’t know what will.

How do you keep this from happening? Start keeping track of versions so they comply with the definition above.

Numbering

An interesting question to ask is how to keep track of versions. There are a number of ways to do this. For example, some projects take a snapshot and call it version X once it is stable. Release candidates or minor revisions are used to stabilise the snapshot before it is released as version X.

Another way is to keep count of the iterations. The result of sprint 1 is numbered 1, the result of sprint 2 is numbered 2, and so on. There is no separate stabilisation phase, as that should happen within the iteration itself. If the time provided by the iteration isn’t sufficient, the iteration is simply extended until the product is stable enough to release.

Yet another way is roughly the same as the snapshot approach, but instead of a simple number, the release date is used as the version number. Ubuntu is one of the projects doing this. For example, 12.04 is the latest release at the time of writing, published in April 2012. Release 12.04.1 will be the first patch release, followed by 12.04.2 and so on, until support for this release is dropped.
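To make these numbering schemes a bit more concrete, here is a minimal Python sketch that puts version strings such as 12.04 and 12.04.1 in release order. The function name parse_version is made up for this illustration, and the sketch assumes that a version consists only of dot-separated integers (so no suffixes like "rc1").

```python
def parse_version(text):
    """Split a dotted version string into a tuple of integers.

    Assumption for this sketch: every component is a plain integer,
    so "12.04.1" becomes (12, 4, 1).
    """
    return tuple(int(part) for part in text.split("."))


releases = ["12.04.2", "8.04", "12.04", "12.04.1"]

# Tuples compare element by element, so sorting on the parsed form gives
# the release order. Sorting the raw strings would put "12.04" before "8.04".
ordered = sorted(releases, key=parse_version)
print(ordered)
```

Comparing the parsed tuples rather than the strings is the important design choice here: it keeps 8.04 before 12.04, and a patch release directly after the release it belongs to.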

Besides version numbers, many projects also give their releases a name. For example, Ubuntu 8.04 was called “Hardy Heron”, Mac OS X uses the names of big cats, and Debian uses the names of characters from Toy Story. These names usually refer to the latest release within the same major version.

Maintaining

This is perhaps the most difficult part. The concept of keeping versions is pretty simple to understand; the difficulty lies in actually doing it. There are several ways to get from version to version, and to maintain multiple versions at once.

First this section covers how to get from version X to version Y, and then we’ll go into maintaining version X while developing version Y (note that the latter requires some parallel thinking on top of the normal development process).

Active development

This part will probably get the most attention, so we’ll kick off with that. A common way of developing a product is in a project group. This project group will have documents to share, such as the blueprints, source code and project management documents.

Most projects organise themselves in such a fashion that these documents are available in a central place. In the case of software development this is called the source repository. Each time a change set is made, either the file is copied and then replaced or, if a tool such as SVN or Git is used, the file is just amended and the source control system takes care of preserving the older states. A third way of doing this is by keeping track of the change sets themselves, with so-called patches. An example is shown in figure 1.
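As a small illustration of what such a patch contains, the sketch below uses Python’s standard difflib module to generate one from two states of a file. The file name main.c and its contents are invented for the example; the output uses the same unified format as the patch in figure 1, minus the e-mail style header that Git adds.

```python
import difflib

# Two hypothetical states of the same file, as lists of lines.
old = ["int main()\n", "{\n", "        return 1;\n", "}\n"]
new = ["int main()\n", "{\n", "        return 0;\n", "}\n"]

# unified_diff yields the two file header lines, a hunk header and the
# changed lines surrounded by some unchanged context lines.
patch = list(difflib.unified_diff(old, new, fromfile="a/main.c", tofile="b/main.c"))
print("".join(patch))
```

Lines starting with a minus sign are removed from the old state and lines starting with a plus sign are added, which is all a tool needs to get from one state to the other.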

Once the product is ready to be assigned a version, a copy of the latest state of the files is put into a separate directory or, when a source control system is used, a tag is created (which does basically the same thing).

Maintaining multiple releases

In most development strategies, the new versions include new features and improvements to the existing features, while older releases only get bug fixes and security patches. This sometimes presents a bit of a challenge. For example, a bug should be fixed in the oldest affected version first, after which the fix is carried forward and adapted to work with the newer versions.

Doing it the other way around will often work, but can lead to conflicts in some cases. With tools like Git it is often easier to fix the bug in the older versions before bringing the fix to the newer ones.

As the versions have been tagged, a tag can be copied into a working directory or cache, where it can be used to implement those bug fixes. This is necessary in order to preserve the consistency of each release. Remember from the definition: releases should not be changed; instead, updates should be published.

This will create a couple of branches of development. For example, if a bug has been found in version 7 that persists in versions 8 and 9, the bug will be fixed in the branch for version 7, and the fix will then be merged into the branches for 8 and 9 (this is where tools like SVN and Git start becoming real assets to the development process).
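The idea of fixing the oldest branch first and merging forward can be sketched with a toy model in Python. Everything here is hypothetical: the tags and branches are plain dictionaries and the function fix_oldest_first is made up for this illustration. A real merge in SVN or Git may also require adapting the fix to the newer code, which this sketch ignores.

```python
# Tagged releases are read-only snapshots; tuples cannot be modified in place.
tags = {
    7: ("feature A",),
    8: ("feature A", "feature B"),
    9: ("feature A", "feature B", "feature C"),
}

# A maintenance branch starts out as a mutable copy of its tag.
branches = {version: list(snapshot) for version, snapshot in tags.items()}


def fix_oldest_first(branches, first_affected, fix):
    """Apply the fix to the oldest affected branch, then carry it forward."""
    touched = []
    for version in sorted(branches):
        if version >= first_affected:
            branches[version].append(fix)
            touched.append(version)
    return touched


touched = fix_oldest_first(branches, first_affected=7, fix="bug fix")
print(touched)
print(tags[7], branches[7])
```

Note that the tags themselves never change. Only the branches receive the fix, and a new update would then be published from each branch.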

Centralised vs Distributed development

To figure out what this means, we need a couple of new definitions. Let’s start with the centralised approach.

In centralised systems the workflow revolves around the latest revision available on the central server.

You shouldn’t have to be a rocket scientist to figure out what that means. Basically it’s the way SVN does its work. People make a copy of the latest version, do their work on top of that, and put it back on the server, after first applying any updates that have been published in the meantime.
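The rule that updates have to be applied before work can be put back on the server can be sketched as a toy model. The class CentralRepository below is invented for this illustration and is deliberately simplified: it refuses every commit that is not based on the latest revision, whereas real tools such as SVN are more fine-grained about which changes count as out of date.

```python
class CentralRepository:
    """Toy model of a central server that only accepts up-to-date commits."""

    def __init__(self):
        self.revisions = ["initial import"]

    @property
    def latest(self):
        # Revision numbers simply count the commits made so far.
        return len(self.revisions)

    def commit(self, base_revision, message):
        # Work based on an older revision is refused until the client updates.
        if base_revision != self.latest:
            raise RuntimeError("out of date: update to revision %d first" % self.latest)
        self.revisions.append(message)
        return self.latest


server = CentralRepository()
alice_base = bob_base = server.latest           # both check out revision 1

server.commit(alice_base, "alice: add parser")  # accepted, becomes revision 2

try:
    server.commit(bob_base, "bob: fix typo")    # refused, revision 1 is stale
except RuntimeError as error:
    print(error)

bob_base = server.latest                        # bob updates first...
server.commit(bob_base, "bob: fix typo")        # ...and can now commit
```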

Let’s see what the definition of distributed looks like.

In distributed systems the workflow revolves around the selected version available on the local system.

So basically this is the exact opposite of SVN. There is a local repository available, which can be used to do development independently of what others are doing, and when the need arises, the work of others is simply merged into the local branches. This can be because others have published an update you need in order to continue working, or because the time has come to publish a new version.

Centralised strategies are most commonly used in projects where a clear top-down approach is used. That is, the project leader decides who gets access, and what gets done by whom. This can be enforced by having less privileged contributors send in patches, which then have to be applied by people with commit access, so that the code can be managed on a line-by-line basis.

Distributed approaches are generally seen in environments where the project members get much more freedom to do whatever is necessary. Each project member sets up their own private and public repositories, and a central repository managed by the project leader is used to get changes downstream. The project leader is in charge of merging the code from the project members after they’ve sent a pull request.

Figure 1 – Patch

From b785dcc1644e30d60507ff686202a29a766a0a28 Mon Sep 17 00:00:00 2001
From: Bart Kuivenhoven <bemkuivenhoven@gmail.com>
Date: Mon, 18 Jun 2012 19:13:14 +0200
Subject: [PATCH] mm: pte: test: Removed heap dependencies

The new code uses the stack instead of the heap for string formatting.

Signed-off-by: Bart Kuivenhoven <bemkuivenhoven@gmail.com>
---
 src/mm/paging/pte/pte_init.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/src/mm/paging/pte/pte_init.c b/src/mm/paging/pte/pte_init.c
index 48cade8..78b9451 100644
--- a/src/mm/paging/pte/pte_init.c
+++ b/src/mm/paging/pte/pte_init.c
@@ -130,11 +130,11 @@ x86_pte_set_entry(struct pte_shadow* pte, idx_t idx, void* phys)
                 return -E_NULL_PTR;
         if (idx > PTE_SIZE)
                 return -E_INVALID_ARG;
-
         if (pte->pte == NULL)
                 return -E_NULL_PTR;

         pte->pte->entry[idx].pageIdx = (addr_t)phys >> PTE_OFFSET;
+        pte->pte->entry[idx].present = TRUE;

         return -E_SUCCESS;
 }
@@ -229,6 +229,7 @@ void pte_dump_tree(struct pte_shadow* pte, char* prefix, int depth)
         if (pte == NULL)
                 return;

+        demand_key();
         int i = 0;
         for (; i < PTE_SIZE; i++)
         {
@@ -239,17 +240,12 @@ void pte_dump_tree(struct pte_shadow* pte, char* prefix, int depth)
                 if (depth == PTE_DEEP-1)
                         continue;

-                char* pref = kalloc(255);
-                if (pref == NULL)
-                        return;
+                char pref[255];
                 memset(pref, 0, 255);

                 sprintf(pref, "%s%X-", prefix, i);
                 pte_dump_tree(pte->children[i], pref, depth+1);
-
-                kfree(pref);
         }
-        demand_key();
 }

 int pte_test()

Tools

As mentioned before, there are a couple of tools available to do the work described here. To name a few of the centralised tools:

  • CVS and
  • SVN

and to name a couple of the distributed tools:

  • Mercurial and
  • Git.

Only the two most popular tools, SVN and Git, are described on this site.

Common mistakes

Here are a couple of common mistakes you should avoid while working with these version management tools.

  1. Putting confidential data into a public repository. It might seem easy to get this out again, but the entire purpose of source control systems is to preserve content, which can make it surprisingly difficult to remove any kind of data.
  2. Making the change sets between commits too large. In centralised systems, whenever there’s an update from upstream, it has to be merged into the local source tree before the local changes can be published upstream. This tends to discourage people from committing, but it is better to keep commits as small as possible, as that keeps the chance of merge conflicts as small as possible. Keep the commits small and commit often, and merge conflicts will be as easy to deal with as they can get. Another reason to keep change sets small is that it makes it easier to find bugs by bisecting the history.
  3. Copying the source tree to another location, only to copy the files themselves back into the repository later. In theory there is nothing wrong with this approach, but in practice people tend to update the local repository first, and then copy their edited files over the freshly updated ones before committing. This avoids merge conflicts, but it also reverts every upstream change to the files that were just updated. Always make sure the local changes are present in the local repository before updating, to prevent the updates from being rolled back unintentionally. An easier way is to simply work on the files in the local repository itself (this prevents a lot of frustration and headaches).
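The bisecting mentioned in the second mistake is essentially a binary search over the history. Below is a minimal Python sketch of the idea; the function first_bad_commit, the commit names and the is_bad check are all invented for this illustration. It assumes the history is ordered from oldest to newest, that the newest commit is bad, and that every commit after the first bad one is bad too.

```python
def first_bad_commit(commits, is_bad):
    """Binary search for the first commit for which is_bad(commit) is true."""
    low, high = 0, len(commits) - 1
    tests = 0
    while low < high:
        middle = (low + high) // 2
        tests += 1
        if is_bad(commits[middle]):
            high = middle       # the culprit is this commit or an earlier one
        else:
            low = middle + 1    # the culprit is a later commit
    return commits[low], tests


# A hypothetical history of 100 commits, c1 to c100, where c73 broke things.
history = ["c%d" % number for number in range(1, 101)]
culprit, tests = first_bad_commit(history, lambda commit: int(commit[1:]) >= 73)
print(culprit, tests)
```

The search only needs a handful of tests, even for a long history. But it ends at a single commit, so the smaller that commit is, the less code there is left to inspect by hand.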

Wrapping it up

So a version is the final product of an iteration or a snapshot from a given time. Managing those versions is desirable so there is consistency in the distribution of the product downstream.

Numbers can be assigned in a couple of ways, the most popular being the iteration count and the snapshot count. Code names are a nice way to make talking about a specific version a bit easier.

Development requires all change sets to be preserved, which can be done using source control systems, and maintaining multiple versions should be done by using maintenance branches.

Centralised systems are used in top-down environments, whereas distributed systems are used in peer-to-peer environments.

For further reading, I’d suggest the following Wikipedia pages:

http://en.wikipedia.org/wiki/Versioning

http://en.wikipedia.org/wiki/Version_control

http://en.wikipedia.org/wiki/Release_management

So that’s it, and don’t forget to be awesome!
