Thinking about migrating to a different repository?
Taverna is an internationally successful workflow environment developed by the myGrid team at the University of Manchester. The team recently moved the Taverna source code from SourceForge to Google Code and GitHub. We asked Shoaib Sufi – myGrid project manager and community liaison for the Software Sustainability Institute – to explain their thinking behind the move. The work behind this post was completed by Stian Soiland-Reyes, Jiten Bhagat, Stuart Owen, Alan Williams and Shoaib Sufi.
Taverna was moved from SourceForge to Google Code due to the ad-ridden downloads on SourceForge. We felt that the adverts could cause user confusion, so we chose to move to a new repository.
At the time of writing, Google Code supports Subversion (SVN) and Mercurial (HG) repositories. To preserve the full CVS history, we ran a tool called cvs2svn against a CVS dump from our SourceForge repository. When importing the converted dump, it’s important that the fresh Google Code SVN repository is at revision 0/1, so we requested a reset on the Google Code forums. Before requesting the reset, we made a tag in CVS so that we could confirm that no further commits had been made after the move. If you wish to move from SVN to Mercurial, there are a number of tools available, such as svn2hg.
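The check against the freeze tag can be sketched in a few lines of Python. This is only an illustration and not the procedure the team actually followed: the function name, the timestamps and the idea of comparing commit times against the tag time are all assumptions made for the example.

```python
from datetime import datetime, timezone


def commits_after_freeze(commit_times, freeze_time):
    """Return, oldest first, every commit time strictly later than the freeze tag."""
    return sorted(t for t in commit_times if t > freeze_time)


# Hypothetical data: the moment the freeze tag was made in CVS, and the
# commit times read from the old repository's log.
freeze = datetime(2010, 1, 15, 12, 0, tzinfo=timezone.utc)
commits = [
    datetime(2010, 1, 14, 9, 30, tzinfo=timezone.utc),
    datetime(2010, 1, 15, 11, 59, tzinfo=timezone.utc),
    datetime(2010, 1, 16, 8, 0, tzinfo=timezone.utc),  # arrived after the tag
]

late = commits_after_freeze(commits, freeze)
if late:
    print(f"{len(late)} commit(s) landed after the freeze tag; migrate these by hand")
else:
    print("No commits after the freeze tag; the old repository can be retired")
```

An empty result means the old repository stayed untouched after the tag, so the migrated history is complete.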
Google Code will not allow someone to register a project name that has already been used on SourceForge (Taverna, in our case), unless that person is the owner of the SourceForge project too. This meant that we had to send a support request to Google Code, showing that we owned the SourceForge Taverna project, before we could register Taverna in the new repository.
Although the code for Taverna is stored on Google Code, we moved the actual download files to launchpad.net. Launchpad.net is run by the people behind Ubuntu Linux. It provides better download speeds and reliability than Google Code, and has the added benefit of hosting Bazaar, a distributed version control system similar to Git and Mercurial.
Another project that we are involved in, called Sysmo, uses Mercurial on Google Code. However, if we were to start that project again, we’d use GitHub due to its better branching and merging, greater speed and the ability to log in using certificates. RightField and associated tooling (some of the non-web products coming out of Sysmo) are hosted on GitHub. Relative to Mercurial, Git has a steeper learning curve, especially for those not familiar with distributed version control. In other words, Mercurial is an easier stepping stone from Subversion.
We are using GitHub for the experimental work on the new Taverna workflow format, SCUFL2. This is occasionally back-synced to our Google Code SVN repository. At the time of writing, GitHub’s support focus is on Git repositories. It also has some Subversion support and can provide both Git and SVN access to the same repository, although this can cause a slightly idiosyncratic project history.
GitHub is much more social than other repositories, in that you can follow people and watch projects. Some of our developers follow the commits of certain projects through their feeds. It is very easy to fork projects on GitHub and to ask for your changes to be merged back through pull requests. This creates a much more vibrant community where people can take other people's code, enhance it, use it and collaborate by pushing back the changes they have made. The Ruby community practically lives on GitHub, which is important to us because 50% of myGrid development is Java and 50% is Ruby/Ruby on Rails.
GitHub supports messaging between users, something that Google Code does not do. It also allows you to annotate individual lines within commits by adding notes, and to discuss commits. This makes code reviews far easier, especially for distributed teams.
Google Code has less of a social element to it, but gets the job done and seems to have better management of things like wiki pages, issues/tasks list, and suchlike. Google Code does not support Git at the present time (although there are rumours support will be added, because the Android team are keen users of Git). Some developers at myGrid find Git more intuitive than Mercurial - but opinions vary.
Is GitHub better than other systems?
Why is Git better than other systems? This post provides some answers. It gives GitHub as one of the reasons why Git is better.
There is a lot of support for working with GitHub repositories locally (in the form of a dedicated command line tool).
GitHub supports private accounts with many different plans (i.e. GitHub can be used for non-free or closed-source projects), whereas Google Code does not.
GitHub recently introduced the concept of an organisation. This is very useful for groups of people working together. One can easily switch between private and organisation contexts. Google Code does not support this feature.
GitHub doesn’t currently suffer from big-company syndrome, which leads to a more friendly and supportive environment. There is some anecdotal evidence of Google being a lot less responsive.
GitHub is very much on the pulse of what it takes to be the next-generation, social-code repository. They are constantly adding new features and improving. Google Code gets the job done and is more flexible in certain areas, but is limited when it comes to social elements. These social elements frequently meet the needs of a distributed team.
GitHub’s new features, and other items of interest, can be gleaned from their blog.
A few things to remember
Bear in mind the need for consistent procedures. For example, a rule that Java projects go to Google Code SVN and Ruby projects go to Git is fine, or you may decide to split by project.
Check that your bug tracking system can link to the new source code control. Similarly, you need to have your build system linked up to the code repository.
In the case of Taverna and other myGrid projects, the bugs in Jira are annotated by Jira with the SVN commits that fix them.
If you are planning on changing your hosting provider, keep in mind that this is also a good time to move to a distributed version control system. When it comes to distributed version control systems, we recommend Git in the first instance, although we’ve also had good experience with Mercurial.
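The earlier point about linking the bug tracker to the source repository usually relies on issue keys appearing in commit messages. The Python sketch below shows one way such a link could be built. It is not how Jira or the myGrid setup works internally: the DEMO project key, the revision labels and the function names are hypothetical, and the pattern assumes keys of the form uppercase project key, hyphen, number.

```python
import re

# Assumed key shape: an uppercase project key, a hyphen and a number, e.g. DEMO-12.
ISSUE_KEY = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def issue_keys(message):
    """Return the distinct issue keys in a commit message, in order of first appearance."""
    seen = []
    for key in ISSUE_KEY.findall(message):
        if key not in seen:
            seen.append(key)
    return seen


def commits_by_issue(commits):
    """Map each issue key to the revisions whose messages mention it."""
    index = {}
    for revision, message in commits:
        for key in issue_keys(message):
            index.setdefault(key, []).append(revision)
    return index


# Hypothetical commit log as (revision, message) pairs.
example_commits = [
    ("r101", "DEMO-12: fix null pointer in workflow loader"),
    ("r102", "Tidy build scripts"),
    ("r103", "DEMO-12 DEMO-15 share the same root cause"),
]

index = commits_by_issue(example_commits)
for key, revisions in index.items():
    print(key, "->", ", ".join(revisions))
```

A pattern this loose will also match strings such as UTF-8, so a real setup would normally restrict it to the project keys that actually exist in the tracker. Whatever the mechanism, it is worth checking that it still works after the move, because revision identifiers change when the repository type changes.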