By Mike Jackson, Software Architect.
Michael Chappell leads the Quantitative Biomedical Inference (QuBIc) research group within the Institute of Biomedical Engineering at the University of Oxford. Michael has developed a method of processing functional magnetic resonance image (MRI) data that can be used to recognise blood flow patterns in the brain. I have been helping Michael through one of our consultancy projects, which he applied for through our open call. Part of our collaboration looked at issues around integrating Subversion or Git repositories with CVS.
QuBIc's method is implemented as part of a C++ code, FABBER, which can be used on its own or via BASIL (Bayesian Inference for Arterial Spin Labelling MRI), a shell-script that provides a richer command-line interface. Both FABBER and BASIL are distributed as part of FSL, the FMRIB Software Library, which is produced by The Oxford Centre for Functional MRI of the Brain, Nuffield Department of Clinical Neurosciences, University of Oxford and the John Radcliffe Hospital.
Michael has write access to CVS and maintains FABBER and BASIL, with their source codes held within FSL's private CVS repository. Researchers in Michael's group either use FABBER and BASIL as shipped within FSL releases, or are given a more recent and stable version by Michael. Any code developed by these researchers, specifically models to be added to FABBER's model suite, or any other enhancements to it or BASIL, must be added to CVS by Michael. The researchers typically develop their extensions without using any form of revision control.
We wondered whether there was a way for FABBER and BASIL to be held within a repository to which both Michael and his researchers would have write access, and which would still allow the two codes to co-exist in FSL's CVS repository. Ideally, any solution would both encourage researchers to use revision control and reduce the time Michael needs to commit their changes into CVS. The starting point was to assume that QuBIc could host BASIL and FABBER in a Subversion or Git repository, and use branches to manage researcher-specific modifications.
Populating a repository
Given a Subversion or Git repository, how would this be populated with FABBER and BASIL from CVS? One option is to just convert the CVS repository to Subversion or Git. According to Subversion, the most popular and mature tool for this is cvs2svn whose design aims are robustness and 100% data preservation. cvs2svn also handles conversion to Git. Similarly, there is a git-cvsimport command for certain Linux flavours. However, these commands convert the whole CVS repository into Subversion or Git.
As an alternative, stable copies of the contents of the FABBER and BASIL directories within CVS could be taken and used as a "snapshot" to populate a new Subversion or Git repository.
Integration with CVS
Once the repository has been populated, how would this integrate with CVS? There are a number of options.
The simplest, though most time-consuming, option is for Michael to copy and commit stable versions of FABBER and BASIL into CVS as and when required. This is what he does at present, though at least his researchers would now be using revision control also.
A shell script could be written that automates the above, copying code into CVS on a regular basis, e.g. every minute, hour or day. At its simplest, this could just traverse the files in the repository and compare them to the corresponding files in CVS to determine which files have been added, removed or changed, then copy across the added and changed files and invoke the appropriate CVS commands. One challenge would be to detect any binary files, which need special handling when added to CVS.
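The comparison step described above could be sketched as follows. This is a minimal illustration, written in Python rather than shell, and it assumes that a working copy of the new repository and a CVS checkout sit side by side on disk. The function names (list_files, compare_trees, looks_binary) are hypothetical, and the NUL-byte check is only a common heuristic for spotting binary files, not a feature of CVS.

```python
import filecmp
import os

# Version control metadata directories that should never be copied across.
SKIP_DIRS = {"CVS", ".git", ".svn"}


def list_files(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune metadata directories in place so os.walk does not descend.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found


def compare_trees(source, cvs_checkout):
    """Return sorted lists of (added, removed, changed) relative paths."""
    src = list_files(source)
    dst = list_files(cvs_checkout)
    added = sorted(src - dst)
    removed = sorted(dst - src)
    changed = sorted(
        path for path in src & dst
        # shallow=False forces a byte-by-byte content comparison.
        if not filecmp.cmp(os.path.join(source, path),
                           os.path.join(cvs_checkout, path),
                           shallow=False)
    )
    return added, removed, changed


def looks_binary(path, sample_size=8192):
    """Heuristic only: treat a file as binary if its first block has a NUL."""
    with open(path, "rb") as handle:
        return b"\0" in handle.read(sample_size)
```

Files reported as added would then be copied over and passed to cvs add (with the -kb option for those flagged as binary), removed files would be passed to cvs remove, and everything would be committed with cvs commit.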
Subversion provides a range of commands that would help in such a script. These include: svnlook changed, which lists the files that were changed, added, removed, moved or renamed at a specific revision; svnlook dirs-changed, which lists the directories that were changed or whose files were changed; and svn propget, svnlook propget and svnlook proplist, all of which provide information on file properties (an svn:mime-type property can be used to determine whether a file is binary).
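As an illustration of how a script might consume svnlook changed, the sketch below runs the command for one revision and sorts the reported paths by status. It assumes the usual output layout, in which the first two columns hold the status (A, D, U, _U or UU) and the path starts at the fifth column; the function names are hypothetical.

```python
import subprocess


def svnlook_changed(repos, rev):
    """Run 'svnlook changed' for one revision and return its raw output."""
    result = subprocess.run(
        ["svnlook", "changed", "-r", str(rev), repos],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_svnlook_changed(output):
    """Group the paths reported by 'svnlook changed' by kind of change."""
    changes = {"added": [], "updated": [], "deleted": [], "props_only": []}
    for line in output.splitlines():
        if len(line) < 5:
            continue  # too short to hold a status and a path
        status, path = line[:2], line[4:]
        if status[0] == "A":
            changes["added"].append(path)
        elif status[0] == "D":
            changes["deleted"].append(path)
        elif status[0] == "U":
            # Covers both 'U ' (contents) and 'UU' (contents and properties).
            changes["updated"].append(path)
        elif status == "_U":
            changes["props_only"].append(path)
    return changes
```

Paths ending in a slash are directories, which a synchronisation script would need to create or remove in the CVS checkout before handling the files inside them.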
Likewise, Git's git diff command shows the difference between revisions. In particular, its --numstat flag lists the files changed along with the number of lines added and removed in each, with a dash (-) shown in place of the line counts for binary files.
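A script could use that output to decide which changed files are binary. The following is a minimal sketch, assuming the tab-separated layout of git diff --numstat (lines added, lines removed, path); the function names are hypothetical, and renamed files, which Git may report in an "old => new" form, are not handled.

```python
import subprocess


def git_numstat(repo_dir, old_rev, new_rev):
    """Run 'git diff --numstat' between two revisions and return its output."""
    result = subprocess.run(
        ["git", "diff", "--numstat", old_rev, new_rev],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_numstat(output):
    """Return a list of (path, lines_added, lines_removed, is_binary)."""
    changes = []
    for line in output.splitlines():
        parts = line.split("\t", 2)
        if len(parts) != 3:
            continue  # not a numstat line
        added, removed, path = parts
        if added == "-" and removed == "-":
            # Git prints dashes instead of line counts for binary files.
            changes.append((path, None, None, True))
        else:
            changes.append((path, int(added), int(removed), False))
    return changes
```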
Subversion and Git both support post-commit hooks, which are run whenever a change to the repository is committed. In Subversion, the hook is a script that is passed the repository path and the revision number created by the commit. Git's post-commit hook is run without arguments, so a Git hook script would need to query the repository for the latest commit. In either case, the script can then carry out some action, e.g. email repository users, trigger a back-up or trigger the shell script suggested earlier.
In Subversion, the svnlook log command allows the commit message to be extracted for a specific revision (svnlook propget with the --revprop option can retrieve the same information from the svn:log property), so this could be used to ensure that the CVS commit message is in sync with that used within Subversion.
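To make the hook idea concrete, here is a minimal sketch of a Subversion post-commit hook written in Python. It assumes Subversion invokes the hook with the repository path and the new revision number as its first two arguments, and it uses svnlook log, which prints the log message for a revision. The helper names and the wording of the CVS message are hypothetical, and the sketch only prints the cvs commit command it would run rather than executing it.

```python
import subprocess
import sys


def svnlook_log_command(repos, rev):
    """Build the command that prints the log message of one revision."""
    return ["svnlook", "log", "-r", str(rev), repos]


def cvs_commit_command(message, rev):
    """Build a CVS commit command that reuses the Subversion log message."""
    return ["cvs", "commit", "-m", f"{message} (Subversion r{rev})"]


def main(argv):
    # Subversion passes the repository path and revision to the hook.
    repos, rev = argv[1], argv[2]
    log = subprocess.run(
        svnlook_log_command(repos, rev),
        capture_output=True, text=True, check=True,
    )
    message = log.stdout.strip()
    # Dry run: show the command instead of touching the CVS checkout.
    print(" ".join(cvs_commit_command(message, rev)))


# Only act when called with the arguments a post-commit hook receives.
if __name__ == "__main__" and len(sys.argv) >= 3:
    main(sys.argv)
```

In a real deployment the printed command would be run inside the CVS checkout, after the changed files had been copied across.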
Exporting Git commits
git-cvsexportcommit is a command supported by certain Linux flavours. This takes a single commit from a Git repository, applies it as a patch to a checked-out CVS working copy and can then commit the result to CVS. It supports file addition and deletion, and binary files. I tested this by setting up a mock CVS repository with the contents of FSL, and setting up a Git repository with source code directories corresponding to those holding BASIL and FABBER source code within my CVS repository. Experimenting with committing changes to Git then exporting these to CVS seemed to work well. However, because each commit has to be exported individually, there were occasional synchronisation issues that arose if I forgot to export a specific commit from Git to CVS.
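One way to reduce the risk of forgetting a commit would be to export every outstanding commit in order and record how far the export has got. The sketch below is an assumption about how that might be done, not something tested as part of this project: it uses a hypothetical Git tag, cvs-exported, to mark the last commit exported, assumes a linear history with no merge commits, and relies on the -w (CVS working directory), -c (commit if the patch applies cleanly), -p (pedantic patching) and -v (verbose) options of git cvsexportcommit.

```python
import subprocess

# Hypothetical tag marking the last commit already exported to CVS.
EXPORT_TAG = "cvs-exported"


def pending_commits_command(tag=EXPORT_TAG):
    """List commits made since the tag, oldest first."""
    return ["git", "rev-list", "--reverse", f"{tag}..HEAD"]


def export_command(commit, cvs_workdir):
    """Export one commit into the CVS checkout and commit it there."""
    return ["git", "cvsexportcommit", "-w", cvs_workdir,
            "-c", "-p", "-v", commit]


def mark_exported_command(commit, tag=EXPORT_TAG):
    """Move the tag forward to the commit that was just exported."""
    return ["git", "tag", "-f", tag, commit]


def export_pending(repo_dir, cvs_workdir):
    """Export each outstanding commit in turn, stopping at the first failure."""
    listing = subprocess.run(
        pending_commits_command(), cwd=repo_dir,
        capture_output=True, text=True, check=True,
    )
    for commit in listing.stdout.split():
        # check=True raises if an export fails, so the tag is not advanced.
        subprocess.run(export_command(commit, cvs_workdir),
                       cwd=repo_dir, check=True)
        subprocess.run(mark_exported_command(commit),
                       cwd=repo_dir, check=True)
```

Because the tag only moves after a successful export, re-running export_pending after a failure would resume from the first commit that has not yet reached CVS.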
The use of branches or tags within CVS may complicate solutions to any of the above options. Similarly, it is unclear whether changes are made to FABBER or BASIL in CVS by other developers e.g. when preparing for an FSL release.
If either or both of these hold, there would need to be a process to ensure that any Git or Subversion repository is kept in sync with changes made to FABBER and BASIL within CVS.
There seemed to be no straightforward way to allow FABBER and BASIL development to be done within Subversion or Git while FSL development continues in CVS. One further alternative is to migrate FSL itself to Subversion or Git. While there are myriad articles online promoting the benefits of Subversion and Git over CVS, FSL would need to invest time in porting their repository to Subversion or Git and in learning a new version control tool. Whether this investment in time and effort is justified, given FSL has a set of processes that have been used for many years, would be a question that only FSL could answer.
If you've experience in integrating Git or Subversion with CVS, please feel free to share your hints and tips with us.