Austin, USA, 20-21 July 2012.
By Aleksandra Pawlik, PhD student, the Open University.
Hands-on tutorials aimed to familiarize the participants with the Madagascar features
Generating LaTeX documents directly from Madagascar – enabling scientists to conduct reproducible computational experiments
The discussion about the best ways to increase engagement of the research community in the development of Madagascar
Software-supported reproducible research in geophysics
Madagascar (http://www.reproducibility.org) is a specialized open-source software package for geophysicists. Its development started in 2003 and the first version was released three years later. Madagascar is developed by a steadily growing community consisting mainly of geophysicists. The project was started by an international group of geophysicists, some of whom had met during their earlier work at Stanford University. Madagascar not only provides a suite of tools for geophysics but also enables scientists to make their research reproducible, thanks to a range of functions that support archiving and documenting all source code and data used in computational experiments.
The two-day (20-21 July 2012) Madagascar School took place at the Bureau of Economic Geology at the University of Texas at Austin. The workshop aimed to introduce new users to the software suite and to improve the skills of researchers who already had some experience in using it. Over 40 participants from 11 universities and 4 companies learned how to use the existing Madagascar modules and how to develop and contribute new source code.
The Madagascar software suite can be installed and run on three platforms: Windows, Linux and MacOS. The package uses RSF (Regularly Sampled Format), which is an open-exchange file format. The RSF design makes the *.rsf files flexible, transparent and easy to manipulate and process with more universal tools. This means that, if needed, programs other than those in the Madagascar suite can handle RSF files. Users are also provided with scripting utilities in Python and SCons which facilitate building scripts for complex data processing. Madagascar has another, rather unusual feature: it supports generating documents using LaTeX. The input generated by the software suite for the LaTeX documents is organised in a way which allows researchers to quickly and easily reproduce the computational results presented in the publication. If needed, all images, data and code used in the computations and then fed into the paper can be accessed and reused. During the workshops in Austin the participants could try out a variety of Madagascar features using the examples available to download from the software website. One of the exercises was creating a paper in LaTeX using this feature.
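To illustrate why a transparent file format is easy to handle with general-purpose tools, here is a minimal Python sketch that reads an RSF-style text header without using Madagascar itself. It assumes that the header is plain text made of key=value pairs, with axis sizes stored as n1, n2 and so on, and the location of the binary data given by in=. The sample header, the path in it and the function names are made up for illustration and are not taken from the Madagascar distribution.

```python
import re

# One key=value pair; the value is either double-quoted or a run of non-spaces.
_PAIR = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_header(text):
    """Collect key=value pairs from header text; later values override earlier ones."""
    params = {}
    for key, raw in _PAIR.findall(text):
        params[key] = raw.strip('"')
    return params


def header_shape(params):
    """Read n1, n2, ... in order until the next axis size is missing."""
    shape = []
    axis = 1
    while f"n{axis}" in params:
        shape.append(int(params[f"n{axis}"]))
        axis += 1
    return tuple(shape)


# Hypothetical header, written by hand for this example.
sample = '''
n1=1000 d1=0.004 o1=0 label1="Time (s)"
n2=24 d2=0.05 o2=0
data_format="native_float"
in="/tmp/example.rsf@"
'''

params = parse_header(sample)
shape = header_shape(params)
print(shape, params["in"])
```

Because everything needed to interpret the data sits in a small text header, a script of this size is enough for an external program to find the data and learn its dimensions.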
Over half of the second day of the workshops focused on developing new modules for Madagascar. The software source code is kept in an SVN repository to which access is not restricted, so essentially anyone can commit new code. This was a conscious decision of the package management team, who wanted to encourage scientists to contribute to the development. Open access to the repository does carry some risk, and indeed the Madagascar code has been broken in the past by someone's commit. However, thanks to SVN, the Madagascar maintainers were able to revert the software to a working version. It should be noted that the code was never intentionally broken or vandalised. The researchers who committed the problematic code did not expect it to cause any issues.
The code is hosted on SourceForge, where bug trackers are also available. The developers communicate via the mailing list. Since Madagascar provides APIs for a number of programming languages (from C and Fortran to Java and Python), new contributions to the package can be written in more than one language. The main requirements which new modules have to meet are: error handling and parameter checking, accepting command-line arguments to control the parameters of the program, compliance with the GNU GPL license, and having a well-defined and limited scope (to leverage the advantages of SCons and Python scripting). The project uses semi-automated regression testing. However, as the software and the generic libraries evolve, keeping the tests up to date poses a challenge.
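The module requirements listed above can be sketched in a few lines of plain Python. The example below is not written against the Madagascar API; it only illustrates the shape of a program that takes its parameters from the command line, checks them, reports errors clearly and does one narrow job. The key=value argument style and the parameter names scale and clip are assumptions made for this illustration.

```python
import sys


class ParameterError(ValueError):
    """Raised when a command-line parameter is missing or invalid."""


def parse_params(argv):
    """Turn ['scale=2', 'clip=10'] into {'scale': '2', 'clip': '10'}."""
    params = {}
    for arg in argv:
        key, sep, value = arg.partition("=")
        if not sep or not key or not value:
            raise ParameterError(f"expected key=value, got {arg!r}")
        params[key] = value
    return params


def get_float(params, key, default=None):
    """Fetch a numeric parameter, falling back to default when one is given."""
    if key not in params:
        if default is None:
            raise ParameterError(f"missing required parameter {key}=")
        return default
    try:
        return float(params[key])
    except ValueError:
        raise ParameterError(
            f"{key}= must be a number, got {params[key]!r}"
        ) from None


def run(argv, samples):
    """One narrow task: scale the samples, then clip them symmetrically."""
    params = parse_params(argv)
    scale = get_float(params, "scale")                      # required
    clip = get_float(params, "clip", default=float("inf"))  # optional
    if clip <= 0:
        raise ParameterError("clip= must be positive")
    return [max(-clip, min(clip, scale * s)) for s in samples]


def main(argv, samples):
    """Return a process-style exit code: 0 on success, 1 on bad parameters."""
    try:
        result = run(argv, samples)
    except ParameterError as err:
        print(f"error: {err}", file=sys.stderr)
        return 1
    print(" ".join(str(v) for v in result))
    return 0


# Example call with hypothetical parameters and data.
exit_code = main(["scale=2", "clip=5"], [1.0, -4.0, 3.0])
```

Keeping each program this small is what makes it practical to chain many of them together from a build script, which is the advantage the limited-scope requirement is meant to preserve.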
Apart from committing new modules to the repository, the members of the Madagascar community are encouraged to contribute their documents (in LaTeX) related to the software. These could be scientific publications produced using Madagascar, documentation of experiments and any other relevant papers.
Madagascar is developed under the GNU GPL license for two reasons. One is to give the scientific community free access to an advanced software package. The other is to protect the code from being extended and then commercialized by an external organisation.
The Madagascar School event finished with a discussion about the future of the package development and about expanding the user and developer community. The project has a website, http://www.reproducibility.org, which provides information about the software, user and developer documentation, and other materials. There is a 'Madagascar development' blog, http://www.reproducibility.org/rsflog, which records information about new programs added to the suite, advertises and reports back from related events, announces new releases and provides more general information about scientific software development. The blog is mainly run by Sergey Fomel. However, other community members are actively encouraged to author blog posts. On top of all this, Madagascar has its own LinkedIn project profile, which now connects over 200 researchers from all over the world.
The Madagascar project is more than just a comprehensive suite for a scientific domain. It is a good example of practices that are useful in scientific software development. It also shows the challenges which scientist-developers face and the ways these issues can be addressed.