The invaluable contribution of open-source software to research

Posted by s.hettrick on 17 May 2013 - 9:01am

By Mark Woodbridge, software developer, Bioinformatics Support Service, Imperial College London.

This is the first article in a new series called A Day in the Software Life. In this series, we will be asking researchers from all disciplines to discuss the tools that make their research possible.

I'm a research software engineer developing tools to help life scientists organise, analyse and share their data. This is a varied and rewarding role, but it has its challenges, not least keeping up to date with both technology and the relevant research whilst trying to build robust, usable, self-sustaining software and getting credit for doing so. However, one thing that makes it enjoyable is being part of a wider development community. The innumerable open-source software projects that enable or accelerate development of the tools that we build for others are an invaluable resource, and were the basis of a recent discussion at Collaborations Workshop 2013.

The availability of a plethora of freely available software tools, libraries and frameworks is an obvious advantage when you work in a non-commercial environment. Even disregarding this benefit, the choice to use open source improves the reproducibility of your research, makes it more accessible to others, and, as long as you contribute to (and therefore sustain) the projects you depend on, can enrich the scientific commons. What consistently impresses me, though, is not only the quality of the available resources but also the helpfulness of the respective communities, whether by email, via platforms such as BioStar (which is itself open source), or even in person. This level of support is underappreciated not only by many large corporations but also by funding bodies, which have yet to rigorously assess the value and impact of software that originates from research.

I rely on free software on a daily basis to do my job. I develop on Linux, primarily using Eclipse. This is a matter of choice (and habit), but it also reflects the fact that most of my code is in Java. That is partly because I depend on lots of special-purpose Java libraries, but also because we still develop cross-platform client and server software and, more recently, software for Android. We also use R/Bioconductor, Python and other languages and frameworks whenever they are the best tool for the job. Most of my new software is intended to run in a browser and is written using the Play Framework. There are some fantastic tools that have greatly simplified the rapid development of web applications, not least jQuery, Bootstrap and D3.js. For managing our software projects we also use open-source tools, including Subversion and Trac.

In terms of open-source libraries that we have used in our own software: our recent work with the Chernobyl Tissue Bank has involved the development of a research data uploader that depends on the OME Bio-Formats, Apache POI, and Affymetrix Fusion libraries to automatically analyse and annotate the contributed data. MRIdb, our medical imaging database, relies on dcm4che, XMedCon and DCMTK. These projects, and many more of ours, would be difficult or impossible to implement without such tools, at least with the resources we have available. We have also received invaluable help from the relevant communities in the process, and have cited or acknowledged them wherever possible.

I regard a significant part of my job as identifying reusable software in order to avoid reinventing the wheel. As software engineers we are frequently asked to come up with efficient solutions for challenging problems relating to increasingly complex and specialised research. This is possible, but often only because of the availability of remarkably high-quality open-source software that lets us concentrate on the science itself. In return, we should take the time to promote the projects we rely on, appreciate their developers, find ways to give something back and, hopefully, get credit from our funders and the community at the same time.

For more information about Mark's work, visit his homepage.