Is software a method?

Posted by s.hettrick on 31 March 2015 - 9:43am

By Philip Fowler, Software Sustainability Institute Fellow and postdoctoral researcher at the Department of Biochemistry at the University of Oxford.

Last month I attended the 59th annual meeting of the US Biophysical Society. It's the sixth time I've been, but the first time I've gone thinking seriously about how our community treats software, which falls under my remit as a Software Sustainability Institute Fellow. I was also chosen as a guest blogger and you can see my posts on my blog or on the Biophysical Society blog.

Wait... what is biophysics?

It is the application of physical methods (and, to a lesser extent, theories) to biology, focusing mainly on the molecular level. This includes determining the structure of proteins (and, more famously, DNA) by illuminating a protein crystal with X-rays, recording the diffraction pattern and then inferring the complex 3D structure that would be responsible for that pattern. It is a mature discipline: the first conference of the society was in the 1950s and now it attracts over 6,000 scientists every year.

How do biophysicists use software?

Pure theoreticians aside, I struggle to think of an experimental biophysical technique that doesn't need software to either analyse or interpret the data: NMR, electrophysiology, electron microscopy, X-ray crystallography, mass spectrometry... Even "seeing is believing" approaches like light microscopy (now usually coupled with fluorescent labels) often need image processing techniques and back-end databases. Much of the software experimentalists use is commercial and is either embedded in the instrument or comes as a black box that runs on a Windows machine on the adjacent bench. And, yes, in our department we have had occasions when a lab's Windows 95 PC (!) finally died and this was the only machine that ran the software that controlled their instrument...

In my field, computational biophysics, we run large molecular dynamics simulations of one or many proteins, with either all atoms represented or with some degrees of freedom removed. Hence software is our instrument; there is no avoiding it. Simulations can be huge and immense efforts have been directed at optimising codes and parallelising them so they run efficiently on hundreds or thousands of CPUs (and now CPU/GPUs). The code I use, GROMACS, is open source (LGPL v2.1), is now over 2 million lines of C/C++ code (see their GitHub page), is extensively developed and supported and is usually installed as standard on academic high performance computers. Other codes include NAMD, DESMOND, LAMMPS, AMBER and CHARMM. The last two began life in the 1980s and as a result are mainly written in Fortran, but thanks to partial rewrites and continuous development, are still going strong.

Despite this indirect or direct reliance on software, there were almost no talks or posters on improving software: say, how to use GitHub, or the importance of (any type of) testing. The focus was very much on the science and, to a lesser extent, the methods. The interesting thing is that methods development, even in my field, means coming up with new algorithms or approaches. So you might see a talk on a new method for using simulation to calculate a free energy, but you wouldn't see a talk on how code X has been ported to CUDA, allowing it to run on commodity NVIDIA GPUs and so run 10 times faster. This results in some odd situations: a new algorithm might be much more efficient than the existing approach, but because it has only been implemented in toy code it is in practice less efficient than the theoretically less elegant, older approaches that are already present in the optimised big community codes. This simply reflects the fact that scientists are rewarded for publishing papers, and it is easier to publish a paper saying "here is a new way to do X" than one saying "here is how I implemented this method in code Y". To put it another way, the fact that the new method is often not implemented in a community code is no barrier to publication. But then again, surely most improvements eventually make it into code? I suspect not.

Even so, I met a large number of people at the meeting who are open-source advocates and have all their lab code on GitHub, or who are Software Carpentry instructors. It is just that the meeting doesn't focus on these topics and so Twitter takes up the slack.

Which is a shame. But also, I suspect, an opportunity.