
Times have changed. Should we always reuse code?



Author: Melody Sandells, SSI Fellow

Posted on 5 May 2016

Estimated read time: 5 min


Image: Snow Flake by Dave Dugdale.

By Melody Sandells, Research Fellow, Environmental Systems Science Centre, University of Reading.

This is the second in a series of articles by the Institute's Fellows, each covering an area of interest that relates directly both to their own work and to the wider issue of software's role in research.

My research is about the physics of snow and how to retrieve snow information from satellite data. Things have moved on a lot since I was an undergraduate, when the concept of email was new to the masses, the introduction of the web blew our minds and Windows 3.11 on a 486 was amazing. In the old days, you would read a paper and, if it was of sufficient interest, go away and code it up for your own purposes - in my case in state-of-the-art Fortran 90. Things aren't like that anymore - why recode something that has already been written for the paper? Yet, for me, the urge to recode just won't go away.

My first foray into snow modelling involved some inherited code: SNTHERM, a physics-based snow model developed by the U.S. Cold Regions Research and Engineering Laboratory. This is a multilayer model on an adaptive grid that squashes to represent the thinning layers of naturally compacting snow. SNTHERM is a multi-phase mixture model, with mass conservation of the individual ice, liquid water and vapour phases within the snow matrix of each layer. Solving the energy balance allows simulation of the thermal structure of the snow, which can agree remarkably well with observations: the temperature at various heights in the snow, and even the size of the snow crystals. The model was originally developed to predict temperature changes at the snow surface caused by tank tracks, for a military remote sensing application; more widespread now is its use for water management and satellite monitoring of snow mass.
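To give a flavour of the kind of calculation a layered model like this performs, here is a minimal, hypothetical sketch in Python of a single-step, per-layer energy-balance temperature update. This is not SNTHERM itself: the function, variable names and the simple forcing values are illustrative assumptions only, and real models handle phase change, compaction and coupling between layers as well.

    # Hypothetical sketch: advance each snow layer's temperature by one time
    # step from its net energy flux, using delta_T = energy / (mass * heat capacity).
    # Illustrative only - not SNTHERM.

    ICE_HEAT_CAPACITY = 2100.0   # J kg^-1 K^-1, approximate specific heat of ice

    def update_layer_temperatures(temps, masses, net_fluxes, dt):
        """temps: layer temperatures (K), surface layer first
        masses: layer masses per unit area (kg m^-2)
        net_fluxes: net energy flux into each layer (W m^-2)
        dt: time step (s)"""
        new_temps = []
        for temp, mass, flux in zip(temps, masses, net_fluxes):
            energy = flux * dt                                   # J m^-2 added this step
            new_temps.append(temp + energy / (mass * ICE_HEAT_CAPACITY))
        return new_temps

    # Example: three layers cooling overnight under a small net energy loss
    print(update_layer_temperatures([270.0, 268.0, 266.0],
                                    [20.0, 50.0, 80.0],
                                    [-5.0, -1.0, 0.0],
                                    3600.0))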

SNTHERM was the first large program I had experienced, with around 7,000 lines of code. It was, and still is, available to download via FTP in its 1989 release. From memory, the User's Guide and the documentation were enough to get started, but my (now) husband helpfully suggested that if I really wanted to get to know the model I should recode it. It was, after all, in that nasty Fortran 77. It took one month to recode from Fortran 77 to Fortran 90, and two years to debug both the new model and the original, but I now know the model extremely well.

The focus of my PhD was the effect of vegetation on snow cover: does snow in a forest melt before snow in the open? [Answer: sometimes]. To obtain this ground-breaking answer I coupled a radiative transfer model of the forest cover with the recoded SNTHERM model to create SNOWCAN. Despite my being in the same department as the lead author of the original canopy model paper, the expectation was that I would code the vegetation model myself from the paper, which I did. I remember thinking it was a bit of an odd way to do things at the time, but I just got on with it. Did I compare with the original code? No. Was it suggested? No.

Fast forward a few years, and I used SNOWCAN again as part of a large snow model intercomparison study called SNOWMIP2. This study examined the outputs of 33 snow models of differing complexity in model structure and physical assumptions, all forced with the same driving data. The main sticking point for SNOWCAN was that the study required the proportion of snowfall versus rainfall to be specified explicitly, and this rain / snow split was at times inconsistent with the physics of the model. I think that if I had not invested so much time in recoding at an earlier stage, it would have been extremely difficult to identify the source of the problem.

Another three years later, I was asked for a copy of the model by the first person who wanted to run it with their own data! However, having made several changes to the model to accommodate the SNOWMIP2 requirements, as well as other half-completed studies, I found it extremely difficult to identify which version of the model I had used. Oh how I wish I'd stored the publication version of the model somewhere, clearly identified, with all the necessary ancillary data.

Things are different now, and I like to think I've learnt something. I have sought out projects that allow me to develop the skills I craved, as academia is too time-pressured to learn them otherwise. It's a whole new world of Python, unit tests, git / mercurial and community models. It's a better place, though perhaps a spot of recoding isn't always bad.
