An institutional perspective on publishing open code

Posted by s.aragon on 5 June 2019 - 9:00am
Image: open lock with binary code, by Kiroe.

By Thomas Etherington, Spatial Modeller - Ecology, Manaaki Whenua - Landcare Research, and Institute Fellow.

By openly publishing their code, scientists make their science more reproducible – which is a very good thing! Recently, a group of staff where I work at Manaaki Whenua - Landcare Research wanted to establish an institutional GitHub account so that we would have a place to publish the code we were generating. However, while there are many institutional benefits to encouraging the publication of scientific code, there are also institutional considerations around things such as intellectual property and risk. Therefore, taking the perspective of a research institution, we tried to understand the benefits of and obstacles to open code publishing by asking ourselves: who will be involved, how should code be licensed, where should code be published, how can authors get credit, what standards should apply, and what are the costs? We have just published our first thoughts in PeerJ Preprints, and I hope they will be useful to other members of the Software Sustainability Institute, who are also welcome to comment on them.

Our first impression was that while there was a lot of information out there about code publishing, we could not find a summary that presented the current thoughts and practices with specific reference to the needs of a science institution rather than an individual scientist. This may not necessarily be a problem for academics at universities who have a certain degree of autonomy and freedom in how they practise their science, but for scientists within other types of institutions there may be a desire or requirement for a more standardised or controlled publishing framework.

To begin with, in addition to the scientist-programmer who wrote the code, other people will most likely need to be involved in publishing it. Managerial and support service staff will likely be needed to ensure that issues around intellectual property are appropriately managed. Licensing is another immediate consideration: while we would advocate open and permissive licensing, we recognise that there may be occasions when that is not possible because of requirements attached to project funding.

Another decision that was not entirely clear to us was where to publish code. Fully developed software would ideally be maintained and developed in a version control hosting service such as GitHub. However, we also felt that some scientific code represents a static record of an analytical process and as such would not change, and this might be better published alongside its associated data in an archiving service such as Zenodo. We could not clearly specify when either approach would be more suitable, though we did identify characteristics of code that might indicate which option is more appropriate.

There has been a lot of discussion published about how to gain credit for software by making code citable, so this aspect was quite straightforward. A far more challenging consideration was what code standards to expect. Again, this seemed to us to be highly dependent on the purpose of the individual code, so we struggled to define minimum standards that would be universally relevant and encourage good code to be published, while not setting the standard so high as to put people off engaging in code publication. We also recognised that there are costs to publishing good code, both in terms of time requirements and financial costs – especially for those scientists outside of academia who cannot necessarily access free educational accounts for version control hosting services.

We then attempted to pull together all these considerations into a framework that maps out a workflow and summarises our own experiences and those of others that we could find. However, I am conscious that there could well be a wealth of information and knowledge about institutional code publishing that has not been openly published – or that we couldn’t find! So I would be very keen to invite other interested folks to read and comment on our preprint as we try to refine our institutional framework to encourage staff to openly publish their scientific code. For those who are interested, the preprint can be found at PeerJ, where anyone can post questions, feedback, or links.