By Swithun Crowe, University of St Andrews, Mike Jackson, Software Sustainability Institute, Karen Porter, University of Oxford, Neil Chue Hong, Software Sustainability Institute, Mayeul d'Avezac, UCL, and Oliver Laslett, University of Southampton.
A speed blog from the Collaborations Workshop 2016 (CW16).
The communities affected by a piece of software are many and varied. There will be some who care about each bowel movement in the code, and others who really just want to know what needs to be put in and what comes out. This blog post describes four different communities, from the cognoscenti to the would-be users, and the tools that make it easy for each community to emerge and work together.
It’s all my own work. Within this circle, the code is an esoteric piece of work meant for an in-group of dedicated wizards. Their challenge is how to share the code among themselves and with their future selves. So first they adopt version control to make sure that the code and associated data are accessible at the drop of a pointy hat. There are many options available for hosting source code, e.g. university repositories or data centres, GitHub, GitLab, Bitbucket, Launchpad or Assembla. Secondly, they decide on a set of coding standards so they can understand each other’s code, and their own code six months from now, without the need for ESP or time travel! These may be based on those for the languages they use (e.g. Oracle’s “Code Conventions for the Java Programming Language” or “PEP 8 – Style Guide for Python Code”), or inspired by those of other projects and organisations (e.g. Google’s Java style, GNU’s Coding Standards, or the UK Met Office’s Fortran 90 Standards). They may use style checkers and code formatters to help write code that conforms to these standards (e.g. Pylint for Python, ClangFormat for C/C++ or Checkstyle for Java). Finally, they use automation to delegate all the mundane software development activities to the computer, by setting up an automated build system (e.g. Make, Ant, Maven, CMake, Python setuptools, or R package tools), a test framework (e.g. JUnit for Java, CUnit for C, CppUnit and googletest for C++, FRUIT for Fortran, py.test and nosetests for Python, testthat for R and PHPUnit for PHP) and a documentation generation system (e.g. Doxygen for C, C++, Fortran or Python; docstrings and Sphinx for Python; Javadoc for Java). Automation frees their time and minds to do the magic, the research!
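To make the wizards' toolkit a little more concrete, here is a minimal sketch in Python of coding standards, documentation and automated testing working together. The function `mean_temperature` and its test are hypothetical, invented purely for illustration: the code follows PEP 8 naming and layout, carries docstrings that a tool such as Sphinx could pick up, and includes a test function that py.test would discover by its `test_` prefix.

```python
"""Hypothetical example module: one documented function and one test."""


def mean_temperature(readings):
    """Return the arithmetic mean of a non-empty sequence of readings.

    Raises ValueError if the sequence is empty, so that a silent
    division by zero never reaches the research results.
    """
    if not readings:
        raise ValueError("readings must not be empty")
    return sum(readings) / len(readings)


def test_mean_temperature():
    """A test that py.test collects because its name starts with 'test_'."""
    assert mean_temperature([10.0, 20.0, 30.0]) == 20.0
    assert mean_temperature([5]) == 5


if __name__ == "__main__":
    # Running the file directly also runs the test, without any framework.
    test_mean_temperature()
    print("all tests passed")
```

Once a test like this exists, the computer can run it on every change, which is exactly the kind of mundane activity worth delegating.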
Now I need to share with a collaborator. Once you have a crack squad of developers working on your project, you will want to begin advertising its capabilities and sharing your code with other researchers in your domain. This community might be summarised as the reluctant programmers: researchers who have some programming experience, but may be self-taught and have limited knowledge of software development.
Letting the tech community loose on it. Packaging your code with its dependencies and providing detailed documentation are essential. The fewer things a user has to do before they can run your software, the more likely they are to run it! Depending on the language or environment, tools such as Ivy and Maven for Java, pip and setuptools for Python, Composer for PHP, RubyGems for Ruby, Packrat for R, or CPAN-related tools for the Perl CPAN archive can help you manage your dependencies. Alternatively, provide your code and dependencies using a virtual machine or container solution (Docker, Vagrant, Oracle VirtualBox, VMware), which may be a suitable way to package software for these users. Complementing these with forums or issue tracking for when things go wrong (and they will!) will reassure your users that their cries for help will be heard!
Sharing a link and letting it run. The gold standard for your research software is for it to be intuitive for non-technical users, researchers who don’t know anything about programming. The software should be easy to get started with. Wrapping functionality in a familiar environment, such as a point-and-click GUI or a web interface to a local or cloud service, is invaluable.
All users should be given some documentation, not to describe the design of the program (though that may be of interest), but to showcase features and functionality. Saying how your code is designed, what it does and how it works is not the same as explaining how to use it. Step-by-step examples of how to install (if relevant) and use your software, along with example inputs and expected outputs, are as useful for reproducibility and testing as they are for educating new users in how they can use your software in their research!
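One way to make example inputs and expected outputs do double duty, as both user documentation and tests, is Python's standard `doctest` module. The sketch below is illustrative only: `celsius_to_kelvin` is a hypothetical function, not part of any software discussed here. The examples in its docstring are what a new user would read, and `doctest` re-runs them to check that the documented outputs still match.

```python
import doctest


def celsius_to_kelvin(celsius):
    """Convert a temperature from degrees Celsius to kelvin.

    Example inputs and expected outputs, which doctest checks:

    >>> celsius_to_kelvin(0)
    273.15
    >>> celsius_to_kelvin(-273.15)
    0.0
    """
    return celsius + 273.15


if __name__ == "__main__":
    # Re-run every example found in this module's docstrings and
    # report how many were attempted and how many failed.
    print(doctest.testmod())
```

If the behaviour of the function ever drifts away from what the documentation promises, the examples fail, so the user guide cannot quietly go out of date.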
Your documentation should leave no room for ambiguity, for within ambiguity, time-wasting monsters lie. You know whether Python means Python 2 or Python 3, or whether Linux means Ubuntu or Scientific Linux 7, but leaving out these (trivial?) details can make the difference between a happy new user, exploiting your research, and a disgruntled ex-user, badmouthing it!
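Beyond writing these details down, software can also state them itself. The following is a rough sketch, under the assumption that your code needs at least Python 3.0; the names `MINIMUM_PYTHON`, `describe_environment` and `check_python_version` are hypothetical and chosen only for this example. It reports the interpreter and platform in use, and fails early with a clear message if the Python version is too old.

```python
import platform
import sys

# Assumed minimum version, for illustration only; state your real one.
MINIMUM_PYTHON = (3, 0)


def describe_environment():
    """Return a one-line summary users can paste into a bug report."""
    return "Python {} on {}".format(
        platform.python_version(), platform.platform()
    )


def check_python_version(minimum=MINIMUM_PYTHON, current=None):
    """Raise RuntimeError if the running Python is older than `minimum`.

    `current` defaults to the running interpreter's (major, minor)
    version; it is a parameter so the check itself can be tested.
    """
    if current is None:
        current = sys.version_info[:2]
    if tuple(current) < tuple(minimum):
        raise RuntimeError(
            "This software needs Python {}.{} or later, but found {}.{}".format(
                minimum[0], minimum[1], current[0], current[1]
            )
        )


if __name__ == "__main__":
    check_python_version()
    print(describe_environment())
```

An explicit error like this turns a confusing failure deep inside the code into a message that tells the user exactly what to change.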