By Thomas Robitaille, Software Sustainability Institute Fellow
I recently attended the European Week of Astronomy and Space Science (EWASS), the largest yearly conference for European astronomy. This year, it featured a session on Developments and Practices in Astronomy Research Software that touched on many themes important to software sustainability.
With the rise of open source projects in astronomy, this session was a great way to expose astronomers at a major conference to best practices in software development and update them on available free software and projects, including Astropy, TARDIS, Stingray and more. Talks covered topics such as reproducible science, software best practices, when to make code public, transparency, credit, and citation of software, as well as a number of examples of best practices and lessons learned in specific projects. The session was extremely successful, with many talks ending up standing-room only. On the second day, a hack day allowed attendees to work on related projects and was also very successful. Alice Allen (co-organiser of the session and maintainer of the Astrophysics Source Code Library) has written a series of blog posts on the session (see ASCL at the 2017 EWASS meeting, Developments and Practices in Astronomy Research Software at EWASS 2017, ASCL projects at the EWASS Hack Together Day, In conclusion 1, and In conclusion 2), so rather than repeat that coverage, I will use this post to highlight some of the development practices we follow in the Astropy project, which I presented in my talk at this session.
One of the most common recommendations that comes up when talking about best practices for software development is version control, which helps track changes to source code files over time. Scientists who don’t publish code often dismiss this as a clichéd best practice that is unnecessary for their own workflow, but in fact everyone can benefit from using version control, and it is well worth the small investment of time to learn. I often think of not using version control as essentially the same as climbing without a harness: possible, but extremely risky! For Astropy, we use Git and the GitHub platform (though there are other options out there), and you can find all our repositories in the GitHub Astropy organization. For collaborative projects such as this one, Git is useful not only for tracking changes but also for more advanced analysis of the repository, such as finding out who last modified a line (using ‘git blame’), finding the difference between two arbitrary commits or points in time (using ‘git diff’), or figuring out which change caused something to stop working properly (using ‘git bisect’).
Writing good code
Another important practice in projects such as Astropy is aiming to write ‘good’ code. But what does ‘good’ mean exactly? Should it be commented? Should it use functions and classes wherever possible? Should it follow style guidelines (e.g. PEP8 for Python)? Fundamentally, I don’t think these are what we should strive for in themselves. Instead, ‘good’ code is simply code that others can read, understand, modify, and reuse, and things like style guidelines and functions/classes are means to that end. This is especially important because, in this kind of project, it is not uncommon for core contributors to pass on their responsibilities to other developers. Thus, in projects such as Astropy, we rely extensively on code reviews to make sure that, in addition to the developer who proposed a change, at least one or two other developers understand it. GitHub provides a great and intuitive interface for this, namely pull requests, which show the changes and allow developers to discuss them, either in a general sense or on specific lines. Here is an example of a pull request in the Astropy project.
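As a small illustration of what ‘readable’ can mean in practice, here is a hypothetical function written twice. Neither version comes from Astropy; the names `d` and `pixel_distance` are invented for this sketch. Both compute the same thing, but the second is much easier for a reviewer to check in a pull request.

```python
import math


# Terse version: correct, but the reader has to reverse-engineer the intent.
def d(a, b):
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** .5


# Readable version: same behaviour, with a descriptive name, a docstring,
# and named intermediate values.
def pixel_distance(position_a, position_b):
    """Return the Euclidean distance between two (x, y) pixel positions."""
    dx = position_a[0] - position_b[0]
    dy = position_a[1] - position_b[1]
    return math.hypot(dx, dy)
```

The point is not the specific style choices but that someone other than the original author can tell at a glance what the function is for and whether it is right.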
Writing tests is a common practice in open source projects. Testing can take various forms, but fundamentally the idea is to run small chunks of code that check that a part of the package works as expected. Scientists tend to think that tests are not applicable to their software since they don’t always know what results to expect, but this is a misconception: it is virtually always possible to test sub-parts of a software package, and if the result of some part is not known in advance, simply checking that it runs, regardless of the result, and subsequently checking that the result stays the same, is already a big improvement over not having any tests. In Astropy we make use of the pytest framework, which makes it straightforward to write tests: all that is needed is to write small Python functions, and pytest will make sure that they run without errors.
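To make this concrete, here is a minimal sketch of what such tests might look like. The function `flux_to_magnitude` is a made-up example, not part of Astropy, and the comparisons use `math.isclose` from the standard library purely to keep the sketch self-contained. The first two tests check known results; the third only checks that the code runs and returns a finite number, which is the fallback described above for cases where the expected value is not known.

```python
import math


def flux_to_magnitude(flux, zero_point_flux=1.0):
    """Convert a flux to a magnitude relative to a zero-point flux (hypothetical helper)."""
    if flux <= 0:
        raise ValueError("flux must be positive")
    return -2.5 * math.log10(flux / zero_point_flux)


# pytest collects functions whose names start with "test_" and reports
# any that raise an exception (including a failed assert) as failures.
def test_zero_point_gives_magnitude_zero():
    magnitude = flux_to_magnitude(3631.0, zero_point_flux=3631.0)
    assert math.isclose(magnitude, 0.0, abs_tol=1e-12)


def test_factor_of_100_is_five_magnitudes():
    bright = flux_to_magnitude(100.0)
    faint = flux_to_magnitude(1.0)
    assert math.isclose(faint - bright, 5.0)


def test_runs_for_typical_inputs():
    # No expected values here: we only check that the code runs and
    # produces a finite number for a range of plausible inputs.
    for flux in (1e-3, 1.0, 42.0, 1e6):
        assert math.isfinite(flux_to_magnitude(flux))
```

Saving this in a file whose name starts with `test_` and running `pytest` in the same directory should collect and run all three tests.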
Just as important as functionality being added to a package is the user-facing documentation about this functionality. In the Astropy project, we typically require documentation to be added at the same time as the functionality (no procrastinating!). Documentation should not simply be a list of functions/classes, but should provide background, installation instructions, a good narrative, examples, and finally a list of functions/classes. We make use of Sphinx to build our documentation, and this can be used for Python and non-Python projects. The idea is to write the documentation in reStructuredText, which (like other so-called ‘markup languages’) means that you simply write plain text with some special syntax for e.g. headings and links (see an example here). Sphinx is the package that then renders this text, e.g. to HTML but also to LaTeX and other formats, and can produce for example a full static website. We then use a service called Read the Docs, which takes a GitHub repository, runs Sphinx on the documentation in that repository, and hosts the result. Best of all, the online documentation gets rebuilt every time a change is made in the GitHub repository.
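Sphinx itself is configured through a small Python file, conventionally `docs/conf.py`. The sketch below shows roughly what a minimal configuration might look like for a hypothetical package called `mypackage`; the project name, author and version are placeholders, and this is not Astropy’s actual configuration, which is considerably more involved.

```python
# docs/conf.py: minimal Sphinx configuration for a hypothetical project.

# Basic project metadata shown in the rendered documentation
# (all three values are placeholders).
project = "mypackage"
author = "The mypackage developers"
release = "0.1"

# Extensions that ship with Sphinx:
# - autodoc pulls API documentation out of Python docstrings
# - intersphinx lets you link to other projects' documentation
extensions = [
    "sphinx.ext.autodoc",
    "sphinx.ext.intersphinx",
]

# Where intersphinx should look up references to the Python standard library.
intersphinx_mapping = {
    "python": ("https://docs.python.org/3", None),
}

# Alabaster is the theme Sphinx uses by default; set explicitly for clarity.
html_theme = "alabaster"
```

With a file like this next to the reStructuredText sources, running `sphinx-build -b html docs docs/_build/html` should produce a static HTML site.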
Continuous integration is the concept of running some actions, such as testing, every time there is a change or proposed change to the repository. This is often mentioned in conjunction with testing, but in fact continuous integration can have a broader scope than just running tests: in the case of Astropy, it also includes building the documentation to make sure no warnings are emitted, and running style checks on the code. There are a number of great services (free for open source projects) that can be used for continuous integration. In the case of Astropy, we use Travis CI to run builds on macOS and Linux and on different versions of Python. We also use AppVeyor to run tests on Windows, and finally CircleCI to run 32-bit Linux builds. All three services can run when contributors open pull requests, and GitHub will nicely display the status of each of these builds, which helps ensure that contributions do not break anything.
These are just some of the best practices that we follow in Astropy, but feel free to leave a comment if there are other ‘best practices’ that your project(s) follow that you think would be interesting to others!