“What do you wish you had known when you started?” This question, asked during a session at the Software Sustainability Institute’s 2020 Collaborations Workshop, caused a (Zoom) room of research software engineers to pause and scratch their heads.
It is perhaps the most valuable question for anyone entering a new and difficult domain. This applies to any demanding field, but especially to research software engineering. Pearls of wisdom from grizzled experts, such as hints about which development practices and tools to use or avoid, can save a freshly minted Research Software Engineer (RSE) many hours and headaches. However, much of this knowledge is not formally documented and is gained the hard way, through errors and experience.
Researchers who code, and RSEs, are particularly vulnerable because they are rarely formally trained in best coding practices. Many are self-taught programmers working in a system that only incentivises gaining the skills needed to solve the task in front of them. They may not be aware of the resources available to improve their software’s maintainability and reproducibility. Even when they are, most have to find their own ways of establishing these practices within their workflow and introducing them to a wider workgroup.
Trouble finding resources
During our discussion at the Collaborations Workshop we realised that many of us had picked up our workflow methods and approaches largely by chance. Some had mentors who acted as guides; others had happened upon projects or resources they liked.
Not everyone is lucky enough to have a mentor, but training material and helpful documentation are available (though their quality may vary by domain). However, these are scattered across disparate websites and repositories, and the sheer amount can be daunting, particularly for a researcher who is just starting out. Informal knowledge, such as ‘this package is accurate but slow’ or ‘the community considers this database outdated’, is usually missing altogether.
Our solution: a curated list of the things we wish we had known when we started, and the resources we wish we had found sooner. Our group came from a variety of research backgrounds, so the list is limited to general recommendations, even though the best advice and resources tend to be more domain-specific. We wrote it in the hope that it encourages similar starter packs for newcomers to every RSE-related domain and discipline!
Things we wish we had known earlier:
Write scripts that carry out all the steps to transform raw data into final figures. This makes figures easy to reproduce when new data is generated, allows others to follow the steps leading to a final result and makes it easy for you to understand your own work when you return to it in a few months' time.
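As a rough illustration of such a script, here is a minimal Python sketch that takes a raw CSV file all the way to a figure in a single run. The file paths, the `group` and `value` column names and all function names are hypothetical. To keep the example dependency-free the figure is a bare-bones SVG built with the standard library; in practice a plotting library such as matplotlib would normally draw it.

```python
"""run_analysis.py: hypothetical end-to-end script, raw CSV -> summary -> figure.

Rerunning this single script regenerates the figure from the raw data, so new
data only requires replacing the input file and running it again.
"""
import csv
import statistics
from pathlib import Path
from xml.sax.saxutils import escape


def load_raw(path):
    """Read raw measurements; assumes a CSV with 'group' and 'value' columns."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def clean(rows):
    """Drop rows with a missing value and convert the rest to floats."""
    return [
        {"group": r["group"], "value": float(r["value"])}
        for r in rows
        if (r.get("value") or "").strip()
    ]


def summarise(rows):
    """Mean value per group, with groups in sorted order."""
    groups = {}
    for r in rows:
        groups.setdefault(r["group"], []).append(r["value"])
    return {g: statistics.mean(vals) for g, vals in sorted(groups.items())}


def render_svg(summary, bar_width=40, scale=10.0):
    """Minimal SVG bar chart of the group means (assumes non-negative means)."""
    height = max(summary.values()) * scale + 20
    bars = []
    for i, (group, mean) in enumerate(summary.items()):
        x = 10 + i * (bar_width + 10)
        h = mean * scale
        bars.append(
            f'<rect x="{x}" y="{height - h}" width="{bar_width}" height="{h}">'
            f"<title>{escape(group)}</title></rect>"
        )
    width = 10 + len(summary) * (bar_width + 10)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
        + "".join(bars)
        + "</svg>"
    )


def main(raw_path="data/raw.csv", fig_path="figures/group_means.svg"):
    """Run every step in order: load, clean, summarise, plot."""
    summary = summarise(clean(load_raw(raw_path)))
    Path(fig_path).parent.mkdir(parents=True, exist_ok=True)
    Path(fig_path).write_text(render_svg(summary))
    return summary


# To run it from the command line, end the script with:
# if __name__ == "__main__":
#     main()
```

Because each step is a small, named function, it is easy to check exactly how the raw numbers became the plotted ones, and new data only means replacing `data/raw.csv` and running the script again.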
When you’re writing code, divide it into separate files, each containing closely related code. You can follow an object-oriented paradigm if you want, but either way you will thank yourself in the future when you don’t have to wade through thousands of lines looking for one function!
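For instance, a small analysis project might be split into `data_io.py` (reading and writing files), `stats.py` (calculations), `plotting.py` (figures) and a short `run_analysis.py` that imports from the others. These file names are hypothetical; one caution is to avoid names such as `io.py` that shadow Python standard-library modules. The sketch below shows what `stats.py` might contain: only statistical helpers, with nothing about file formats or plotting.

```python
"""stats.py: hypothetical module holding only the project's statistical helpers.

Other files in the project would use it with, e.g.:
    from stats import mean, standard_error
"""
import math


def mean(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    values = list(values)
    if not values:
        raise ValueError("mean() needs at least one value")
    return sum(values) / len(values)


def sample_variance(values):
    """Unbiased sample variance (divides by n - 1); needs at least two values."""
    values = list(values)
    if len(values) < 2:
        raise ValueError("sample_variance() needs at least two values")
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def standard_error(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    values = list(values)
    return math.sqrt(sample_variance(values) / len(values))
```

Anyone looking for the standard-error calculation then knows which file to open, and the plotting code never needs to change when the statistics do.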
Before you start producing large amounts of output from your code, think carefully about how to organise it so that future processing is easy. Also consider what ‘metadata’ to store alongside the output to make it more useful: anything from random number generator seeds to the version of the code that generated the output.
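One common way to do this is to write a small ‘sidecar’ metadata file next to every output file. The Python sketch below assumes results that can be stored as JSON, a hypothetical `run_simulation` function standing in for real model code, and a project kept under git. The git commit hash is recorded as the code version, and is stored as `null` if git or the repository is unavailable.

```python
"""Hypothetical sketch: save each result alongside a JSON 'sidecar' of metadata."""
import json
import platform
import random
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def code_version():
    """Current git commit hash, or None if git or the repository is unavailable."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=False,
        )
    except OSError:  # e.g. git is not installed
        return None
    return result.stdout.strip() if result.returncode == 0 else None


def run_simulation(n_samples, seed):
    """Hypothetical stand-in for real model code: seeded Gaussian draws.

    A dedicated random.Random(seed) makes the output reproducible from the seed alone.
    """
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]


def collect_metadata(**params):
    """Everything needed to understand (and rerun) a result later."""
    return {
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "git_commit": code_version(),
        "python_version": platform.python_version(),
        "parameters": params,  # e.g. the seed and model settings
    }


def save_with_metadata(results, out_dir, run_name, **params):
    """Write <run_name>.json (results) and <run_name>.meta.json (metadata)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / f"{run_name}.json").write_text(json.dumps(results))
    metadata = collect_metadata(**params)
    (out_dir / f"{run_name}.meta.json").write_text(json.dumps(metadata, indent=2))
    return metadata
```

A run might then be saved with `save_with_metadata(run_simulation(1000, seed=12345), "output/simulations", "run_seed12345", seed=12345, n_samples=1000)`, which puts the key parameter in the file name and keeps the full record in `run_seed12345.meta.json`. Note that a commit hash only identifies the code exactly if there were no uncommitted changes when the run started.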