By Alex Coleman, Imre Draskovits, Samantha Finnigan, Sherman Lo, and Callum Rollo.
This blog post is part of our Collaborations Workshop 2023 speed blog series.
Research software security is a critical topic: it helps ensure researchers use and develop code that is both secure and reusable. When it comes to writing secure code, the challenges range from managing API credentials to complying with data protection laws and navigating the security policies of university IT departments. In this post, we outline some common software security challenges researchers face and suggest tools to tackle them.
It’s unlikely that there’s one single way in which we as software engineers come to realise the importance of managing credentials appropriately, but many of us will have (or will know about) horror stories that come from failing to do so: from leaked research participant email addresses used to send spam, to members of staff falling for a phishing campaign and exposing the organisation to ransomware.
When we build software, one of the first places we encounter these concerns is in tutorials for the frameworks and languages we use, and in that respect, education on security concerns is considerably better than it was even five years ago. It is now far less likely that a novice coder will create an SQL injection vulnerability in their web application, thanks to frameworks written in a much safer way. However, there are still ‘gotchas’ that we may encounter. One of the more common ones lies in the storage and management of API credentials.
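As a minimal sketch of the vulnerability those frameworks guard against, here is the classic mistake and its fix, using Python’s built-in sqlite3 module (the table and the payload are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'alice@example.org')")

user_input = "alice' OR '1'='1"  # a classic injection payload

# Unsafe: string interpolation lets the payload rewrite the query,
# turning the WHERE clause into a condition that matches every row.
unsafe = conn.execute(
    f"SELECT email FROM users WHERE name = '{user_input}'"
).fetchall()  # [('alice@example.org',)] -- the injection succeeded

# Safe: a parameterised query treats the payload as a literal string,
# which matches no user at all.
safe = conn.execute(
    "SELECT email FROM users WHERE name = ?", (user_input,)
).fetchall()  # [] -- the payload matched nothing
```

Modern web frameworks and ORMs parameterise queries for you, which is exactly why this bug has become rarer.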
The good news is that the best practices and tools for managing these are readily available. Enabling GitGuardian on GitHub repos (free for teams of 25 members or fewer) can help us take corrective action and revoke that Google Maps API key which sat quietly in a repository for six years, or wipe out that .env file which wasn’t listed in .gitignore using the BFG repo cleaner. Storing our access credentials in environment variables is also encouraged in CI/CD workflows. If a password is stored as a plain string in a configuration file, these tools will flag it so we can fix it. Needless to say, once a secret is out there it needs to be revoked: cleaning the git history doesn’t affect copies that were already cloned.
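The environment-variable pattern can be as simple as the sketch below; the variable name `MAPS_API_KEY` is a placeholder, and in practice it would be set in the shell, a CI secret store, or an untracked .env file:

```python
import os


def get_api_key(name="MAPS_API_KEY"):
    """Read a secret from the environment rather than from source code.

    Failing loudly when the variable is unset is deliberate: a
    hardcoded fallback is exactly the secret we are trying to keep
    out of the repository.
    """
    key = os.environ.get(name)
    if key is None:
        raise RuntimeError(f"{name} is not set; no hardcoded fallback")
    return key
```

Because the value never appears in the source tree, there is nothing for a secret scanner to find and nothing to scrub from the git history later.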
There’s also significant complexity in configuring web applications correctly. This is where automated testing can step in to check that we got it right. For example, where an application exposes routes which require a logged-in user, we can write tests which check they can’t be hit when logged out! Here's an idea: automated testing doesn’t have to be limited to development processes. It can also check that routes on a production service are behaving as they are supposed to.
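One hedged sketch of that idea, framework-free so it stays self-contained: a hypothetical `/admin` route in a minimal WSGI app, with a test asserting that anonymous requests are redirected. Test clients in Flask or Django exercise the same property with much less ceremony.

```python
def app(environ, start_response):
    """A toy WSGI app: /admin requires a session cookie."""
    logged_in = "session=" in environ.get("HTTP_COOKIE", "")
    if environ["PATH_INFO"] == "/admin" and not logged_in:
        start_response("302 Found", [("Location", "/login")])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


def get_status(path, cookie=""):
    """Call the app directly and capture the status line -- the same
    pattern framework test clients use under the hood."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    app({"PATH_INFO": path, "HTTP_COOKIE": cookie}, start_response)
    return captured["status"]


# The security property, written as a test: /admin must turn away
# anonymous users and admit authenticated ones.
assert get_status("/admin").startswith("302")
assert get_status("/admin", cookie="session=abc").startswith("200")
```

Pointing the same kind of check at a deployed URL (with a plain HTTP client instead of a direct call) turns it into the production smoke test suggested above.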
Securing our code isn’t a straightforward process, but if we think about it from the design stage, we can begin to map these concerns out in advance. It is important to formalise security in the specification of a research software project, even at the proposal stage. Another crucial thing to think about in advance is designing our application data flows to account for local regulations, including GDPR.
Legal/ethical considerations (GDPR)
Another security consideration for researchers is data management. Researchers may work with sensitive and personal data, such as people’s names and addresses. Regulations such as GDPR lay out the law for recording, storing, using and disposing of sensitive data. For commercial data, a contract may be written to define what can and cannot be done with the data, including restrictions around where the data can be stored.
While many people's exposure to GDPR and data protection may only be through the required training they have to undertake every year, for a software engineer a good working knowledge of the law can be a useful tool when developing applications which use participants' data. Undertaking a Data Protection Impact Assessment or DPIA is often a useful part of the requirements engineering process for developing new software and enables us to plan application data flows and possible points of exposure.
For more restricted data, there are protective measures such as setting up a safe haven, a computer which stores such sensitive data. It cannot connect to the internet and can only be accessed using authorised devices. These are usually maintained and run by local University IT staff who serve as an extra pair of eyes on what comes in and out of the safe haven. This can slow down the development of software but may be necessary to meet the required standards for using the data and to avoid breaching the law or the contracts associated with it.
Unfortunately, reading contracts and laws is a long and exhausting process, especially if one has no background in law. Hiring a lawyer or adviser may be costly, but universities often have data protection or "information governance" departments that are willing and able to help. Finding a named contact in such a department can be invaluable in facilitating research development and software requirements analysis.
University IT departments will be closely involved in implementing and maintaining facilities such as safe havens and the devices researchers use. However, high-profile data breaches such as the UEA Climate Research Group controversy in 2009 and the Newcastle University ransomware attack in 2020 have encouraged universities to adopt strong security policies, which can be incompatible with rapid experimentation by researchers.
Local IT Department constraints
The constraints imposed by University IT departments in the name of security can be a challenge to researchers. University IT departments have to manage and maintain different sets of applications, systems and users and have to make security considerations across the entire estate, which can create friction between researchers and IT.
Much of the time, this end-user experience is negative. We find restrictions on our equipment, lack of administrative permissions, and limited sets of permitted applications. In the long run, these restrictions are present to protect us and other members of the University. Restricting administrative privileges can seem unreasonable for software developers, but managing access to administrative credentials is crucial to prevent serious security vulnerabilities.
The need to find, download and execute third-party code, be that the latest version of Firefox or a dusty old FORTRAN library, often clashes with the IT department’s need to lock down the network and minimise the institution’s attack surface. Researchers can find it difficult to understand what appear to be overly restrictive and needlessly tight policies. Why should you need to wait a month to test out a program that may not even solve your current blocker?
A significant danger with these long waits is that researchers with a little technical know-how will find ways around these blockers: abusing allow-listed MAC addresses, port forwarding through SSH to traverse a corporate firewall, rooting PCs to install Linux, or simply working on a personal device. In many cases, these workarounds introduce more security risks than if the IT department could respond quickly and support researchers’ requests.
The issue of local University IT security policies slowing or inhibiting computational research is of crucial concern when it comes to ensuring researchers adopt good software security standards. We recognise that solving this problem is complicated and requires a better dialogue between University IT administrators and researchers so that local IT policies can better match the IT needs of researchers. Under-resourcing in IT departments is also a significant problem, and strong investment in these teams at an institutional level is crucial.
Researchers also need to develop their understanding of the issues around things like administrative privileges. Software can often be installed without administrative privileges, for example into a user directory or a project-level environment, and without IT intervention. This can also make your workflow more reproducible, because it no longer depends on administrative rights. Starting to have these conversations between University IT administrators and researchers is crucial in improving this key security challenge.
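As one illustration, assuming a Python workflow: the standard library’s venv module creates an isolated environment in any directory the user can write to, with no administrator involvement at any point (`pip install --user` is another admin-free option).

```python
import pathlib
import subprocess
import sys
import tempfile

# Create a virtual environment in a user-writable location -- no
# administrative rights are involved at any point. (--without-pip just
# keeps this sketch fast; in practice you would omit it so the
# environment bootstraps its own pip for installing packages.)
prefix = pathlib.Path(tempfile.mkdtemp()) / "env"
subprocess.run(
    [sys.executable, "-m", "venv", "--without-pip", str(prefix)],
    check=True,
)

# The environment carries its own interpreter, so project dependencies
# live inside the project tree rather than in a system location.
bindir = prefix / ("Scripts" if sys.platform == "win32" else "bin")
```

Because everything lives in one directory the researcher owns, the environment can be recreated from a requirements file on any machine, which is where the reproducibility benefit comes from.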
Adding security to software increases the friction users experience when using or even developing it. This is inescapable at present, but it is a price worth paying to keep our software secure and our data and infrastructure safe. Accepting this trade-off and working with it rather than against it is crucial to ensuring research software is both secure and reproducible, and it shows that we are responsible developers. Many tools help with implementing security principles in software; wherever possible, we should try not to reinvent the wheel but take advantage of existing best-practice advice. Finally, software security is critical for the future of research software, and recognising and responding to the challenges outlined above is the start of embedding a security-conscious approach to research software development.