Research Software Security Snippets #1

Posted by d.barclay on 7 February 2023 - 9:00am


Original photo by Florian Olivo edited by Denis Barclay

By SSI Fellow Alex Coleman.

Hello to all the readers of the SSI blog! I’m Alex Coleman, a new SSI Fellow for 2023. I plan to use my fellowship to explore software security topics with a focus on research, and to develop these into training materials that help researchers write secure, sustainable software. I do not have a background in software security. Before becoming an RSE I was a researcher and learned to code from what I like to think of as the researcher's perspective (lots of Stack Overflow!). I plan to learn a lot over the course of the fellowship and share as much of this as possible. So welcome to Research Software Security Snippets #1!

Between munching mince pies and getting through mountains of leftovers during the Christmas break, I read about the rather nasty vulnerability that appeared in PyTorch during late December. It affected the nightly builds of PyTorch for Linux installed via pip between December 25th and 30th (stable versions of PyTorch were not affected). In this particular instance, installing the nightly build with pip pulled in a dependency, torchtriton, that had been compromised. This sort of vulnerability is called a dependency confusion attack, a form of supply chain attack in which malicious actors target software dependencies to exploit the end user.

Let's dig a bit more into how this vulnerability occurred. Anyone familiar with Python will have met the Python package manager pip. pip is a tool that allows us to install Python packages from the Python Package Index (PyPI) or other indexes that store Python packages. PyPI is the main Python package repository (with 13.5 TB of release files!) and is where packages are installed from when we run pip install package. PyPI also allows anyone to upload their Python package, making it available for others to install and use. This is a great way to share our packages in a consistent manner, but it also opens the door to malicious actors who exploit the openness of PyPI. This vulnerability in PyTorch is a good example of this.
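Because anyone can claim an unused name on PyPI, it can be worth checking whether a name we rely on from another index is already registered there. The block below is a minimal sketch rather than a vetted tool: it assumes PyPI's JSON endpoint at https://pypi.org/pypi/<name>/json answers 200 for a registered project and 404 otherwise, and the helper names fetch_status and is_registered_on_pypi are hypothetical, introduced only for illustration.

```python
import urllib.error
import urllib.request

# Assumed endpoint: PyPI's per-project JSON API.
PYPI_JSON_URL = "https://pypi.org/pypi/{name}/json"


def fetch_status(url):
    """Return the HTTP status code of a GET request to `url`."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses; the code is still what we want.
        return err.code


def is_registered_on_pypi(name, fetch=fetch_status):
    """Return True if PyPI reports a project called `name`.

    `fetch` is injectable so the check can be exercised without network access.
    """
    return fetch(PYPI_JSON_URL.format(name=name)) == 200
```

Calling is_registered_on_pypi("some-internal-name") before publishing a package to a private index gives a rough signal of whether that name is still unclaimed on PyPI. It says nothing about who owns a name that is already taken.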

To install nightly PyTorch via pip you’d use the command available from their website:

pip3 install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cu116

This command tells pip to include pre-releases (--pre) and to search an extra package index (--extra-index-url) in addition to PyPI. Crucially, pip does not give either index priority: it gathers candidates for each package from both PyPI and the PyTorch index and installs whichever it judges the best match, which is typically the highest version. This behaviour was exploited in the PyTorch vulnerability. Their nightly install required the torchtriton package, which existed in the PyTorch index but not on PyPI. This allowed someone to upload a malicious package called torchtriton to PyPI, which pip then installed in preference to the PyTorch index version (hence the name dependency confusion attack). This malicious package contained a binary that captured various pieces of system information, including SSH keys, and uploaded them to a remote server.
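One way to picture the confusion is a toy model of the choice pip has to make when the same name is available from two indexes. The sketch below is not pip's code. It assumes pip pools the candidates from every configured index and prefers the highest version, which is a simplification of the real resolver. The Candidate class, the pick_candidate function, the index labels and the version numbers are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Candidate:
    name: str
    version: Tuple[int, ...]  # simplified stand-in for a real version string
    index: str                # label of the index that serves this file


def pick_candidate(name: str, indexes: Dict[str, List[Candidate]]) -> Candidate:
    """Pool candidates for `name` from every index and return the highest version."""
    pool = [c for candidates in indexes.values() for c in candidates if c.name == name]
    if not pool:
        raise LookupError(f"no candidate found for {name!r}")
    # No index is preferred: only the version decides.
    return max(pool, key=lambda c: c.version)


# Hypothetical version numbers, chosen only to illustrate the mechanism.
PYTORCH_NIGHTLY = [Candidate("torchtriton", (2, 0, 0), "pytorch-nightly")]
PYPI = [Candidate("torchtriton", (3, 0, 0), "pypi")]

chosen = pick_candidate(
    "torchtriton",
    {"pytorch-nightly": PYTORCH_NIGHTLY, "pypi": PYPI},
)
print(f"installing {chosen.name} from {chosen.index}")  # prints "installing torchtriton from pypi"
```

With only the PyTorch index in the dictionary, the same call returns the legitimate package. In this model the attacker needs nothing more than a free name on PyPI and a higher version number.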

Thankfully, this issue was identified, and the PyTorch team have taken steps to mitigate the effects, including notifying the PyPI security team to remove the malicious package. Although it has reignited discussions about the behaviour of --extra-index-url, changing this is not a straightforward fix. However, this shouldn’t be a cause for despair because there are ways to reduce the likelihood of these kinds of attacks. It should act as a reminder that software security is important, especially when developing open, sustainable software. It is crucial to ensure the ways we distribute our software are secure. For those of us developing research software, this all sounds very sensible. However, I imagine you may be thinking: "But that’s one more thing I have to learn alongside writing tests, documentation, and implementing that fancy design pattern and my novel algorithm!"

This is an absolutely valid concern! That's why developing accessible, concise and meaningful materials that can support researchers in writing secure code is a core consideration of my project.

P.S. If you’re interested in software security or have something you think I should include in my planned materials, get in touch via email at a.coleman1@leeds.ac.uk.