Why should you care about reproducible code — and how to get started?

Author(s)

Diana I. Bocancea

Daniela Gawehns

Julian Lopez Gordillo

Sam Langton

Katinka Rus

Sally Hogenboom

Iris Spruit

Eduard Klapwijk

Posted on 5 November 2024

This blog was originally published on the Netherlands eScience Center Medium page.

On 23 April 2024, the first ‘National Research Software Day’ took place in Hilversum, the Netherlands. During the unconference part of the program, Diana Bocancea ran a session on the importance of reproducible code.

Despite increased awareness of reproducibility in recent years, most research results are not computationally reproducible: they cannot be independently regenerated from the original data and code. The main reason is that, in most cases, data and code are not shared publicly. But even when researchers openly share their data and code with the public, reviewers or colleagues, their findings can rarely be reproduced in their entirety. Perhaps the code cannot be executed, only part of the results can be generated, or the results produced are totally different from those in the published study. Reproducibility can even be a challenge internally. As any programmer knows, just because your code runs perfectly today does not mean it will run perfectly in five years’ time (or even five days’ time!).

But why does writing reproducible code even matter, and how might you as a researcher get started on this journey toward reproducible research?

Benefits of working reproducibly

One reason is that it will make your life as a researcher easier! Many of the components that make a piece of research reproducible (well-documented, clearly written code, containerised environments, properly organised data) also save a lot of time. They ensure that when you return to your code six months later, the scripts still run and you don’t have to spend three days debugging them. Working this way also means that your code can be shared and reused by your colleagues, saving them time and giving you credit (e.g., authorship) in the process. There are other reasons too, including reputational benefits and advantages during peer review. You can read more about ‘selfish’ reasons to make your research reproducible here.

What about the scientific community? At present, a large proportion of research is not reproducible. This threatens the integrity of scientific research, weakens our evidence base, and ultimately might lessen public trust in science. Peer-reviewed journals, the main venue for scrutinising and sharing scientific results, are slowly adapting to this realisation. Increasingly, researchers are encouraged, if not expected, to provide the data, code and other materials they used alongside the publication itself. In time, we could see reproducibility move from an optional bonus to a mandatory part of the research (and publication) process. Adapting to this change early will bring you all the benefits noted above (e.g., time savings, code reuse) and will also prepare you for the future.

Reproducible tools as a contribution to science

On that note, the changing perspective on the importance of reproducibility is bringing new career paths with it. Beyond the fundamental tools that enable reproducible research (such as Git for version control), higher-level tools are appearing that address challenges particular to specific scientific domains. Usually, they target problems that are well known to researchers in one field but little known outside that niche, such as workflow management, experiment design, or the standardisation of certain procedures within the community. In many cases, the developers behind these software tools and resources are… the researchers themselves. They may have struggled with these issues in their own research and decided to develop the tools they wished they had. Examples include extensive Python-based processing pipelines such as fMRIPrep in the neuroimaging field, and thousands of R packages, ranging from complex statistical modelling packages such as brms for Bayesian regression to packages that help you format manuscripts, such as papaja. In doing so, these researchers shifted their focus from their original subject domain to the mission of making research within that domain reproducible. This typically means developing the software libraries that make reproducibility possible and integrating them with the standard software used within the domain.

The whole scientific community can benefit from such tools! New research can be built on top of them, without the need to solve common reproducibility issues from scratch. These software developments can be just as valuable a contribution to a research domain as other research findings, and they should be recognised accordingly. Just as it is possible to publish your research findings, it should be possible to publish your code contributions when they are significant enough. A good example of this idea put into practice is the Journal of Open Source Software (JOSS), where the submitted code takes centre stage in the review process (rather than being relegated to “supplementary material”). Initiatives like JOSS present work on reproducible research as both a meaningful contribution to science and a viable career path, which are powerful incentives for researchers to get interested in the topic.

In time, with all these smaller and bigger changes, scientific research can become more trustworthy, more reliable, and in turn, more impactful.

How to get started

The inevitable question that follows is: how do you get started with reproducibility? One answer is training. Luckily, there are many training initiatives, both national and international, that can help you get started. For example, many institutions organise Software Carpentry and Data Carpentry workshops, which teach foundational coding and data science skills.

One way senior academics can make a difference, as group leaders, supervisors, and grant reviewers, is to give (junior) colleagues the time and incentives to value and practise reproducibility. For instance, supervisors could have all PhD students replicate and extend an existing analysis as part of their initial research. Reproducing existing work will familiarise students with the common challenges that come with doing good science. It might entail finding and understanding a particular dataset (which can be difficult even to access), as well as the software (e.g. scripts or packages) that was used to produce the results. Running the previous analysis, often on a different computer and at a later time (when software dependencies have likely changed), tests the computational reproducibility of that work and, in doing so, is a valuable learning experience for the student.
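Part of what makes this exercise instructive is noticing how the computational environment has drifted since the original analysis. As a minimal sketch, assuming a Python-based analysis (the package names below are hypothetical placeholders), the original authors could save a snapshot of their interpreter and package versions alongside the results, and a student could later compare it with their own machine using only the standard library's `importlib.metadata` and `platform` modules:

```python
import json
import platform
from importlib import metadata


def snapshot_environment(packages):
    """Record the Python version, OS and versions of the given packages."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
    }


def compare_environments(recorded, current):
    """List (name, recorded_version, current_version) for every difference."""
    diffs = []
    if recorded["python"] != current["python"]:
        diffs.append(("python", recorded["python"], current["python"]))
    names = sorted(set(recorded["packages"]) | set(current["packages"]))
    for name in names:
        old = recorded["packages"].get(name)
        new = current["packages"].get(name)
        if old != new:
            diffs.append((name, old, new))
    return diffs


if __name__ == "__main__":
    # Hypothetical dependency list: replace with what your analysis imports.
    deps = ["numpy", "pandas"]
    current = snapshot_environment(deps)
    print(json.dumps(current, indent=2))
    # In practice, `recorded` would be loaded from a JSON file saved with the
    # original results and passed to compare_environments(recorded, current).
```

A snapshot like this only tells you *what* changed, not how to restore it; dedicated tools such as `pip freeze`, conda environment files or containers capture the environment more completely.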

Group leaders also benefit from reproducible workflows because they prevent (PhD) students from rewriting the same piece of software again and again. While learning the ropes is important for any junior scholar, it is not very efficient if every new generation of students rewrites code for basic operations or frequently used analysis methods.

Beyond the benefits for an academic career, researchers also increase their employability outside academia by learning digital skills (such as version control or writing reusable code) that are valued in many different (industry) jobs.

In modern science, computational methods are the norm in almost every discipline. Yet attempts at reproducibility are almost always unsuccessful due to missing materials and/or a lack of skills. Part of this problem can be mitigated by learning how to produce reproducible code: how to write documentation, use version control, and manage packages. Doing so will benefit you as a researcher, but also your colleagues and the wider scientific community, because your (coding) efforts will become reusable. Increasing the use of reproducible workflows is in the interest of many stakeholders in academia, and increasing the reproducibility of research is key to a broader change in how we do science.

Authors

Katinka Rus

Sally Hogenboom

Open Universiteit

Iris Spruit

Universiteit Leiden

Eduard Klapwijk

Erasmus School of Social and Behavioural Sciences

Image by The Turing Way.

The Good Research Code Handbook

This handbook is for grad students, postdocs and PIs who do a lot of programming as part of their research. It will teach you, in a practical manner, how to organise your code so that it is easy to understand and works reliably.

Go to handbook

Docker Introduction

This tutorial introduces Docker containers as a way to create reproducible computational environments. Such environments help ensure that research outputs can be reproduced.

Go to tutorial

Introduction to BinderHub

BinderHub is a cloud-based technology that can launch a repository of code (from GitHub, GitLab, and others) in a browser window such that the code can be executed and interacted with. A unique URL is generated allowing the interactive code to be easily shared. The purpose of these Binder instances is to promote reproducibility in research projects. This Handbook provides information on how to use and create a BinderHub instance.

Go to handbook

R, Open Research, and Reproducibility

Author(s)

Andrew Stewart

SSI fellow

This series of workshops covers open research and reproducibility in R. It consists of 12 workshops aimed at beginners.

Go to series

Framework for Open and Reproducible Research Training (FORRT)

FORRT provides training on open and reproducible research. Their learning database tool has over 900 resource contributions.

Go to their website

Go to the curated resource database

Ten reproducible research things

This self-paced tutorial outlines ten steps to make your research reproducible, from data quality, documentation and the management of sensitive data through to the publication of datasets. It aims to help you move towards reproducible research regardless of your current skills, and includes useful resources for beginner, intermediate and advanced learners.

Go to tutorial

Openness and Reproducibility Research Practices Training

The Center for Open Science offers four modules on open and reproducible research practices:

Hands-on data management kickstarter
Reproducible methods: how to find them & write them
How to organize painless research collaborations
Research sharing kickstarter

Go to modules

Code Reproducibility Training

This training programme is being developed as part of the ELIXIR-CONVERGE project. It is currently led by Alexia Cardona (ELIXIR-UK) and Nazeefa Fatima (ELIXIR-Norway).

The team aims to create an extensive reproducibility training programme that equips learners with the core skills required to develop sustainable and reproducible code. The training materials will be developed for people with non-computational backgrounds.

Find out more and sign up for training courses

Docker Containers for Reproducible Research Workshop (C4RR)

Event details

Location: Cambridge

Dates: 27–28 June 2017

The C4RR post-workshop blog post is now available.

Twitter: #C4RR

The Software Sustainability Institute’s Docker Containers for Reproducible Research Workshop brought together researchers, developers and educators to explore best practices for using containers (not only Docker) and the future of research software with containers. The Docker Containers for Reproducible Research Workshop (C4RR) took place from 27 to 28 June 2017 in Cambridge.

Who attended

See who attended C4RR.

Venue

Baker Building, Department of Engineering
Trumpington Street
University of Cambridge

Maps and more information are available here.

Agenda

Take a look at what happened at C4RR.

Containers

Containers, especially Docker and Singularity, are among the hottest topics in reproducible research at the moment. What impact does the use of containers have on research, and how can researchers benefit from them and make their research more reproducible? The Software Sustainability Institute invites all members of the research software community to explore and discuss these and other questions at C4RR.

Containers are a server virtualisation method that is lighter-weight than virtual machines, which allows applications to launch more quickly and more instances to run concurrently on the same server. Researchers currently use Docker, one of the container implementations available on the market, to package the software used in their research so that others, including their future selves, can reproduce the computational environment used in an experiment. Researchers also use Docker when some software libraries aren't available on every operating system: users can rely on containers to launch the software in less time than it takes to make a cup of coffee! A third scenario is high performance computing on clusters. Each machine in the cluster needs to have the software installed, and, as some researchers and HPC research software engineers have demonstrated, containers such as Singularity can be a great time saver.
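Packaging software in a container works best when dependency versions are pinned, so that the image can be rebuilt later with the same software. As a minimal sketch, assuming a Python-based analysis (the package list is a hypothetical placeholder, and this is one possible approach rather than a workflow prescribed by the workshop), the snippet below produces `name==version` lines, the format pip accepts in a requirements file, from whatever is installed in the current environment. A Dockerfile could then copy that file into the image and run `pip install -r requirements.txt`.

```python
from importlib import metadata


def pinned_requirements(packages):
    """Return pinned 'name==version' lines and the names that are not installed.

    Missing packages are reported rather than silently dropped, so gaps in the
    environment are noticed before a container image is built.
    """
    lines, missing = [], []
    for name in sorted(packages, key=str.lower):
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            missing.append(name)
    return lines, missing


if __name__ == "__main__":
    # Hypothetical dependency list for an analysis project.
    lines, missing = pinned_requirements(["pandas", "numpy", "scipy"])
    # Print the pins; redirect the output to requirements.txt for a build.
    print("\n".join(lines))
    if missing:
        # Prefixed with '#' so the redirected file stays a valid requirements file.
        print("# not installed here:", ", ".join(missing))
```

Pinning exact versions trades automatic bug fixes for stability; for research outputs that may need to be re-run years later, stability is usually the priority.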

Workshop

C4RR attendees gained insight into container technologies and how they impact research. C4RR was a great place to network and meet people from varied research domains, all with an interest in container technologies.

Some aspects of containers covered in the workshop are:

Container use cases in research
How could containers be relevant to and have an impact on your research?
Benefits of containers
Best practices on data management when using containers for data analysis.

Code of Conduct

Take a look at our Code of Conduct.

Related events

Containers for HPC, a Workshop on Singularity and Containers in HPC and Cloud. Cambridge, 29th and 30th June, 2017.

Further information

Open Container Initiative
