By Emma Karoune, Sarah Gibson, Martina Vilas, and Sophia Batchelor, written on behalf of The Turing Way Community.
This guide is part of the Research Software Camp: research accessibility web content series.
We think research reproducibility is super important! Reproducible research is necessary to ensure scientific outputs can be trusted and built upon in future work. An important aspect of reproducible research is computational reproducibility. Binder is a great tool to help you do this easily. Here we offer some top tips so you can make the most out of Binder.
If you have never used Binder before, our workshop is a great opportunity to get you started. You will be able to take some of your own content (in an R or Jupyter Notebook, or scripts that can be run in the terminal) and prepare it so that it can be used and reproduced by others on mybinder.org. You don't need to be experienced with the command line as all of the material is browser-based.
All our resources are open and easily accessible on GitHub (see our resources for R here, Python here and Julia here) so you can teach yourself, if you want.
Top tips for making the most out of Binder
1. Learn more about reproducible research.
Making reproducible research 'too easy not to do' is the ultimate aim of The Turing Way. We are passionate about enabling this to happen and our Binder workshop is a great example of the work we are doing to get there.
Our book offers guidance on reproducible research and has many helpful guides to show you how to build a reproducible workflow.
But what is reproducible research?
We define reproducible research as work that can be independently recreated from the same data and the same code that the original team used.
However, there are many different types of reproducibility, and even more ways to distinguish the term reproducible from replicable, robust and generalisable, because how these terms are used depends heavily on your discipline (Barba, 2018).
Victoria Stodden (2014) has suggested the following distinctions for reproducibility:
- Computational reproducibility: When detailed information is provided about code, software, hardware and implementation details.
- Empirical reproducibility: When detailed information is provided about non-computational empirical scientific experiments and observations. In practice, this is enabled by making the data and details of how it was collected freely available.
- Statistical reproducibility: When detailed information is provided, for example, about the choice of statistical tests, model parameters, and threshold values. This mostly relates to pre-registration of study design to prevent p-value hacking and other manipulations.
If you want to find out more about definitions of reproducibility, then take a look at our section on this topic here.
2. Using Binder for writing articles with research compendia.
To publish reproducible research, you need to communicate the whole research project with the reader. This cannot be done solely through the text of a research article. You need to provide a link to a research compendium in your articles so that the reader has access to your data, code and an explanation of how you have conducted the research.
A research compendium is a comprehensive set of files that combines all components of a project. The most basic research compendium is a set of folders that provides this information in an open repository.
However, you can also create an executable research compendium that captures all the digital parts of the research project (code, data, text, figures) and all the information on how to obtain the results. The computing environment is described fully to show how to automatically generate the results. Additionally, there is a README file describing what the compendium is about and a LICENSE file with information on how it can be used.
Binder is one way to make an executable research compendium, so learning how to do this can enable you to publish reproducible research and allow others to review, understand, teach and reproduce your research.
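As a rough illustration of the parts described above, the sketch below checks whether a folder contains a README, a LICENSE and a file describing the computing environment. The function name and the list of environment file names are assumptions made for this example, not a definition of what a compendium must contain.

```python
from pathlib import Path

# Assumed names for files that describe the computing environment.
# These are illustrative; use whichever file your project actually relies on.
ENVIRONMENT_FILES = {
    "environment.yml",
    "requirements.txt",
    "install.r",
    "project.toml",
    "dockerfile",
}


def missing_compendium_parts(file_names):
    """Return the compendium parts that are absent from a list of file names."""
    names = {name.lower() for name in file_names}
    missing = []
    if not any(name.startswith("readme") for name in names):
        missing.append("README")
    if not any(name.startswith(("license", "licence")) for name in names):
        missing.append("LICENSE")
    if not names & ENVIRONMENT_FILES:
        missing.append("environment description")
    return missing


if __name__ == "__main__":
    # Check the top level of the current folder.
    top_level = [path.name for path in Path(".").iterdir() if path.is_file()]
    print(missing_compendium_parts(top_level))
```

An empty list means all three parts were found. The check only looks at file names, so it says nothing about whether the contents are complete.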
If you want to find out more about research compendia, you can check out the section in The Turing Way here.
3. Get started with GitHub.
To launch Binder, you need to host your repository on GitHub or another open access online repository such as Zenodo.
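Once a repository is on GitHub, the link that launches it on mybinder.org can be put together from the owner, the repository name and a branch, tag or commit. The sketch below assumes the `/v2/gh/<owner>/<repo>/<ref>` pattern that mybinder.org uses for GitHub repositories. The function name is hypothetical, and the repository in the example is only a placeholder for your own.

```python
from urllib.parse import quote


def binder_launch_url(owner, repo, ref="HEAD", base="https://mybinder.org"):
    """Build a mybinder.org launch link for a public GitHub repository.

    Assumes the /v2/gh/<owner>/<repo>/<ref> URL pattern; ref may be a
    branch, tag or commit, and "HEAD" points at the default branch.
    """
    return f"{base}/v2/gh/{quote(owner)}/{quote(repo)}/{quote(ref)}"


if __name__ == "__main__":
    # Placeholder repository; replace with your own owner and repository name.
    print(binder_launch_url("binder-examples", "requirements"))
```

Sharing this link in your README lets readers open the repository in Binder with one click.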
If you have not used Git or GitHub before, it can be a bit daunting. Git is the version control system that GitHub is built around. Using GitHub does not require any command line experience as it has a fairly friendly web browser interface. All of the Binder workshop materials and instructions use GitHub to make it easier to set up.
GitHub does take some getting used to: knowing how to set up a repository or add files and folders will help you to follow the Binder workshop more easily. We suggest that you spend a bit of time getting used to GitHub before the workshop or before you try our instructions yourself. You can use the 'Getting started with GitHub' section in The Turing Way.
If you want more information about version control and Git, The Turing Way has sections on those too!
4. Don't forget to make your repository public.
This is the most common mistake that new users of Binder make!
Binder was designed to use public files. There is no way to access files that are not public from mybinder.org. You should consider all information in your Binder as public, meaning that:
- There should be no passwords, tokens, keys, etc in your GitHub repository.
- You should not type passwords into a Binder running on mybinder.org.
- You should not upload your private SSH key or API token to a running Binder.
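Before making a repository public, it can help to search it for anything that looks like a password, token or key. The sketch below is one simple way to do that. The patterns and function names are assumptions made for this example; they will miss some secrets and flag some harmless lines, so treat the output as a prompt for a manual check.

```python
import re
from pathlib import Path

# Illustrative patterns only; they are not an exhaustive list of secret formats.
SECRET_PATTERNS = {
    "private key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "password or token assignment": re.compile(
        r"(password|passwd|secret|token|api[_-]?key)\w*\s*[:=]\s*\S+",
        re.IGNORECASE,
    ),
}


def find_possible_secrets(text):
    """Return (line number, label) pairs for lines that look like secrets."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for label, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((line_number, label))
    return hits


def scan_repository(repo_dir):
    """Scan every file under repo_dir, skipping Git's own .git folder."""
    report = {}
    for path in Path(repo_dir).rglob("*"):
        if path.is_file() and ".git" not in path.parts:
            hits = find_possible_secrets(path.read_text(errors="ignore"))
            if hits:
                report[str(path)] = hits
    return report


if __name__ == "__main__":
    for file_name, hits in scan_repository(".").items():
        print(file_name, hits)
```

Remember that deleting a secret from the latest version of a file does not remove it from the Git history, so it is safest to change any password or token that was ever committed.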
The only way to support access to private files would be to create a local deployment of BinderHub, where you can decide the security trade-offs yourself.
5. You can use any language you want.
Binder is very inclusive: it allows you to use any coding language that you want. This means you don't have to learn a new language. You can take existing data and code, and binderize it straight away.
- You just have to define the language of your choice such as Python, R or Julia.
- If a language is not officially supported by a build pack, it can often be installed with a postBuild script. This script runs arbitrary bash commands, and can be used to download and install a language.
- It may also be possible to combine multiple languages in a single environment. We recommend that you take a look at the Multi-Language Demo repository for some inspiration.
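In practice, you define the language by adding configuration files to your repository. The sketch below guesses which languages a repository declares from the files it contains. The mapping from file names to languages is an assumption based on commonly used Binder configuration files, and the function name is hypothetical; check the Binder documentation for the full list.

```python
# Assumed mapping from configuration file to the language it usually signals.
CONFIG_FILE_LANGUAGES = {
    "requirements.txt": "Python",
    "install.R": "R",
    "Project.toml": "Julia",
}


def declared_languages(file_names):
    """Return (sorted languages, has_post_build) for a list of file names.

    has_post_build is True when a postBuild script is present, which may
    install further languages that this simple check cannot see.
    """
    names = set(file_names)
    languages = sorted(
        {
            language
            for config_file, language in CONFIG_FILE_LANGUAGES.items()
            if config_file in names
        }
    )
    return languages, "postBuild" in names


if __name__ == "__main__":
    # A repository that combines Python and R and also has a postBuild script.
    print(declared_languages(["requirements.txt", "install.R", "postBuild"]))
```

A repository with more than one of these files is the multi-language case mentioned in the last bullet.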
Binder is a fantastic tool that can help you to reach your reproducible research goal. We are very happy for you to use all our resources to help you learn all about it.