Software and research: the Institute's Blog

Coding Retreat

By Dr Eilis Hannon, University of Exeter, Software Sustainability Fellow.

On Friday 30th June, the first Coding Retreat was trialled at the University of Exeter. The idea originated from a Writing Retreat I attended, where the focus was on clearing all distractions and using the time not just to write but to produce high-quality output. Researchers writing software face many of the same challenges, including the feeling that there is no time to make the code nice or to think about how someone else would use it. So it struck me that a Coding Retreat could provide the discipline needed to promote good practice.

Modelled on that Writing Retreat, the premise was to develop an event that gives researchers the time and space to focus on, and prioritise, writing high-quality, sustainable software. The workshop was designed to be inclusive: open to any programming language, any discipline and any project. The only requirement was that attendees had a project to work on, whether improving or finishing ongoing work or starting something new.

The day started with refreshments and an overview of what the day would involve. The primary objective of the workshop was to use the time, cleared of the usual daily distractions, to produce high-quality code, putting into practice all the principles acknowledged…

Docker Containers & Reproducible Research

By Raniere Silva, Community Officer, Software Sustainability Institute.

Last year, during the First Conference of Research Software Engineers, Iain Emsley, Robert Haines and Caroline Jay hit upon the idea of organising a meeting about Docker and how researchers are using it. Ten months later, 60 researchers, developers and librarians met in Cambridge for the Docker Containers for Reproducible Research Workshop (C4RR).

The workshop consisted of a sponsored keynote by Microsoft, 20 talks and four lightning talks, and attendees participated in one of two demo sessions. There were many success stories involving containers, including, where high performance computing (HPC) was involved, the use of Singularity as a good alternative to Docker.

Introduction

If I had to select one talk from C4RR to summarise the workshop, my choice would be "Building moving castles: Scaling our analyses from laptops to supercomputers" by Matthew Hartley et al. With some images from Hayao Miyazaki’s…

CODATA-RDA Summer School in Trieste

By Mario Antonioletti, Research Software Engineer

Last week, thanks to the Software Sustainability Institute, I was lucky enough to teach the R and SQL lessons from Software Carpentry at the CODATA-RDA Research Data Science Summer School, held near Trieste in Italy. The Summer School aims to build participants' competence in accessing, analysing, visualising and publishing data, and it draws participants from all over the world, with six of the seven continents represented. The School is open to participants from all disciplines and backgrounds, from the sciences to the humanities. The first week, which provides a basic framework, is followed by three applied workshops that focus on extreme sources of data, bioinformatics, and IoT/Big Data analytics, so participants cover a lot of material over their two weeks of attendance.

I was given 1.5 days for R and half a day for SQL, spread over three days. The Carpentries' usual modus operandi is that the instructor types and the participants follow, typing the same commands into their own terminals. For R, though, I have previously found that it is hard for participants to keep up and easy for them to lose contact with the commands…

Cython

By Thomas Etherington, Senior Research Leader, Royal Botanic Gardens, Kew, and Software Sustainability Institute Fellow.

Getting code to run fast enough to be useful is an important consideration for making software sustainable. For Python programmers, the Cython project provides an opportunity to speed up your Python code. As part of my Software Sustainability Institute Fellowship, I spent a couple of days learning about Cython from one of the lead developers, and here I summarise, from my perspective, when Cython could be a useful tool for others to explore.

My interest in Cython began when I looked into the code of a SciPy function and saw code that looked quite Pythonic but clearly wasn't actual Python. It transpires that many SciPy functions have been written using Cython, a language that can either compile Python code directly to C or wrap C or C++ code in Python, so that the computational speed of lower-level C programming can be leveraged from a higher-level Python interface. So while SciPy is one of my favourite Python packages, the code itself actually consists of “more than 200,000 lines of C++, 60,000 lines of C, [...] compared to about 70,000 lines of Python code…

By Blair Archibald, University of Glasgow.

Building reproducible research workflows can be a messy business: data comes from many sources and may need to be formatted, combined with other data and analysed in some way. Luckily, there is a whole host of software tools available to help manage some of this complexity (and hopefully let you keep your sanity!). In particular, (GNU) Make is ideally suited to producing reproducible workflows. To see why, let's join the FAKE research group.

The FAKE Research Group

Welcome to FAKE, a data-driven research group that makes heavy use of computational science to perform analysis for publication. Our first task is to get up to date with the current publication. Luckily, our predecessor has left detailed written instructions for the data analysis workflow:

  1. Run the formatData.awk script over the raw data to generate tabular output for later analysis

  2. Use the summarise.R R script on the formatted data to create two new data sets: summarised-group-1 and summarised-group-2

  3. You can then use the plot.py Python script on the two groups to generate the plots for the paper: stats-average.…
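
The predecessor's steps form a dependency graph: the formatted data depends on the raw data, both summarised groups depend on the formatted data, and the plots depend on both groups. Below is a minimal Python sketch of the rule Make applies to such a graph, assuming timestamp-based rebuilding: a target's recipe is rerun only when the target is missing or older than one of its prerequisites. It is neither GNU Make nor the FAKE group's real pipeline; the names `Rule`, `build` and `fake_workflow`, and the files `raw-data.txt`, `formatted.tsv` and `plot.out`, are hypothetical, and each recipe just writes a placeholder file where the real workflow would run formatData.awk, summarise.R or plot.py.

```python
import os
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Rule:
    """A Make-style rule: a target is produced from its prerequisites by a recipe."""
    prerequisites: List[str]
    recipe: Callable[[], None]


def is_stale(target: str, prerequisites: List[str]) -> bool:
    """A target is out of date if it is missing or older than any prerequisite."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)


def build(target: str, rules: Dict[str, Rule], ran: Optional[List[str]] = None) -> List[str]:
    """Bring `target` up to date, building its prerequisites first (depth-first).

    Returns the targets whose recipes were actually run, in order.
    Note: this sketch does not detect circular dependencies.
    """
    if ran is None:
        ran = []
    rule = rules.get(target)
    if rule is None:
        # A source file with no rule (e.g. the raw data) must already exist.
        if not os.path.exists(target):
            raise FileNotFoundError(f"no rule to make target {target!r}")
        return ran
    for prerequisite in rule.prerequisites:
        build(prerequisite, rules, ran)
    if is_stale(target, rule.prerequisites):
        rule.recipe()
        ran.append(target)
    return ran


def fake_workflow(directory: str) -> Dict[str, Rule]:
    """Hypothetical stand-in for the FAKE pipeline; each recipe only writes a file."""
    def path(name: str) -> str:
        return os.path.join(directory, name)

    def write(name: str, text: str) -> Callable[[], None]:
        def recipe() -> None:
            with open(path(name), "w") as f:
                f.write(text)
        return recipe

    return {
        path("formatted.tsv"): Rule([path("raw-data.txt")], write("formatted.tsv", "formatted\n")),
        path("summarised-group-1"): Rule([path("formatted.tsv")], write("summarised-group-1", "g1\n")),
        path("summarised-group-2"): Rule([path("formatted.tsv")], write("summarised-group-2", "g2\n")),
        path("plot.out"): Rule(
            [path("summarised-group-1"), path("summarised-group-2")],
            write("plot.out", "plot\n"),
        ),
    }


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as d:
        with open(os.path.join(d, "raw-data.txt"), "w") as f:
            f.write("raw\n")
        rules = fake_workflow(d)
        target = os.path.join(d, "plot.out")
        # First run: every step has to be built.
        print([os.path.basename(t) for t in build(target, rules)])
        # Second run: nothing is newer than its outputs, so nothing should rerun.
        print([os.path.basename(t) for t in build(target, rules)])
```

One simplification: summarise.R produces both group files in a single run, whereas the sketch gives each file its own rule. GNU Make offers much more (pattern rules, parallel builds with `-j`), but this timestamp comparison is the core of why changing the raw data reruns only the steps downstream of it.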
