Building reproducible research workflows can be a messy business: data comes from many sources, and it may need to be formatted, combined with other data, and analysed in some way. Luckily, there is a whole host of software tools available to help manage some of this complexity (and hopefully let you keep your sanity!). In particular, (GNU) Make is ideally suited to producing reproducible workflows. To see why, let's join the FAKE research group.
The FAKE Research Group
Welcome to FAKE, a data-driven research group that makes heavy use of computational science to perform analysis for publication. Our first task is to get up to date with the current publication. Luckily, our predecessor has left detailed written instructions for the data analysis workflow:
Run the formatData.awk script over the raw data to generate tabular formatted output for later analysis.
Run the summarise.R R script on the formatted data to create two new data sets.
You can then use the plot.py Python script on the two groups to generate the plots for the paper.
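As a rough sketch of what following these instructions by hand amounts to, the three steps could be chained in a small Python driver. Only the script names (formatData.awk, summarise.R, plot.py) come from the instructions. The data file names (raw.dat, formatted.dat, group1.dat, group2.dat) and the way arguments are passed to each script are assumptions made purely for illustration.

```python
import subprocess

# Hypothetical file names: the written instructions do not name the data files.
RAW = "raw.dat"
FORMATTED = "formatted.dat"
GROUPS = ["group1.dat", "group2.dat"]


def workflow_commands(raw=RAW, formatted=FORMATTED, groups=GROUPS):
    """Return the three steps, in order, as (argv, stdout_path) pairs.

    stdout_path is the file that captures the command's standard output,
    or None if the script is assumed to write its own output files.
    The argument conventions for summarise.R and plot.py are assumptions.
    """
    return [
        # Step 1: format the raw data into a table.
        (["awk", "-f", "formatData.awk", raw], formatted),
        # Step 2: summarise the formatted data into two new data sets.
        (["Rscript", "summarise.R", formatted, *groups], None),
        # Step 3: plot the two groups for the paper.
        (["python", "plot.py", *groups], None),
    ]


def run_workflow(dry_run=True):
    """Run each step in order, or just print the commands if dry_run is True."""
    for argv, stdout_path in workflow_commands():
        if dry_run:
            redirect = f" > {stdout_path}" if stdout_path else ""
            print(" ".join(argv) + redirect)
        elif stdout_path:
            with open(stdout_path, "w") as out:
                subprocess.run(argv, stdout=out, check=True)
        else:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_workflow(dry_run=True)
```

A driver like this always reruns every step, even when nothing has changed, and the order of the steps is fixed by hand rather than derived from which files each step needs.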