Software and research: the Institute's Blog


By Blair Archibald, University of Glasgow.

Building reproducible research workflows can be a messy business: data comes from many sources, and it may need to be formatted, combined with other data and analysed in some way. Luckily, there is a whole host of software tools available to help manage some of this complexity (and hopefully let you keep your sanity!). In particular, (GNU) Make is ideally suited to producing reproducible workflows. To see why, let's join the FAKE research group.

The FAKE Research Group

Welcome to FAKE, a data-driven research group that makes heavy use of computational science to perform analysis for publication. Our first task is to get up to date with the current publication. Luckily, our predecessor has left detailed written instructions for the data analysis workflow:

  1. Run the formatData.awk script over the raw data to generate a tabular formatted output for later analysis

  2. Use the summarise.R R script on the formatted data to create two new data sets: summarised-group-1 and summarised-group-2

  3. You can then use the Python script on the two groups to generate the plots for the paper: stats-average.…
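Part of the appeal of Make for a workflow like the one above is its basic rebuild rule: a target is regenerated only if it is missing or older than one of the files it depends on. The following is a minimal Python sketch of that rule alone, not a Makefile for the FAKE workflow and not how Make itself is implemented. The file names raw.dat and formatted.dat and the helper names needs_rebuild and demo are hypothetical and chosen purely for illustration.

```python
import os
import tempfile
from pathlib import Path


def needs_rebuild(target, dependencies):
    """Return True if target is missing or older than any dependency."""
    target = Path(target)
    if not target.exists():
        return True
    target_mtime = target.stat().st_mtime
    return any(Path(dep).stat().st_mtime > target_mtime for dep in dependencies)


def demo():
    """Walk through three states of a hypothetical raw -> formatted step."""
    with tempfile.TemporaryDirectory() as tmp:
        raw = Path(tmp) / "raw.dat"              # hypothetical input file
        formatted = Path(tmp) / "formatted.dat"  # hypothetical output file
        raw.write_text("1 2 3\n")

        results = []
        # 1. The output does not exist yet, so it must be built.
        results.append(needs_rebuild(formatted, [raw]))

        # 2. The output exists and is newer than its input: nothing to do.
        formatted.write_text("1\t2\t3\n")
        os.utime(raw, (1000, 1000))        # set explicit timestamps so the
        os.utime(formatted, (2000, 2000))  # comparison is deterministic
        results.append(needs_rebuild(formatted, [raw]))

        # 3. The input changes after the output was made: rebuild.
        os.utime(raw, (3000, 3000))
        results.append(needs_rebuild(formatted, [raw]))
        return results


if __name__ == "__main__":
    print(demo())  # prints [True, False, True]
```

Applying the same check to each step in turn (raw data to formatted data, formatted data to summaries, summaries to plots) means that only the steps downstream of a changed file are re-run.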


Supercomputing

By Weronika Filinger, Applications Developer, EPCC.

This post was originally published in the EPCC blog.

Massive Open Online Courses (MOOCs) provide free web-based distance learning opportunities to large numbers of geographically dispersed students. Here at EPCC we are always keen to talk about supercomputing, and becoming involved in this MOOC was a natural development for us. 

Course structure and delivery  

In accordance with the MOOC methodology of presenting the content in small, easily digestible portions, we designed this course to last for 5 weeks. 

Each week has a distinct theme, and is further divided into smaller modules called ‘activities’, each consisting of a number of ‘steps’. Steps are the smallest units of the course structure and, regardless of their type – article, video, discussion, exercise, quiz or test – should not require more than 20 minutes to complete. Learners can spend as much or as little time on each step as they wish, and do them at any time. By our estimate, a week’s worth of content should not take more than 3 hours to complete. Learners are granted access to all of the material at once, which allows them to proceed at their own pace.

Programme content

The first week provides a gentle introduction to the world of supercomputing, including some basic terminology, a brief historical overview and…


Coding for humanities

By Iza Romanowska, University of Southampton, and Software Sustainability Institute fellow.

It may be challenging to teach an old dog a new trick, but to change him into a cat is a whole new level of difficulty. So when we embarked on the ambitious task of teaching archaeologists to code a simulation, we knew we needed to make an extra effort. How do you explain a while-loop to someone who has never seen a line of code? How do you discuss different testing paradigms when you know the main issue will be to get the code to run in the first place? How simple can you make a simulation without losing all of its functionality? These types of questions are best approached by diving in at the deep end and running a training workshop on an unsuspecting sample of not-so-computationally-savvy-yet-quite-interested researchers. Here we report on a workshop sponsored by the Software Sustainability Institute and present a few lessons we learnt along the way.

Archaeologists, usually found knee-deep in mud or elbow-deep in medieval manuscripts, are not known for their outstanding computational literacy. This translates into a limited use of many computational tools commonly employed in other disciplines. In particular, formal computational modelling techniques such as simulation, which require a high level of technical and mathematical skill, are severely…


Advances in Data Science

By Raniere Silva, Software Sustainability Institute.

Manchester hosted the Advances in Data Science 2017 meeting, organised by the Data Science Institute, on 15-16 May 2017. The meeting was eye-opening on privacy and inspiring in the ways it showed researchers can analyse and visualise their data.

The meeting started with a talk by Mark Girolami covering inference and prediction of London's retail development as a use case. It was interesting to discover how important retail development is to planning the future of any city, since it’s one of the main reasons why people travel across their cities. The introduction of drones and self-driving cars would completely change why and when we travel across our cities, which will create many opportunities for researchers in this area. Following Mark's talk, Raia Hadsell presented ways to overcome catastrophic forgetting in neural nets, i.e., when a machine learning system forgets what it has learnt once it starts learning a new problem. Raia used Atari games played by an artificial intelligence…


Code/Theory workshop

By Caroline Jay, University of Manchester, and Robert Haines, University of Manchester.

A group of research software engineers (RSEs) recently gathered in Manchester to explore the challenges of translating between scientific narrative and software. The full report from the Code/Theory Workshop is available in Research Ideas and Outcomes; here, we summarise the outcomes of the afternoon. Software engineers are sometimes seen as peripheral to the academic enterprise, providing the tools to do research rather than actively contributing to the research itself. The overwhelming conclusion of the workshop was that, in reality, software engineers play a central role in the research process, and it is vital to get this message across.

Why is code/theory translation challenging?

Participants started by identifying the challenges of translating between code and theory. A key theme that emerged was the difficulty of designing research software. As scientific theory is continually changing, how do you design a plan?

All participants faced the challenge of getting to grips with new and diverse domains. In some…
