The danger of digitizing old practices

Posted by s.sufi on 11 February 2016 - 1:57pm

A speed blog from the Fellows 2016 inaugural meeting, produced while discussing 'Which commonly held ideas in research software are impeding progress and need to be retired?'

by Melodee Beals (Loughborough University), Vincent Knight (Cardiff University), Neil Chue Hong (University of Edinburgh) and Jon Hill (University of York).

The following tweet exposes the flaw in the idea that sharing code is, on its own, a sufficient and beneficial practice in reproducible research:

“You can download our code from the URL supplied. Good luck downloading the only postdoc who can get it to run, though #overlyhonestmethods”

In the past, research methodology (or equivalently code) was shared in detail (by travelling from one institution to another, for example, or - for the last 350 years at least - in journals or letters). In the modern ‘sharing’ age brought to us by the internet, it is falsely believed that simply disseminating code is a step forward in reproducible research. We believe that not only is it insufficient, but it could in fact have a damaging effect on overall research quality.

Tantalising tables, vapid visualisations

Take the example of images. In print, where space is at a premium, graphs, tables and other visualisations can condense complicated evidence into a digestible impression. The move to digital publication offered opportunities not only to expand the number of visualisations, but to fundamentally transform them. Rather than static images, Flash and JavaScript allowed moving, even interactive, representations of data. However, the data behind these dancing images remains largely obscure. A beautifully rendered diagram of 17th-century kinship networks remains an implicit expression of the researcher’s methodology, assumptions and interpretations. These are drawings, not evidence. In many cases, their creation is actually counter-productive; time is spent producing eye-catching images rather than presenting a detailed account of the methodology or the data itself. Restricting ourselves to making space-saving illustrations ‘digital’ is the equivalent of inventing Hoe’s rotary press and then only publishing large-print books.

Another example of the old culture not translating correctly to the age of digital transmission of ideas is the humble PDF. The PDF (Portable Document Format) allows text to be formatted and displayed as the author intended, including figures and tables. However, once a paper is in this format, the link between the results and their interpretation on the one hand and the underlying data on the other is lost. PDFs are notoriously difficult to mine, with publishers using a variety of methods to display figures. Some add a vector graphic - great! We can extract that and probably back out the data. Some convert the figure to a bitmap image embedded in the file, from which it is all but impossible to back out the data. The same goes for tables; the link to the raw data is lost.

Going back to our original example, even for software we are in danger of creating “digital friendly” practices which are worse than their analogue counterparts. We have gone from meticulous notebooks, keeping all our research in one place, to a messy desktop of poorly curated Word documents, missing methodology and filename metadata. The opportunities for automating the process of recording our research have not been fulfilled, instead leaving us with a situation where information is “shared” through out-of-date websites and undocumented zip files. We might as well have printed out our data, stuck it in an envelope and posted it to a random address.

Culture shock

So what can we do about this? The overwhelming issue is one of culture. It is difficult to overturn years of working methods. The first thing is to train people. We need to start ‘em young! Courses for undergraduates on digital data curation, software development and digital collaboration are key. This training needs continuous reinforcement throughout an academic career, from PhD all the way to professor, as the tools and methods update. As part of this culture shift, the right incentives are required at every level from institutional to governmental, via existing mechanisms like REF and promotion criteria. On a more day-to-day level, the data, the code and the final product - the paper - need to be linked. This needs to go beyond supplementary information to truly linked data, metadata and code. It may happen naturally as part of the open science and open access movements, but it will require more engagement in all disciplines where open research is not an accepted priority.
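To make the idea of linking data, code and paper a little more concrete, here is a minimal sketch in Python of what an automatically generated record might look like. It is only an illustration under our own assumptions: the function names, the role labels ("data", "code") and the manifest layout are all hypothetical and not part of any standard, and a real project would more likely adopt an established metadata scheme.

```python
from __future__ import annotations

import hashlib
import json
import platform
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(paper_id: str, files: dict[str, list[Path]]) -> dict:
    """Describe which data and code files belong to one paper.

    `paper_id` and the role names used as keys of `files` are
    hypothetical labels chosen for this sketch.
    """
    return {
        "paper": paper_id,
        "created": datetime.now(timezone.utc).isoformat(),
        "python_version": platform.python_version(),
        "files": {
            role: [
                {
                    "name": path.name,
                    "bytes": path.stat().st_size,
                    "sha256": sha256_of(path),
                }
                for path in paths
            ]
            for role, paths in files.items()
        },
    }


def write_manifest(manifest: dict, target: Path) -> None:
    """Save the manifest as human-readable JSON next to the files."""
    target.write_text(
        json.dumps(manifest, indent=2, sort_keys=True), encoding="utf-8"
    )
```

Calling `build_manifest("my-paper", {"data": [Path("results.csv")], "code": [Path("analysis.py")]})` and passing the result to `write_manifest` would leave a small JSON file recording exactly which versions of the data and code sit behind the paper. The checksums let a later reader confirm that the files they downloaded are the ones the authors used, which is more than an undocumented zip file can offer.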

Research can benefit hugely from the advances in digital scholarship and development. However, by simply copying over “traditional”, familiar practices without considering how they should be updated to suit a digital world, we are consigning the next generation of researchers to a dead end littered with the zombies of ancient research methods.