Software and research: the Institute's Blog

Latest version published on 10 April, 2018.


By Peter Murray-Rust, ContentMine Ltd; Rachel Spicer, EMBL-EBI, University of Cambridge; Josh Heimbach, InterMine, University of Cambridge; Yo Yehudi, InterMine, University of Cambridge and Code is Science; Naomi Penfold, eLife

Image to the right: Bike thefts in Cambridge over 2017. Rendered by Rachel Spicer, using R (ggmap) + Google Maps + Open police data

Open Data Day (ODD) is an international event that runs on the first Saturday of March, started in 2010 and supported by Open Knowledge International. It aims to raise the profile of all types of open data, from government to research.

Creating our own ODD

The Open Data Day organiser’s guide recommended picking a focus. We didn’t have a huge amount of time to organise, and we knew this wasn’t going to be a large event but mainly a motivation to meet some busy…


Latest version published on 6 April, 2018.

By Laura Fortunato, University of Oxford

Reproducible Research Oxford is a project based at the University of Oxford, launched in October 2016. The project aims to lay the groundwork for a culture of research reproducibility across the University, focusing on training in the effective use of computational tools in research. These tools are widely used in some disciplines, and they can enable researchers to easily track the process leading from data to results, so that it is fully reproducible. However, researchers often lack the opportunities, incentives and confidence to make best use of these tools.

As part of the project, we have set up a partnership between the University and Software and Data Carpentry, non-profit volunteer organisations focused on teaching researchers across disciplines the computing and data skills they need for effective and reproducible research. Since the start of the project, we have run four Software Carpentry workshops and one Data Carpentry workshop (the first to be held in Oxford!), and we have hosted the first Oxford-based Software/Data Carpentry instructor training. So far, we have provided training to upwards of 100 learners from across the University who attended our workshops, in addition…


Latest version published on 6 April, 2018.

By Matt Archer, Paul Brown, Stephen Dowsland, David Mawdsley, Amy Krause, Mark Turner (order is alphabetical).

So… you’ve just started on an exciting new data science project, but you know nothing about the domain you’re working on. Besides briefly panicking, how do you get up to speed on the area you’re working on?

First things first: it's good to meet the researchers you'll be working with as quickly as possible. Most researchers are excited about their research, and this enthusiasm is infectious. Ask questions. Be interested.

To get a basic grounding in your new area, YouTube is an invaluable source of quick bursts of domain knowledge, covering both a general subject area and the detailed specifics and intricacies of a niche within it. Video tutorials take many forms, but the useful ones to look for are short explainers on concepts or tooling, as well as longer-form recordings of lectures, workshops and panel discussions. YouTube has become a primary channel for user training materials from large software vendors: there are thousands of video tutorials on how to use tools or perform specific actions in things like Jupyter Notebooks, Excel and Adobe Photoshop. If there are large, commonly used pieces of software in the domain you’re trying to learn, there may be similar videos available to help you get started with them.

It can be useful to ask for a background reading list from the researchers you're working with. Selectively…


Latest version published on 5 April, 2018.

By Matthew Archer, Stephen Dowsland, Rosa Filgueira, R. Stuart Geiger, Alejandra Gonzalez-Beltran, Robert Haines, James Hetherington, Christopher Holdgraf, Sanaz Jabbari Bayandor, David Mawdsley, Heiko Mueller, Tom Redfern, Martin O'Reilly, Valentina Staneva, Mark Turner, Jake VanderPlas, Kirstie Whitaker (authors in alphabetical order)

In our institutions, we employ multidisciplinary research staff who work with colleagues across many research fields to use and create software to understand and exploit research data. These researchers collaborate with others across the academy to create software and models to understand, predict and classify data not just as a service to advance the research of others, but also as scholars with opinions about computational research as a field, making supportive interventions to advance the practice of science.

In some of our institutions we use the term "data scientist" to refer to our team members, in others we use "research software engineer" (RSE), and in some we use both. Where both terms are used, the difference seems to be that data scientists in an academic context focus more on using software to understand data, while research software engineers more often make software libraries for others to use. However, in some places, one or other term is used to cover both, according to local tradition.

What we have in common

Regardless of job title, we hold in common many of the skills involved and the goal of driving the use of open and reproducible…


Latest version published on 5 April, 2018.

By R. Stuart Geiger, Alejandra Gonzalez-Beltran, Robert Haines, James Hetherington, Chris Holdgraf, Heiko Mueller, Martin O'Reilly, Tomas Petricek, Jake VanderPlas (authors in alphabetical order)

Data and software have enmeshed themselves in the academic world, and are a growing force in most academic disciplines (many of which are not traditionally seen as "data-intensive"). Many universities wish to improve their ability to create software tools, enable efficient data-intensive collaborations, and spread the use of "data science" methods in the academic community.

The fundamentally cross-disciplinary nature of such activities has led to a common model: the creation of institutes or organisations not bound to a particular department or discipline, focusing on the skills and tools that are common across the academic world. However, creating institutes with a cross-university mandate and non-standard academic practices is challenging. These organisations often do not fit into the "traditional" academic model of institutes or departments, and involve work that is not incentivised or rewarded under traditional academic metrics. To add to this challenge, the combination of quantitative and qualitative skills needed is also in high demand in non-academic sectors. This raises the question: how do you create such institutes so that they attract top-notch candidates, sustain themselves over time, and provide value both to members of the group and to the broader university community?…
