Software and research: the Institute's Blog

Interpretative research

By Heather Ford, University of Leeds.

My research involves the study of the emerging relationships between data and society that are encapsulated by the fields of software studies, critical data studies and infrastructure studies, among others. These fields of research are primarily aimed at interpretive investigations into how software, algorithms and code have become embedded in everyday life, and how this has resulted in new power formations, new inequalities and new authorities of knowledge [1]. Subjects of this research include the ways in which Facebook’s News Feed algorithm influences the visibility and power of different users and news sources (Bucher, 2012), how Wikipedia delegates editorial decision-making and moral agency to bots (Geiger and Ribes, 2010), and the effects of Google’s Knowledge Graph on people’s ability to control facts about the places where they live (Ford and Graham, 2016).

As the only Software Sustainability Institute fellow working in this area, I set myself the goal of investigating what tools, methods and infrastructure researchers working in…

Continue Reading

Citing software

By Will Usher, Senior Researcher: Infrastructure Systems Modeller, University of Oxford

Plagiarism is a serious issue, and we are all familiar with the horror stories of students unceremoniously ejected from courses for copying essays. Any undergraduate degree worth its salt teaches students how to cite work correctly, acceptable bounds on quotation and how to attribute ideas and concepts to their sources. But in the growing world of open-source research software, best practices have yet to be universally understood, as I recently found out.

During my PhD at University College London, I became involved in the heady enthusiasm of the Research Software Programming group, attending and then helping out at Software Carpentry workshops. As a consequence, I was keen to apply my new knowledge of Python, version control and software development to my research. As luck would have it, I discovered an existing Python library on GitHub, which implemented several Global Sensitivity Analysis routines I could make use of. As I used the library, I started adding bits and pieces, and so by the end of the PhD I had made a considerable contribution to the package.

It's probably safe to say that SALib (Sensitivity Analysis Library) is the go-to Python library for the (unfortunately still far too niche) practice of global sensitivity analysis in modelling, and our…
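For readers unfamiliar with global sensitivity analysis, the core question is how much of a model's output variance each input is responsible for. The following is a minimal pure-Python sketch of the underlying idea, not SALib's actual API: a brute-force "pick-freeze" Monte Carlo estimate of first-order Sobol' indices for a hypothetical two-input toy model.

```python
import random

def model(x1, x2):
    # Toy model: x1 carries far more of the output variance than x2.
    return 4.0 * x1 + x2

n = 50_000
rng = random.Random(1)
A = [(rng.random(), rng.random()) for _ in range(n)]  # inputs ~ U(0, 1)
B = [(rng.random(), rng.random()) for _ in range(n)]

yA = [model(x1, x2) for x1, x2 in A]
f0 = sum(yA) / n                          # estimated output mean
V = sum((y - f0) ** 2 for y in yA) / n    # estimated output variance

# "Pick-freeze": re-evaluate with one input held from A, the other drawn from B.
yC1 = [model(a1, b2) for (a1, _a2), (_b1, b2) in zip(A, B)]
yC2 = [model(b1, a2) for (_a1, a2), (b1, _b2) in zip(A, B)]

# First-order Sobol' index: Var(E[Y | X_i]) / Var(Y).
S1 = (sum(ya * yc for ya, yc in zip(yA, yC1)) / n - f0 ** 2) / V
S2 = (sum(ya * yc for ya, yc in zip(yA, yC2)) / n - f0 ** 2) / V

print(f"S1 = {S1:.3f}, S2 = {S2:.3f}")
```

For this additive toy model the indices are known analytically (16/17 and 1/17, summing to one), so the estimates should land near 0.94 and 0.06. SALib packages far more robust versions of this kind of routine, along with other methods such as Morris screening and FAST.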

Continue Reading

Easy statistics, bad science, pre-registration, analytical flexibility

By Cyril Pernet, University of Edinburgh

I attended Brainhack Global in Warwick on 2nd and 3rd March 2017. On this occasion, I was invited to give a talk on reproducibility and listen to Pia Rothstein's talk about pre-registration. These somewhat echo two papers published earlier this year: "Scanning the horizon: towards transparent and reproducible neuroimaging research" by Russ Poldrack et al. in Nature Neuroscience and "A manifesto for reproducible science" by Marcus Munafò et al. in Nature Human Behaviour. Here, I’ll talk about these two concepts that are 'new' because of the ease with which computers can nowadays…

Continue Reading

The Practice of Reproducible Research

By Justin Kitzes, University of California, Berkeley

We are very happy to announce the launch of our open, online book The Practice of Reproducible Research, to be published in print by the University of California Press later this year. In short, this book is designed to demonstrate and teach how research in the data-intensive sciences can be made more reproducible. The book centres on a collection of 31 contributed case studies, in which experienced researchers provide examples of how they combined specific tools, ideas, and practices in order to improve the reproducibility of a real-world research project. These case studies are accompanied by a set of synthesis chapters that introduce and summarise best practices for data-intensive reproducible research.

Within the overall context of reproducibility, our book focuses specifically on the goal of achieving computational reproducibility in individual research projects. We define a research project as computationally reproducible if a second investigator can recreate the final reported results of the project, including key quantitative findings, tables, and figures, given only a set of files and written instructions. This focus reflects our belief that computational reproducibility forms a first and most foundational goal for individual investigators interested in the broad goals of reproducible…
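The definition above, that a second investigator can recreate the reported results from files and written instructions alone, can be checked mechanically. Here is a minimal sketch of one common practice (the analysis step and data are hypothetical stand-ins): serialise the results deterministically and checksum them, so that any re-run can be compared bit-for-bit against the published digest.

```python
import hashlib
import json

def run_analysis(data):
    """A deterministic stand-in analysis step: basic summary statistics."""
    n = len(data)
    mean = sum(data) / n
    return {"n": n, "mean": round(mean, 6)}

def fingerprint(results):
    """Stable digest of the results: a re-run by a second investigator
    should reproduce this value exactly."""
    blob = json.dumps(results, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()

data = [1.0, 2.0, 3.0, 4.0]
results = run_analysis(data)
digest = fingerprint(results)
print(results, digest[:12])
```

Any hidden non-determinism (unseeded randomness, unordered dictionary serialisation, floating-point differences across platforms) changes the digest, which is exactly why fixing seeds and pinning environments feature so heavily in the book's case studies.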

Continue Reading

Participation in distributed open source projects

By Mario Antonioletti, Software Sustainability Institute, Nikoleta Evdokia Glynatsi, Cardiff University, Lawrence Hudson, Natural History Museum, Cyril Pernet, University of Edinburgh, Thomas Robitaille, Freelance.

Running an open-source project with geographically distributed participants is a substantial undertaking. Attracting and retaining participants can be hard. Communicating project progress via social media, websites and mailing lists can take a lot of time, as does organising and running regular meetings, since contributors are typically separated by both geography and time zones. It is important to identify mechanisms for (i) attracting participants (social aspect), (ii) communication and development processes (which can encompass workflows, standards, conventions and project administration) and (iii) remote collaboration. In this blog post, we summarise some social and technical challenges and…

Continue Reading