Software and research: the Institute's Blog

Privacy and Trust in IoT & Open Data

By Sinan Shi, University College London, David De Roure, University of Oxford, Nikoleta Glynatsi, Cardiff University, Emma Tattershall, Science and Technology Facilities Council, Andrew Landells, University of Southampton, Chris Gutteridge, University of Southampton, Gary Leeming, University of Manchester.

This post is part of the Collaborations Workshop 2017 speed blogging series.

The privacy risks within a socially connected infrastructure are not well understood and are constantly changing. Personal information can be private but still be accidentally shared by others and made available more widely. One of the largest challenges for privacy is the lack of understanding of what the data could be used for now; as more data are collected and made available, future purposes become even more difficult to predict. Often, seemingly innocuous data sets can be used to derive more private data, such as the waking times and other habits of…

Continue Reading

Sharing code and data neuroscience

By Stephen Eglen, Software Sustainability Institute's fellow, University of Cambridge.

Scientists are increasingly dependent on computational techniques to analyse large volumes of data. These computational methods are often tailored to the particular analysis in mind, and as such are valuable research outputs. Furthermore, unlike experimental techniques, computational methods can be easily shared. However, at least in neuroscience, computational methods are not routinely shared upon publication of associated manuscripts.

To improve this situation, we have worked with the editors of Nature Neuroscience to establish a pilot code review project. Once papers have been approved in principle for publication, authors can opt in to the code review. The code (and data) will be checked to see if independent reviewers can reproduce key findings of the paper. The details of the code review process are outlined in the editorial, and we have written a commentary to describe good practice for sharing of code and data. For example, we suggest the minimum requirement for sharing is that sufficient code and data be provided to regenerate a key figure/table of the paper. This follows the well-established requirements for…

Continue Reading

Arab poetry, use of Solr

By Swithun Crowe, Research Computing, University Library, University of St Andrews

This article is part of our series: A day in the software life, in which researchers from all disciplines discuss the tools that make their or someone else’s research possible.

Most of the data I work with is in XML format: Text Encoding Initiative (TEI) documents, either handwritten or edited using XForms; XML exported from other programs such as Zotero; or XML taken from third-party web services, such as the Library of Congress authority files. To search these files, I use Apache's Solr document search engine, usually communicating with it via PHP's Solr extension. The source XML documents are transformed using XSLT into a form which Solr can ingest.
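To give a rough idea of what "a form which Solr can ingest" can look like, here is a minimal sketch in Python. It is not the author's pipeline: it swaps XSLT for Python's standard `xml.etree.ElementTree`, and the sample TEI document, the function name `tei_to_solr_add` and the Solr field names (`id`, `title`, `author`, `line`) are all assumptions made up for illustration. The output follows Solr's XML update message layout (`<add><doc><field name="…">`).

```python
import xml.etree.ElementTree as ET

# TEI documents use this XML namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical, minimal TEI document (not taken from the article).
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Sample poem</title>
        <author>Unknown poet</author>
      </titleStmt>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <lg>
        <l>First line of verse</l>
        <l>Second line of verse</l>
      </lg>
    </body>
  </text>
</TEI>"""


def tei_to_solr_add(tei_xml: str, doc_id: str) -> str:
    """Build a Solr XML update message from a TEI string.

    The field names used here are assumptions; a real Solr schema
    would define its own.
    """
    root = ET.fromstring(tei_xml)
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")

    def field(name: str, value: str) -> None:
        # Append one <field name="...">value</field> to the Solr document.
        el = ET.SubElement(doc, "field", {"name": name})
        el.text = value

    field("id", doc_id)
    field("title", root.findtext(".//tei:titleStmt/tei:title",
                                 default="", namespaces=TEI_NS))
    field("author", root.findtext(".//tei:titleStmt/tei:author",
                                  default="", namespaces=TEI_NS))
    # One multi-valued "line" field per verse line (<l>) in the TEI body.
    for line in root.iterfind(".//tei:l", TEI_NS):
        field("line", "".join(line.itertext()).strip())

    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    print(tei_to_solr_add(SAMPLE_TEI, "poem-1"))
```

The resulting string could then be posted to a Solr core's update handler. An XSLT stylesheet would express the same mapping declaratively, which tends to suit collections where the source is already XML.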

The examples in this…

Continue Reading

Docker Containers & Reproducible Research

By Raniere Silva, Community Officer.

The Docker Containers for Reproducible Research Workshop (C4RR) is only a month away, on 27-28 June 2017 at the University of Cambridge. The workshop offers many talks on using containers to improve reproducibility in desktop, cloud and HPC environments, along with some practical sessions.

For those interested in HPC, several talks will surely make the workshop worth attending: Michael Bauer's talk about Singularity, Matthew Hartley's talk about ways to make the transition from the desktop to HPC smoother, and Jeroen Schot's talk describing how the Dutch National e-Infrastructure is empowering containers.

Meanwhile, the talks from Nick James, David Mawdsley and Matthew Upson are aimed at attendees who are more interested in reproducibility. Nick will talk about an open source data analysis pipeline from the European Bioinformatics Institute that employs containers. If you are an R user and are looking for ways to use Knitr with Docker to make it easy for your colleagues to reproduce your R Markdown documents, David's talk is for you. And Matthew will take the attendees through a journey…

Continue Reading

Software in engineering

By Edward Smith, Institute’s fellow, Imperial College London.

To an engineer, software design concepts are not only familiar; they are central to the education we are forced to endure. These include standardisation, quality testing and the importance of outlining a clear specification. However, when it comes to software development, engineering academics seem to forget these principles; principles that shaped the industrial revolution and allowed us to engineer the modern world. In this short blog, I want to explore why academic engineers don't apply these best-practice concepts to software.

It is clear we are in the middle of a revolution; one which arguably will change the world more rapidly than the industrial revolution over a hundred years ago. Aside from the scientific developments, among them steam power, electricity and mastery of materials such as iron and steel, it was the methodologies forged during this period that were pivotal to the revolution. The key concepts for mass production were the division of labour and automation, allowing much greater production by fewer people. In addition, the standardisation of parts allowed each person to specialise in and optimise a given part, with the parts fitting together because the required interfaces between them had been agreed.

Consider a car. Before the industrial revolution, a…

Continue Reading