Software and research: the Institute's Blog

Research IT, Enterprise IT

By Laurence Billingham, British Geological Survey, David Golding, University of Leeds, Robert Haines, University of Manchester, Martin Hammitzsch, German Research Centre for Geoscience, James Hetherington, University College London, Simon Hettrick, Software Sustainability Institute.

This post is part of the Collaborations Workshops 2017 speed blogging series.

Universities need to strike a balance between risk and strategic opportunities (world-class research and world-class teaching). A semi-independent "sandboxed" service for research IT can deliver both, by isolating the stuff that needs to change fast from the stuff that needs to always work.

In mobile development, apps are "sandboxed" so that one app cannot break the phone. This analogy can work for services too. In research-led universities, we need…


Seductive Data

By Eilis Hannon, University of Exeter, Martin Callaghan, University of Leeds, James Baldwin, Sheffield Hallam University, Mario Antonioletti, Software Sustainability Institute, David Pérez-Suárez, University College London.

This post is part of the Collaborations Workshops 2017 speed blogging series.

In our daily work we may, at some point, need to access third-party data that we wish to merge or compare with data we have generated or obtained ourselves. Often we turn to Google to find pertinent data sources. Domain experts may be able to refer us to data sources, and in part there are keywords that can unlock what we are trying to find on the web. We can also filter results using advanced Boolean operators. To make sense of the results, we can consider a number of factors, such as the top links and the domains that are most relevant to the topic. For specific domains, there will be known and trusted data providers, e.g. the Gene Expression Omnibus (GEO) or the…


Privacy and Trust in IoT & Open Data

By Sinan Shi, University College London, David De Roure, University of Oxford, Nikoleta Glynatsi, Cardiff University, Emma Tattershall, Science and Technology Facilities Council, Andrew Landells, University of Southampton, Chris Gutteridge, University of Southampton, Gary Leeming, University of Manchester.

This post is part of the Collaborations Workshops 2017 speed blogging series.

The risks to privacy within a socially connected infrastructure are not well understood and are constantly changing. Personal information can be private but still be accidentally shared by others and made available more widely. One of the largest challenges for privacy is the lack of understanding of what data could be used for now; as more data are collected and made available, future purposes become even more difficult to predict. Often, seemingly innocuous data sets can be used to derive more private data, such as the waking times and other habits of…


Sharing code and data neuroscience

By Stephen Eglen, Software Sustainability Institute's fellow, University of Cambridge.

Scientists are increasingly dependent on computational techniques to analyse large volumes of data. These computational methods are often tailored to the particular analysis in mind, and as such are valuable research outputs. Furthermore, unlike experimental techniques, computational methods can be easily shared. However, at least in neuroscience, computational methods are not routinely shared upon publication of associated manuscripts.

To improve this situation, we have worked with the editors of Nature Neuroscience to establish a pilot code review project. Once papers have been approved in principle for publication, authors can opt in to the code review. The code (and data) will be checked to see if independent reviewers can reproduce key findings of the paper. The details of the code review process are outlined in the editorial, and we have written a commentary to describe good practice for sharing of code and data. For example, we suggest the minimum requirement for sharing is that sufficient code and data be provided to regenerate a key figure/table of the paper. This follows the well-established requirements for…


Arab poetry, use of Solr

By Swithun Crowe, Research Computing, University Library, University of St Andrews

This article is part of our series: A day in the software life, in which researchers from all disciplines discuss the tools that make their or someone else’s research possible.

Most of the data I work with is in XML format: Text Encoding Initiative (TEI) documents, either handwritten or edited using XForms; XML exported from other programs such as Zotero; or XML taken from third-party web services, such as the Library of Congress authority files. To search these files, I use Apache's Solr document search engine, usually communicating with it via PHP's Solr extension. The source XML documents are transformed using XSLT into a form that Solr can ingest.
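To give a feel for that transformation step, here is a minimal sketch in Python rather than XSLT, so it is not the pipeline described above. It turns a tiny, made-up TEI record into Solr's XML update format (`<add><doc><field name="…">`). The sample record, the function name `tei_to_solr_add` and the field names `id`, `title`, `author` and `line` are all assumptions for illustration; a real Solr schema would define its own fields.

```python
import xml.etree.ElementTree as ET

# TEI namespace, plus the expanded name ElementTree uses for xml:id.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical, heavily simplified TEI record; real files are far richer.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:id="poem-001">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>An example poem</title>
        <author>Example Author</author>
      </titleStmt>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <l>First line of verse</l>
      <l>Second line of verse</l>
    </body>
  </text>
</TEI>"""


def tei_to_solr_add(tei_xml: str) -> str:
    """Build a Solr <add> document from one TEI record.

    The field names are assumptions, not a real schema.
    """
    root = ET.fromstring(tei_xml)
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")

    def field(name: str, value: str) -> None:
        # Each Solr field is <field name="...">value</field>.
        el = ET.SubElement(doc, "field", {"name": name})
        el.text = value

    field("id", root.get(XML_ID, ""))
    field("title", root.findtext(
        ".//tei:titleStmt/tei:title", default="", namespaces=TEI_NS))
    field("author", root.findtext(
        ".//tei:titleStmt/tei:author", default="", namespaces=TEI_NS))
    # One multi-valued field entry per verse line.
    for line in root.iterfind(".//tei:l", TEI_NS):
        field("line", (line.text or "").strip())

    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    print(tei_to_solr_add(SAMPLE_TEI))
```

The resulting string could then be posted to a Solr core's update handler; how that is done (PHP's Solr extension, in the workflow described above) is outside this sketch.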

The examples in this…
