Software and research: the Institute's Blog

Scientific Data Analysis with Java: DAWN

By Steve Crouch, Devasena Inupakutika, Alun Ashton, Mark Basham and Matthew Gerring

Scientific projects are often created as standalone applications which use their own definitions for algorithms and visualisation tools. This makes it difficult to benefit from other people's work. The DAWN Science project allowed a large group of scientific developers and software engineers to collaborate by developing a single, general-purpose API to allow access and sharing of existing algorithms and visualisation tools. This significantly accelerates the development of new analysis tools. We reviewed the DAWN code and provided advice on how to improve the organisation of the software and sharing of the code.

DAWN (Data Analysis WorkbeNch) is open-source scientific data analysis software for numerical data built on the Eclipse/RCP platform. It is developed by a collaboration of facilities and universities, some of whom are contributing code or development effort and others who use and test the software. The collaborative development is led by Diamond Light Source, which is situated at the Rutherford Appleton Laboratory Campus near Oxford. Diamond is not restricted to a single scientific domain, so the software must cover a wide range of uses, from specialist capability like calibration and data reduction for diffraction equipment, to general capability like peak fitting and an integrated Python development environment including interactive tools such as plotting.

BioJS - free bioinformatics visualisation tools get a software facelift

By Devasena Inupakutika, Software Consultant.

With the advent of data-driven research in the life sciences, researchers have relied on data visualisations to generate hypotheses. Many bioinformatics service providers, such as EMBL-EBI or the NCBI, provide a browser-based environment to do this, as well as new ways to visualise biological data. It is important that the software is both high quality and user friendly, which helps researchers compare and contrast, as well as develop, well-grounded conclusions. The Software Sustainability Institute worked with BioJS to review their code and help with coding standards, ultimately making it easier to develop with BioJS.

BioJS, a multi-partner effort coordinated by TGAC, provides services such as infrastructure, guidelines and tools, to represent biological data on the Web that can be reused by anyone. It is an open-source, community-based project, with a modular, structured design that is ideal for data-intensive research. It allows users to build reusable, interactive applications which can be easily deployed on the web.

Harnessing digital technology for health behaviour change

By Bob Patton, Lecturer in clinical psychology, University of Surrey.

UCL Centre for Behaviour Change (CBC) Conference 2015 was a two-day conference held at Senate House (London) to bring together experts from behavioural science, computer science, engineering and human/computer interaction. The primary keynote presentation was from Professor Bonnie Spring of Northwestern University, who discussed how an over-reliance on technology-focused solutions can be de-motivating and lead to reduced self-efficacy and higher attrition rates from treatment programmes.

In the context of Precision Medicine – a term used to describe treatment applied to “the right patient, in the right place, at the right time” – we should be seeking to optimise our interventions, rather than to take a scattergun approach and throw everything (including the kitchen sink) at trying to change behaviour. Perhaps the smartest thing that was said was that “Widgets don’t in themselves change behaviour; it’s the underlying principles that count”. As an example, Prof. Spring demonstrated a successful intervention using an old Palm Pilot (i.e. no graphics, limited functionality). The lesson here is to pay attention to the function - there is a lot of robust theory relating to behaviour change, and we should try to use this in our attempts to digitise successful real-world applications.

Irreproducible research - some top tips

By Neil Chue Hong, Director.

Comic number 1869 from PhD Comics. (c) Jorge Cham. Used with permission.

The Software Sustainability Institute is proud to be associated with a major new paper on irreproducible research. The new paper is called "Top Tips to Make Your Research Irreproducible" by Neil Chue Hong, Tom Crick, Ian Gent and Lars Kotthoff, and is due to be published today (1 April) on arXiv. We present some excerpts of the paper with permission of the authors. Readers are encouraged to read the full version.

We have noticed (and contributed to) a number of manifestos, guides and top tips on how to make research reproducible; however, we have seen very little published on how to make research irreproducible.

It is an unfortunate convention of science that research should pretend to be reproducible; our top tips will help you salve the conscience of reviewers still bound by this fussy conventionality, enabling them to enthusiastically recommend acceptance of your irreproducible work.

By following our tips, you can ensure that if your work is wrong, nobody will be able to check it; if it is correct, you can make everyone else do disproportionately more work than you to build upon it. In either case you are the beneficiary.

  1. Think “Big Picture”. People are interested in the science, not the experimental setup, so don’t describe it.
  2. Stay high-level. Pseudo-code is a great way of communicating ideas quickly and clearly while giving readers no chance to understand the subtle implementation details that actually make it work.
  3. Short and sweet. Any limitations of your methods or proofs will be obvious to the careful reader, so there is no need to waste space on making them explicit.
  4. The deficit model. You’re the expert in the domain, only you can define what algorithms and data to run experiments with.
  5. Don’t share. Doing so only makes it easier for other people to scoop your research ideas, understand how your code actually works instead of why you say it does, or worst of all to understand that your code doesn’t work at all.

Our most important tip is deceptively but beautifully simple: to ensure irreproducibility of your work, make sure that you cannot reproduce it yourself. If you were able to reproduce it, there would always be the danger of somebody else being able to do exactly the same as you.

Scholarship in software, software as scholarship: a view from the humanities

By James Baker, Curator, Digital Research, British Library @j_w_baker

There are complex challenges in the humanities around software sustainability. For while it is true that humanists rely on software to do research, and increasingly on software developed by their own community, many if not most do not value the use of software, and their nascent systems of credit for good software development and reuse are fragile. And so if the humanities are to make the best of the vast and growing digitised and born-digital corpora held by research libraries, key stakeholders in the field must ascribe the same value to the development of and experimentation with research software as they do to traditional practices such as literature surveys, source critique, and written publications.

In order to deepen my knowledge of these challenges and opportunities, I recently attended Scholarship in Software, Software as Scholarship: From Genesis to Peer Review - a two-day meeting at Universität Bern that brought together an international audience of scholars, developers, funders, and associated individuals to consider the status, role, and assessment of software in humanities research. The discussions that interspersed the scheduled short papers, keynotes, and round table were varied, fluid, and expansive in character - indeed even the utility of Wissenschaft as a term capable of overcoming the Anglophonic divide between 'the sciences' and 'the humanities' was addressed. Nevertheless, three themes were prominent: theory, community, and practice.

Is software a method?

By Philip Fowler, Software Sustainability Institute Fellow and postdoctoral researcher at the Department of Biochemistry at the University of Oxford.

Last month I attended the 59th annual meeting of the US Biophysical Society. It's the sixth time I've been but the first time I've gone really thinking about how our community treats software under my remit as a Software Sustainability Institute Fellow. I was also chosen as a guest blogger and you can see my posts on my blog or on the Biophysical Society blog.

Wait... what is biophysics?

It is the application of physical methods (and to a lesser extent, theories) to biology, focusing mainly at the molecular level. This includes determining the structure of proteins (and more famously, DNA) by illuminating a protein crystal with X-rays, recording the diffraction pattern and then inferring the complex 3D structure that would be responsible for that pattern. It is a mature discipline: the first conference of the society was in the 1950s and now it attracts over 6,000 scientists every year.

Software Management Plan Service prototype live

Software management plan guide and service

By Mike Jackson, Software Architect.

Software management plans set down goals and processes that ensure software is accessible and reusable throughout a project and beyond. To complement our guide on Writing and using a software management plan we have now developed a prototype software management plan service, powered by the Digital Curation Centre's data management plan service, DMPonline.

Releasing data service software as free open source software

Reflections of the same thing

By Mike Jackson, Software Architect.

Linked data is a way of representing and joining information from a variety of sources to allow it to be accessed, browsed, searched and used as easily as one would browse the web. One of the principles of linked data is that URIs are used to name things whether these be people, places, books, software, magazines, departments, machines and so on.

As anyone can develop their own linked data sets, and propose their own URIs, many URIs may be created for the same thing. sameAs.org is a service offered by Seme4 Limited that allows users to find out which URIs refer to the same thing. sameAs Lite is a refactored, open source, version of the software that powers sameAs.org. We are providing consultancy to Seme4 on how to improve sameAs Lite for deployers and developers and to promote community engagement.
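
The idea of finding which URIs refer to the same thing can be illustrated with a small sketch. This is not the sameAs.org or sameAs Lite implementation or API: the `CoreferenceStore` class, its method names and the example URIs are all hypothetical, and the grouping uses a generic union-find structure chosen purely for illustration. It assumes that "refers to the same thing" is treated as symmetric and transitive.

```python
class CoreferenceStore:
    """Hypothetical store that groups URIs asserted to name the same thing."""

    def __init__(self):
        # Maps each known URI to a parent URI; a root URI is its own parent.
        self._parent = {}

    def _find(self, uri):
        """Return the representative (root) URI of the group containing uri."""
        self._parent.setdefault(uri, uri)
        root = uri
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every URI on the walked path at the root.
        while self._parent[uri] != root:
            self._parent[uri], uri = root, self._parent[uri]
        return root

    def assert_same(self, uri_a, uri_b):
        """Record that two URIs name the same thing, merging their groups."""
        root_a, root_b = self._find(uri_a), self._find(uri_b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def equivalents(self, uri):
        """Return every known URI in the same group as uri (including uri)."""
        if uri not in self._parent:
            return {uri}
        root = self._find(uri)
        return {u for u in list(self._parent) if self._find(u) == root}


# Example with made-up URIs for one place published by three data sets.
store = CoreferenceStore()
store.assert_same("http://example.org/place/oxford", "http://example.com/id/Oxford")
store.assert_same("http://example.com/id/Oxford", "http://example.net/cities/42")
for uri in sorted(store.equivalents("http://example.net/cities/42")):
    print(uri)
```

Because the groups are merged transitively, asking about any one of the three URIs returns all three, even though the first and third were never linked directly.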

Google Code is shutting down - what should you do next?

By Neil Chue Hong, Director of the Software Sustainability Institute.

Google announced today that their open source project hosting site, Google Code, is to close. The site has disabled the creation of new projects, will turn read-only on 24 August 2015, and will close on 25 January 2016. In the announcement, Google's Director of Open Source Chris DiBona cited the move of projects away from Google Code to other services such as GitHub and BitBucket - indeed Google itself has moved thousands of its projects to GitHub.

The first thing to stress is: don't panic. Google has provided a long time for you to migrate your project, along with tooling to make the process easier.

What the Flip? Getting girls to code through games

By Kate Howland and Judith Good, Department of Informatics, University of Sussex

This article is part of our series Women in Software, in which we hear perspectives on a range of issues related to women who study and work with computers and software.

With calls for all UK children to learn computer science from a young age, we need teaching methods and tools which can help novice programmers to learn in a way which both motivates and is more accessible for them, and which builds on their existing skills and interests.

The Flip programming language teaches coding by setting a task for users to create a narrative-based 3D role-playing game. An evaluation study suggests that girls match and in some cases exceed boys’ performance with the language, which is encouraging given concerns about the underrepresentation of women in the technology industry.