Software and research: the Institute's Blog

By Dominic Orchard, Research Associate, Computer Laboratory, University of Cambridge and Institute Fellow.

The need for more rigorous software verification in science is well known. The presence of software errors can seriously undermine results, leading to paper retractions, bad policy decisions, and even catastrophic technological failures. Often the responsibility is placed on the programmer, but simply trying to be more careful is not enough. There is a wealth of research in computer science aimed at automating testing and verification, yet little of this has crossed over into practice in the sciences. In April 2016, we held a meeting in Cambridge on Testing and Verification for Computational Science to bridge the gap between computer scientists and natural scientists concerned with software correctness.

Before computer science became a discipline, computing existed mainly as a service to science, providing fast and accurate calculation. As computer science developed, it pulled in philosophy, logic, mathematics, semantics, and engineering, but has largely moved away from its roots in science. This has led to a gulf between computer science and the natural and physical sciences. While there have been many advances in program verification, little of this is applied to address the correctness of scientific software. This meeting was organised…

Continue Reading

By Simon Hettrick, Deputy Director.

I’m a lazy writer, so when it comes to summarising last week’s RSE Conference, I will defer instead to the genius of Adrian Jackson’s tweet:

With all the excitement about RSEs over the last couple of years, we knew it was the right time to run a conference to bring them together. We’ve had workshops and AGMs, but this was going to be bigger, better and way more intense. The thing that impressed me most was the buzz. We attracted a lot of new people, but they were interacting like old friends. We worked hard to have an inclusive event, but I think this is also representative of people feeling a part of the community. As one of the emails we received said:

“This might have been my 30th conference but it was the first where I felt thematically 100% at home and understood”.

Continue Reading

By Toni Collis, Applications Consultant.

The Ionomics Hub (iHUB), a collaborative international network for ionomics, is a science community web portal that promotes knowledge extraction from and reuse of ionomic data. The iHUB contains ionomic information on over 300,000 plant and yeast samples. Ionomics is the measurement of the total elemental composition of an organism; coupled with genetics, it provides a powerful tool for understanding important biological processes and problems. A better understanding of the mechanisms regulating the ionome offers potentially new approaches to manipulating agriculturally important traits such as salinity tolerance and mineral nutrient efficiency, in order to develop crop varieties that are more resilient to the predicted impacts of climate change on soil fertility. It would also allow crop yields to be improved more sustainably, delivering the gains required to meet future population growth. Together with the Software Sustainability Institute, David E Salt, Director of the iHUB and Professor of Genome-Enabled Biology in the Division of Plant and Crop Science at the University of Nottingham, wants to ensure that this valuable resource continues to operate and is maintained for years to come.

A major challenge to the reproducibility of science is the provenance of data. Sharing this data also…

Continue Reading

By Paul Graham, EPCC and Software Sustainability Institute.

The Software Sustainability Institute has started a new project working with Colin Davis, Professor of Cognitive Psychology at the University of Bristol, James Adelman, Associate Professor of Psychology at the University of Warwick, and their software easyNet, a computational modelling package for cognitive science. It is a research tool used to better understand the mechanisms and codes underlying human cognition. By running simulations of computational models, it is possible to generate predictions that can then be tested in behavioural experiments. So far the main interest has been in understanding reading, but research has also been conducted on speech perception and production, spatial cognition, memory and social cognition.

Computational modelling has played a critical role in the advancement of theory in cognitive science. However, the rate of theoretical progress has been hampered by a number of systemic issues relating to low levels of transparency, reusability, accessibility and reproducibility. The cognitive psychology research community is composed largely of non-modellers who refer to published models in their own empirical work, but do not directly…

Continue Reading

By Raniere Silva, Software Sustainability Institute.

When EuroSciPy 2016 was announced, I told myself that I needed to attend. The first reason was to compare it with SciPy Latin America 2016, whose organisation I helped with last March, so that I could offer suggestions to both events for 2017.

Both conferences are about the use of Python in science, and each attracted between 100 and 200 attendees from different countries. SciPy Latin America 2016 attendees complained about the four parallel tutorial tracks, and I believe that, for a conference of this size, having only beginner and intermediate tutorial tracks, as EuroSciPy did, is the right choice. EuroSciPy reserved its last day for sprints, something that was cut from SciPy Latin America and that could be improved if the organisers provided an agenda for it. SciPy Latin America had some swag for attendees, which I missed at EuroSciPy.

Another reason I wanted to attend EuroSciPy 2016 was to promote the Software Sustainability Institute, Software Carpentry and Data Carpentry. I taught a Git tutorial based on Software Carpentry material on the second day. The organisers told me that they received positive comments about it, which made me happy! EuroSciPy also had some lightning talk…

Continue Reading

By Simon Hettrick, Deputy Director.

On a beautifully sunny day in March 2012, a small group met at Queen’s College Oxford and challenged a long-standing problem: why is there no career for software developers in academia? They didn’t know it at the time, but this meeting led to a nationwide campaign that created a vibrant and rapidly growing community, and established a new role in research: the Research Software Engineer.

The lack of a career path for academic software developers wasn’t new back in 2012, but it had gone largely unchallenged. Many academics were aware of the importance of software to research; they could see that the people who created this software went largely unrecognised, and they were beginning to worry about the consequences of this oversight. What happens when something is so vital to research, yet overlooked and severely under-resourced? Concerns like these were raised at our Collaborations Workshop, and this led the group to meet and challenge them.

A new role is born

The group that rose to the challenge consisted of Rob Baxter, Ian Bush, Dan Emmerson, Robert Haines, Neil Chue Hong, Dirk Gorissen, James Hetherington and Ilian Todorov (I missed this now-historic moment because I was running the conference). They realised that software developers lacked something more fundamental than just recognition—they lacked a name. In a short study in 2014, we investigated…

Continue Reading

By Raniere Silva, Community Officer; Olivia Guest, University of Oxford; Vincent Knight, Cardiff University; and Christina Bergmann, Ecole Normale Supérieure.

The Institute’s Research Data Visualisation Workshop took place on 28th July 2016 at the University of Manchester. Raniere Silva’s warm welcome was followed by a keynote talk from Prof. Jessie Kennedy of the Institute for Informatics and Digital Innovation at Edinburgh Napier University. Jessie spoke about the miscommunication of data caused by poor visualisation techniques and how to avoid it. With over 50 attendees, the workshop provided an environment for learning and sharing. In the following sections, we cover the events that took place during the workshop.

The Keynote


The Research Data Visualisation keynote talk was titled: ‘…

Continue Reading

By Dr Becca Wilson, Software Sustainability Institute Fellow and Research Fellow, Data 2 Knowledge Research Group, University of Bristol.

I attended the 2016 UseR conference at Stanford University, 27th–30th June 2016. This year’s UseR conference was of particular importance as it coincided with the 40th anniversary of the S statistical programming language (the precursor to R) and the 75th birthday of Professor John Chambers, a co-creator of S. The conference was well attended, with around 900 registered delegates split 50:50 between academia and industry. Those unable to attend could follow #useR2016 on Twitter and watch the keynote talks streamed live; all talks were recorded and are available online.

The opening keynote, Forty years of S, by Rick Becker, co-creator of S at Bell Labs in the 1970s, was a nostalgic look back at the origins of the S statistical language. Rick highlighted how far analytic processing has come: in the 1970s, batch computing was done via punch cards, a regression analysis took two hours to process, and you then had to wade through pages of printout to identify the solution. Ultimately, R was released as an open source alternative to the licensed S, retaining much of S’s functionality, including the use…

Continue Reading

By Mike Jackson, Software Architect.

A major challenge in computational science is the effort required to keep track of provenance and so make research that relies upon code more reproducible. recipy provides an almost effortless way to track provenance in Python. I am working with recipy’s developers, Software Sustainability Institute Fellow Robin Wilson and Janneke van der Zwaan of Geography and Environment at the University of Southampton, to develop an automated test suite for recipy as a precursor to expanding its development and promoting it more widely.

recipy is an open source Python package, hosted on GitHub and released under the Apache License, Version 2.0. It is available via this repository or as a package that can be installed via Python’s pip package manager. Once a researcher has installed recipy, all they have to do is add “import recipy” at the top of their Python scripts, and all of…
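As a minimal sketch of that workflow (the file name and the NumPy call below are illustrative, not taken from the post), a tracked script might look like this:

```python
# Minimal sketch of a recipy-tracked script (assumes recipy and NumPy are
# installed; "output.csv" is an illustrative file name).
# The only change to an existing script is the first import: recipy patches
# common I/O functions (such as numpy.savetxt) so that the run, its inputs
# and its outputs are logged to a local provenance database.
import recipy  # must come before the libraries whose I/O it tracks

import numpy as np

data = np.arange(10).reshape(5, 2)
np.savetxt("output.csv", data, delimiter=",")  # recorded as an output of this run
```

Recorded runs can then be queried later through recipy’s command-line interface or GUI (for example, `recipy latest` shows the most recent run).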

Continue Reading

By Simon Hettrick, Deputy Director.

Over the last couple of years, we’ve had occasion to ask people about the software they use in their research. We’re about to start a long-running survey to collect this information properly, but I thought it might be fun to take a rough look at the data we’ve collected from a few different surveys.

It would be easy to survey people if there existed a super-list of all possible research software from which they could choose, but no such list exists. This raises the question: how many different types of software do we expect to see in research? Hundreds, thousands, more? The lack of such a list is rather annoying, because it means we have to collect free-form text rather than ask people to choose from a drop-down list. Free-form text is the bane of anyone who collects survey data, because it takes so much effort to clean. It is truly amazing how many different ways people can find to say the same thing!

I collected together five of our surveys from 2014 to 2016, covering 1,261 survey participants. From these, we collected 2,958 different responses to the question “What software do you use in your research?”, but after a few hours of fairly laborious data cleaning (using OpenRefine to make things easier) these were boiled…
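To give a flavour of the problem, here is an illustrative sketch of how free-form answers might be normalised before counting; this is not the actual cleaning done with OpenRefine, and the variant names and mapping are made up:

```python
# Hypothetical normalisation of free-form "what software do you use?" answers:
# trim whitespace, lower-case, and map common variants to one canonical name.
from collections import Counter

CANONICAL = {
    "matlab": "MATLAB",
    "r studio": "RStudio",
    "rstudio": "RStudio",
    "excel": "Microsoft Excel",
    "ms excel": "Microsoft Excel",
}

def normalise(response: str) -> str:
    """Collapse a single free-form answer to a canonical software name."""
    key = " ".join(response.strip().lower().split())
    return CANONICAL.get(key, key.title())

responses = ["Matlab", "MATLAB ", "ms excel", "Excel", "R Studio"]
counts = Counter(normalise(r) for r in responses)
print(counts)  # Counter({'MATLAB': 2, 'Microsoft Excel': 2, 'RStudio': 1})
```

Even this toy example shows why a drop-down list would have been preferable: five responses collapse to just three distinct packages once the variants are reconciled.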

Continue Reading