Software and research: the Institute's Blog

Bioconductor conference – R-based and open-sourced

By Laurent Gatto, Software Sustainability Institute Fellow.

This past week saw the annual Bioconductor conference take place at the Dana-Farber Cancer Institute in Boston, MA. It opened with a Developer Day on July 30th and continued with scientific talks and workshops until August 1st.

Bioconductor is an R-based, open-source, open-development software project that provides tools for the analysis and comprehension of high-throughput genomics data. It was set up in 2001 by Robert Gentleman, who co-founded R with Ross Ihaka, and is overseen by a core team based primarily at the Fred Hutchinson Cancer Research Center in Seattle, WA, together with members from a range of other US-based and international institutions.

A sprint to new materials for Software Carpentry

By Aleksandra Pawlik, Training Lead.

Nine people, two days, two venues, two proper Polish lunches, many pull requests and one tour of the supercomputer centre. That sums up the Software and Data Carpentry sprint in Krakow, during which we created new Software Carpentry materials and updated existing ones. The team in Poland was one of nineteen teams taking part in the Mozilla Science Lab global sprint on 22-23 July 2014.

The idea of the sprint was based on the Random Hacks of Kindness approach, in which teams all around the world work on hacks around the clock, with the work passing from one time zone to the next. The teams in the Software Carpentry sprint were located in Australia, New Zealand, Europe, the US and Canada. When the teams in Australia and New Zealand were finishing their day, they handed over to the European groups. Around lunchtime in Europe, the (early bird!) teams in North America started to join in. All sites had webcams streaming live video, so we could see and talk to each other, and it was really motivating to see people around the world hacking away or writing new lesson materials.

Running an unconference - top tips

By Simon Hettrick, Deputy Director.

Have you ever needed to answer a big, fuzzily defined question that you can only hope to answer by combining the experience and knowledge of a wide group of disparate experts?

Whenever this situation occurs at the Institute, we apply the perfect solution: an unconference (like our Collaborations Workshop). Rather than sitting through a dull series of presentations, the attendees at an unconference are in control of what they do and how the conference works. This makes the conference adaptive, so that the shifting boundaries of fuzzily defined questions can be honed and narrowed down until a solution is found.

At a good unconference you could have around 100 people all firing suggestions at you across a huge range of topics and, somehow, you have to accommodate these ideas on-the-fly into a constantly evolving agenda. Fortunately, the organisation of an unconference can be made much easier if you prepare in advance and follow our top tips.

Oh research software, how shalt I cite thee?

By Mike Jackson, Software Architect

The Institute are firm believers in software citation. Citing software, directly or via its associated publications, gives deserved credit to those who develop this vital research infrastructure. In this blog post I look at some ways in which research software developers are promoting the citation of software by making it easier for researchers to do so. That's another thing we are firm believers in: automating the grunt work of using and developing software to free up time for research...

As part of recent open call collaborations with both BoneJ and QuBIc, I was taken aback by how involved citing software could get. For example, BoneJ request that their journal paper is cited but, depending on the plug-ins and additional features used, other papers also need to be cited. Likewise, the FSL software library requests citation of one of its three overview papers and, again depending on the specific tools used, of additional papers. For example, using QuBIc's FABBER tool, bundled in FSL, requires citation of one paper, though citing three is recommended.
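Working out which papers a given analysis owes is exactly the kind of grunt work a tool could do for its users. Below is a minimal sketch of such a helper, assuming a tool that knows its base citation plus extra citations per plug-in. The plug-in names, the citation keys and the function `required_citations` are hypothetical placeholders, not BoneJ's or FSL's actual requirements.

```python
from typing import Dict, Iterable, List

# Hypothetical citation metadata: placeholder keys, NOT real references.
BASE_CITATIONS = ["tool-overview-paper"]  # always owed when the tool is used
FEATURE_CITATIONS: Dict[str, List[str]] = {
    "plugin-a": ["plugin-a-method-paper"],
    "plugin-b": ["plugin-b-method-paper", "shared-algorithm-paper"],
    "plugin-c": ["shared-algorithm-paper"],
}


def required_citations(features_used: Iterable[str]) -> List[str]:
    """Return the citations owed for a run: base paper(s) first, no duplicates."""
    owed = list(BASE_CITATIONS)
    for feature in features_used:
        # Unknown features contribute nothing rather than raising an error.
        for key in FEATURE_CITATIONS.get(feature, []):
            if key not in owed:
                owed.append(key)
    return owed


print(required_citations(["plugin-b", "plugin-c"]))
# ['tool-overview-paper', 'plugin-b-method-paper', 'shared-algorithm-paper']
```

A tool could print this list at the end of a run, so researchers are told what to cite instead of having to piece it together from the documentation.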

Research software program at CANARIE leading the way for Canadian researchers

By Scott Henwood, Lead Software Architect, CANARIE.

As in most countries, Canada does not have a clearly identifiable research software development community. The community certainly exists, but its members tend to interact only within specific research disciplines, and the population is fairly transient because of the project-based approach to research software development used today. As a result, opportunities for collaboration and software re-use are missed, and the same functionality is developed over and over again by different software teams. This is particularly true of non-research-facing support software such as user authentication components, visualisation tools and digital infrastructure management.

Ultimately, more time and money are spent on redundant software development, leaving less of both for research-facing software and for the research itself. To help alleviate this situation, CANARIE has established its Research Software Program.

Desert Island Hard Disks: David De Roure

You find yourself stranded on a beautiful desert island. Fortunately, the island is equipped with the basics needed to sustain life: food, water, solar power, a computer and a network connection. Consummate professional that you are, you have brought the three software packages you need to continue your life and research. What software would you choose and - go on - what luxury item would you take to make life easier?

Today we'll be hearing from David De Roure, Director of the Oxford eResearch Centre.

I've always felt that, should one ever find oneself in a fairytale "three wishes" scenario, then surely the first wish would be to be able to do magic. Hence my first software package must be a programming language with which I could do anything, and for me that is Lisp. To be specific, it would be the Scheme dialect, and the MIT compiler, but that might all depend on how much I can grab in those precious seconds – "any Scheme will do" as someone once nearly sang. And given one Lisp I can make any other – or indeed any other language, of my own invention. Metalinguistic magic. Lisp was and is my first language and I believe my brain might be programmed in it.

I don't fix printers or do IT support - I'm a Research Software Engineer

By Gillian Law, Tech Literate.

Ashley Towers, Research Software Engineer at the University of Sheffield, explains why and how he won the job title he wanted. Job titles matter. Towers almost didn’t apply for his current job at the University of Sheffield – a job that he loves – because the title was all wrong.

"They advertised for a Computing Officer, and I probably wouldn't have paid any attention if I saw it on a job page, because it sounds like an IT support, fixing-the-printers role. But I was fortunate in that my sister works for the University, and she sent me the advert."

The University was, in fact, looking for someone to buy or develop software for students in the School of Clinical Dentistry. Four years later, Towers has made the job his own, creating software that is vital to the dentistry students and brings great research possibilities – and has persuaded the School to change his job title to reflect it.

Towers is now a Research Software Engineer – a job title that the Software Sustainability Institute has been promoting since 2012.

First FAIRport-ELIXIR BYOD Workshop

By Alasdair J G Gray, Lecturer in Computer Science, Heriot-Watt University

At the end of June, a group of individuals from across Europe came together in Leiden for the first FAIRport-ELIXIR Bring Your Own Data (BYOD) workshop, which was also sponsored by the Dutch Techcentre for Life Sciences. None of us quite knew what would happen but we were all excited that such an event was taking place. The result was better than we expected.

This first BYOD workshop brought together experts in Linked Data and experts in MycoBase and the Human Protein Atlas. The participants were evenly split between data providers, who had some, but not a lot of, RDF knowledge, and trainers, who were experts in semantic web technologies. The workshop’s aim was to give the data providers a mix of tutorial and hackathon through which they could make their data more accessible and reusable, based on the Data FAIRport initiative and using RDF. The goal was to develop showcases demonstrating the added value of interoperable data for answering questions across multiple resources.

3D archaeology - now low-cost, high-volume and crowd-sourced

By Andrew Bevan, Senior Lecturer, UCL Institute of Archaeology.

This article is part of our series: a day in the software life, in which we ask researchers from all disciplines to discuss the tools that make their research possible.

Archaeologists have long had a taste for computer-based methods, not least because of their need to organise large datasets of sites and finds, search for statistical patterns and map out the results geographically. Digital technologies have been important in fieldwork for at least two decades and increasingly important for sharing archaeology with a wider public online. However, the last decade of advances in computer vision now means that the future of archaeological recording – from whole landscapes of past human activity to archaeological sites to museum objects – is increasingly digital, 3D and citizen-led.

Structure-from-motion and multi-view stereo constitute a bundle of ‘computer vision’ methods (‘SfM’). They are a form of flexible photogrammetry (the latter being a science with a much older pedigree) in which software is able to automatically identify small features in a digital photograph and then match these across large sets of heavily-overlapping images in order to reconstruct the camera positions from which these photographs were taken.
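The matching step described above can be sketched in a few lines. This is a minimal, pure-Python illustration under stated assumptions, not the pipeline of any particular SfM package: each photograph is assumed to have already yielded a list of feature descriptors (numeric vectors), and a feature in one image is paired with its nearest neighbour in another only if it passes Lowe's ratio test, one common filter for ambiguous matches. Recovering the camera positions from these matches is a further step not shown here; `match_features` is a name invented for this example.

```python
import math
from typing import List, Sequence, Tuple

Descriptor = Sequence[float]  # e.g. a SIFT-like feature vector


def match_features(
    desc_a: List[Descriptor],
    desc_b: List[Descriptor],
    ratio: float = 0.8,
) -> List[Tuple[int, int]]:
    """Pair features in image A with features in image B.

    For each descriptor in A, find its nearest and second-nearest neighbours
    in B (Euclidean distance). Keep the pair only if the nearest is clearly
    closer than the runner-up (Lowe's ratio test), discarding ambiguous matches.
    """
    matches: List[Tuple[int, int]] = []
    if len(desc_b) < 2:
        return matches  # the ratio test needs at least two candidates
    for i, da in enumerate(desc_a):
        # Distances from this descriptor to every descriptor in B, nearest first.
        dists = sorted((math.dist(da, db), j) for j, db in enumerate(desc_b))
        (best, j_best), (second, _) = dists[0], dists[1]
        if best < ratio * second:
            matches.append((i, j_best))
    return matches


# Toy 2-D descriptors; real descriptors typically have 64 or 128 dimensions.
image_a = [(0.0, 0.0), (5.0, 5.0)]
image_b = [(0.1, 0.0), (5.0, 5.2), (9.0, 9.0)]
print(match_features(image_a, image_b))  # [(0, 0), (1, 1)]
```

The brute-force search here is for readability only; with thousands of features across large sets of heavily overlapping images, real tools rely on much faster approximate nearest-neighbour search.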

Automatic performance tuning and reproducibility as a side effect

By Grigori Fursin, President and CTO of the international cTuning foundation.

Users of computer systems are always eager for faster, smaller, cheaper, more reliable and more power-efficient systems, either to improve their everyday tasks or to continue innovating in science and technology. However, designing and optimising such systems is becoming excessively time-consuming, costly and error-prone because of the enormous number of available design and optimisation choices and the complex interactions between all software and hardware components. Furthermore, multiple characteristics have to be carefully balanced at the same time, including execution time, code size, compilation time, power consumption and reliability, using a growing number of incompatible tools and techniques with many ad hoc, intuition-based heuristics.

During the EU FP6 MILEPOST project in 2006-2009, we attempted to solve the above issues by combining empirical performance auto-tuning with machine learning. We wanted to be able to automatically and adaptively explore and model large design and optimisation spaces. This, in turn, could allow us to quickly predict better program optimisations and hardware designs to minimise execution time, power consumption, code size, compilation time and other important characteristics. However, during this project, we faced multiple problems.
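The empirical half of this approach can be sketched as a simple search: run each candidate configuration, time it, and keep the fastest. The sketch below is a minimal illustration, not MILEPOST's implementation; it assumes a hypothetical tunable kernel (`blocked_sum`) and block sizes chosen only for demonstration, and it omits the machine-learning models that were used to predict good optimisations without exhaustively trying every option.

```python
import statistics
import time
from typing import Callable, Dict, Iterable, Tuple, TypeVar

C = TypeVar("C")  # a configuration, e.g. a block size or a set of compiler flags


def autotune(
    run: Callable[[C], object],
    candidates: Iterable[C],
    repeats: int = 5,
) -> Tuple[C, Dict[C, float]]:
    """Empirically pick the fastest configuration from a list of candidates.

    Each candidate is timed `repeats` times and scored by its median
    wall-clock time, which is less sensitive to one-off noise than a
    single measurement.
    """
    timings: Dict[C, float] = {}
    for cfg in candidates:
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            run(cfg)
            samples.append(time.perf_counter() - start)
        timings[cfg] = statistics.median(samples)
    if not timings:
        raise ValueError("no candidate configurations given")
    best = min(timings, key=timings.__getitem__)
    return best, timings


# Hypothetical tunable kernel: sum a list in blocks of a given size.
DATA = list(range(100_000))


def blocked_sum(block: int) -> int:
    total = 0
    for start in range(0, len(DATA), block):
        total += sum(DATA[start:start + block])
    return total


best_block, table = autotune(blocked_sum, [16, 256, 4096])
print("fastest block size:", best_block)
for block, seconds in sorted(table.items()):
    print(f"  block={block:5d}  median={seconds * 1e3:.2f} ms")
```

Even this toy shows why the search space explodes: every extra tunable parameter multiplies the number of runs, which is what motivates using models to predict promising configurations instead of timing them all.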