Research Software in Germany—a brief report on efforts in autumn 2016

By Stephan Janosch, Research Software Engineer at Max-Planck-Institute for Cell Biology and Genetics, Dresden

RSEs in Germany

A handful of people from Germany attended the first Research Software Engineers conference #RSE16. Few as they may have been, they made a plan: to carry the community spirit among research software engineers from the UK back to Germany. After some discussion, we decided to register the domain http://www.de-RSE.org and set up a website and a mailing list.

Once the mailing list was online, a big surprise was posted within a few days: a free open science workshop on scientific software, for 70 people, would take place in November 2016 in Germany. Now, that would be the perfect chance to kick-start a German RSE community, wouldn’t it?

Workshop—“Access to and reuse of scientific software”

November was upon us faster than expected, and so was the 1.5-day workshop (hashtag: #hgfos16) about accessibility and reuse of scientific software, organised by the Helmholtz Open Science office. An audience of 77 people, as diverse as that at #RSE16, listened to three keynote speakers (slides).

Johannes Köster identified two major crises in science at the moment. The first is an automation crisis: a disconnect between software and data that results in too much manual labour. This, in turn, limits how far the volume of projects can scale (workflow management would be a possible solution). The second crisis concerns software quality and is caused by short-term projects and ever-changing staff, which result in missing tests and low maintainability. Johannes also hinted that low reusability is caused by unknown licences, low efficiency and unfortunate software architecture. For further details, please see the slides.

Sünje Dallmeier-Tiessen presented some of the data preservation and analysis solutions tried at CERN. She highlighted policies that led to a virtual machine image (CERNVM) containing the software environment needed to download and execute the data analysis tools. The tools are stored and maintained centrally by software librarians. The keynote also mentioned opendata.cern.ch as a repository for citable datasets, as well as Zenodo for citing software. The CERN Analysis Preservation project, currently under development, aims at linking data to software during the research cycle. For further details, please see the slides.

Matthias Katerbow, Programme Officer at the DFG (a national science funding organisation), named four fields leading to software sustainability: reusability, reconstructability, infrastructure and reproducibility. He also maintained that citability (cf. Force 11 software citation principles) as well as finding long-term solutions for the maintenance of software still pose challenges. He closed with a description of current funding opportunities. For further details, please see the slides.

Participants of #hgfos16, source: HZDR

The audience then split up into sessions dealing with nine topics, which I’d summarise as:

Business models: Scientific software is usually not sold, so it is not a product that generates income to fund its own maintenance. Sometimes business licences provide resources for further development. Software development has to become a part of scientific best practices, with all its consequences.

Reproducibility: Reproducibility creates trust in software-based science. Open source software, including workflows, stored in long-term archives (librarians come into play here) becomes citable and therefore linkable to data and papers (journals should demand proper software packages). Documentation standards might be enforced by policies, which of course need support to be put into practice.

Technical infrastructure: Starting out modestly, with Software Carpentry and versioning in GitLab, is a good idea but, depending on the requirements of the situation, this can get very complex (e.g. HEPData from CERN). For collaborations, GitHub is the gold standard right now. Sometimes you need to have code and data repositories as well as computing resources close together (e.g. for anything that requires large input data; think deep learning).

Licensing: This topic is often addressed too late in the process of publishing scientific software because of a lack of legal knowledge. Internal training and documented best practices, in conjunction with checklists, prove valuable. Having a clear point of contact at each institution, or a line of communication to trained lawyers, is considered essential.

Visibility and modularity: Missing software directories, or a lack of awareness that they exist, lead to duplicate funding and to reinventing the wheel (communication gaps). Extensible and highly usable frameworks encourage the development of new modules or the reuse of existing ones, but these frameworks need communities for maintenance.

Standards and quality control: The scope and intended usage define the level of software quality needed, but what about minimal standards, and who enforces them (in-house processes, journals)? Less staff fluctuation and software reviews improve the situation; best practices exist but need to be communicated and applied.

Personnel and careers: Curricula need to produce digital skill sets. Practical sessions (hackathons, hacky hours) raise the motivation for continuing education. A career path for RSEs is needed; otherwise scientists themselves would need to focus on software development.

Citation and rewards: Software is not data. Citable software makes developers visible and thus justifies their funding. Journals can play a more important role here. A career path for RSEs would be a perfect reward.

During these discussions, the call for a German Software Sustainability Institute came up multiple times. It also seems that efforts, experiences and best practices already exist; they “just” need more communication. The term “community” was mentioned often too, either as something that already exists or as wishful thinking (e.g. a community could maintain their own tools). As the workshop topic was sustainability of software, only one small effort was made to connect the people behind the software, in the form of another mailing list. Yes, you can read some disappointment between the lines here, because my expectations were a little different: how about an annual RSE conference in Germany organised by the http://www.allianzinitiative.de?

One thing is for sure: we need to have a German RSE community to share experiences and facilitate communication to help and benefit from each other!

Posted by s.aragon on 8 December 2016 - 4:23pm