Why isn't there more open source in research?

By Josef Weinbub, Institute for Microelectronics, Technische Universität Wien

The simulation of semiconductor devices and processes in academia lacks support for the open-source movement. Consequently, simulation tools are implemented over and over again by different research groups, constantly reinventing the wheel and wasting time and effort that could be used to move the field forward. A reluctance to share code with the community further contributes to the problem. What's going on?

After three years of pursuing a doctoral degree in semiconductor device and process simulation, I came to the conclusion that researchers in this particular field are missing out on significant benefits that could be obtained by collaborative work on software using an open source model. There are several different groups around the world conducting research in this rather specialised field, and the majority of them develop and maintain their own in-house simulation code.

What has struck me is the reluctance within this community to share code. We need to move out of this pattern of secrecy to prevent unnecessary code re-implementation. Re-implementing existing code reduces the time available for research, and ultimately slows its pace.

There are a number of reasons for this secrecy. Researchers fear losing the opportunity to generate scientific output from their internally developed code. This is the typical fear of losing ground to competitors. There is a lack of time for releasing, maintaining, and supporting software. This is because academic software developers focus on generating publications, and there are simply not enough resources left for software-related tasks. Journals typically prioritise scientific results over the software that generates them, which further devalues the notion of free open-source releases. Researchers are unfamiliar with the software release cycle, so the benefits of implementing such a cycle are unknown to them.

First let me clarify that when I refer to open source, I really refer to the universal term Free/Libre Open Source Software. As far as I am aware, there is only a limited amount of open-source software in my field: Archimedes, nanoHUB (around 10% of the tools are open source), ViennaSHE/ViennaMOS (from our institute), NanoTCAD ViDES, Genius Open (offers significantly reduced functionality with respect to the commercial version), and FLOOXS (last stable release 2008).

What can be done?

The importance and advantages of releasing software as open source have to be conveyed to researchers in our field. There is a good chance that even the more hesitant or sceptical scientific developers rely on open-source tools themselves. Why not give back to the community? We need more people who release their implementations as open source. We also need to support developers to uphold software quality. But first, we need to raise awareness: my field needs more open-source evangelists!

A controlled release-publication cycle can be employed to tackle the fear of losing a scientific advantage over competing research groups. Publishing sensible results first and then releasing the corresponding software secures the publication, and thus the scientific credit. We need more support from the journal publishers and funding institutions. The Science Code Manifesto demands software access for the reviewers and journal readers. However, I would go even further: if there are no ties to licensing restrictions, why not force the scientists to release their code under an open-source licence? After all, the journal publishers might restrict access to the code anyway. Such a policy makes sense for projects funded by a public institution, like a government, because the source should be made available to the people that funded the development - the taxpayers.

The UK funding scheme is picking up momentum, driven by institutions like the Joint Information Systems Committee. I hope that this leading example is picked up by other institutions across the world. After all, open source is a prerequisite for allowing us to rigorously satisfy one of the most fundamental principles in science: reproducibility. The ability to inspect code and execute simulations with the same data and parameters as published in research papers is paramount to upholding research quality.

Overall, the problems discussed in this post are surely not specific to the field of semiconductor device and process simulation. However, it is evident that the open-source movement is very young in this field and has yet to unleash its full potential.

Posted by s.hettrick on 21 January 2013 - 2:50pm

Submitted by m.jackson on 22 January 2013 - 11:28am

There's another reason why researchers may be unwilling to share code and that is the fear that they'll be judged on the quality of the code and be found to be lacking. The article Why Librarians Don't Share Code refers to a survey in which "perfectionism" is the primary blocker to sharing. Victoria Stodden, in The Scientific Method in Practice: Reproducibility in the Computational Sciences, finds that the top blocker is the closely-related "time to clean up and document". Nick Barnes, director of the Climate Code Foundation, discusses this aspect in an aptly titled article in Nature, Publish your computer code: it is good enough, and comments "if your code is good enough to do the job, then it is good enough to release - and releasing it will help your research and your field."
