
Software and research: the Institute's Blog

So, continuing on from Part 1 of Developing software in an open way, let’s answer the last two aspects of the question from Alex Voss…

“Should I make all my source code available from the start to attract potential collaborators and to solicit contributions or should I keep it close initially to avoid getting locked into early solutions that have been taken up by others?”

You will have project goals to fulfil, so I think it’s always a good idea to start development internally. This means you can put your software firmly on the path to meeting your own project’s goals. When and how external involvement comes into play will depend on the project.

Firstly, it’s important to consider how you want to govern your open-source project. If you are worried about becoming locked in to a solution chosen by others, you should stay away from democratic/meritocratic systems, because you lose some control with these approaches (you can be outvoted). A benevolent dictator approach means that you alone have the final say on which contributions to include in your software – and which ones to exclude. You gain control, but benevolent dictators have to put in a lot of work…

Continue Reading

Well, OK, of course we're not. Most of us are researchers who want to get on and make new discoveries in our chosen fields. Increasingly we find ourselves having to create and use software to make progress, but that doesn't make us all software engineers. What we're doing is research - computational research, if you will - but not software engineering.

Hmm. Hold on a minute.

Research works best without constraints on thought; it needs the freedom to chop, change, freewheel and go off at tangents; it is, by definition, a voyage into the unknown.
 

In contrast, research software can't, doesn't and isn't.

Whatever the purpose of your software, its creation isn't a matter of research, it really is software engineering. A different shade, perhaps, but it's the same colour. We need to adopt the same approaches to research software as we do to any other kind. But fear not! A little can go a long way. Here are five key things that every computational researcher should understand about creating software.

  1. Requirements, requirements, requirements. Software tools are meant to do something, behave in a certain way - they have a goal. Writing these things down, however informally, is an important first step on the path to creating something sustainable. Spending time up-front thinking about, documenting and prioritising what your software should do will save pain later on. It needs to do this now (a…
Continue Reading

So I received this question from Alex Voss the other day:

“As I am just embarking on a software development project, I would like to know from Steve what the benefits and risks are of developing a piece of software in a completely open way? Should I make all my source code available from the start to attract potential collaborators and to solicit contributions or should I keep it close initially to avoid getting locked into early solutions that have been taken up by others? Are there examples of how people have gone about this that I might learn from?”

Certainly a challenging and quite wide-ranging question, but one that applies to many people who are considering open sourcing their software, so definitely worth a look. I’ll be answering this in two posts, so let’s take a look at each question in turn…

“What are the benefits and risks of developing a piece of software in a completely open way?”

By developing your software in an open way, you can build a community that will help to sustain your software beyond the original project. Open sourcing your software can raise its profile which can increase uptake, and it provides an organisational structure for developer contributions and feedback, which helps to improve it. A user community can also offer perspective and steer on your decisions. Focusing on the needs…

Continue Reading

At the Effective Scientific Programming workshop on the 20th June 2011, Mike Jackson posed to 31 attendees one of the essential programming questions: "what makes good code good?" The attendees, who mostly viewed themselves as "scientists who do some programming" rather than "scientific programmers", proposed the following:

  • The most important quality is that the code must be fit for purpose. It must do what it is intended to do and do it correctly. Without this quality, any research deriving from the code could be fundamentally flawed.
  • The code must be readable, well-commented and documented. The code may be worked upon by not just the original author, but others, for example research associates recruited onto a project. Similarly, the original author may need to return to the code after a long break.
  • Source code itself says what the code does and how it does it. However, it is also important that comments be added to include design rationale, or why the code is as it is, why certain implementation decisions were taken. This can be particularly useful when these decisions might otherwise seem unconventional or non-intuitive.
  • The code should be elegant and concise. There should be no redundant or repeated code. However, the code should not be so concise that it becomes cryptic and difficult to understand.
  • The code should be efficient and, where possible, optimise its usage of processor, memory and storage resources.…
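
As a small illustration of the point about recording design rationale, here is a minimal Python sketch. It is not taken from the workshop; the function name `mean_reading` and the scenario are hypothetical. The comment explains why an implementation choice was made, which the code alone cannot say.

```python
import math


def mean_reading(readings):
    """Return the arithmetic mean of a non-empty sequence of floats."""
    if not readings:
        raise ValueError("readings must not be empty")
    # Design rationale (hypothetical example): math.fsum is used instead of
    # the built-in sum() because it tracks intermediate rounding errors, so
    # the total does not drift when many small values are added together.
    # A reader who saw fsum without this note might "simplify" it to sum().
    return math.fsum(readings) / len(readings)
```

Calling `mean_reading([0.1] * 10)` is a quick way to check the behaviour the comment describes, since a plain `sum()` over the same list accumulates a small rounding error.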
Continue Reading

The Software Sustainability Institute was invited to participate in a workshop on Effective Scientific Programming at the University of Newcastle on Monday 20th June 2011. Funded by the Vitae Yorkshire and North East Hub, the workshop was run by a team of researchers from Newcastle University and Northumbria University. The workshop allowed scientists to come together and share their experiences of programming and gain a greater awareness of tools and techniques that can help them become more effective programmers.

Over 70 scientists from universities across the North East attended. Attendees varied both in their experience of programming and the languages they used. MATLAB was the most popular, followed by FORTRAN, C/C++, R and Python, and there were also a handful of users each of Java, Visual Basic, SQL, Smalltalk and hardware description languages.

At the outset, the workshop provided an introduction to networking (social, not computer), and attendees were encouraged to practise this throughout the day. There followed sessions on how to choose programming languages, version control, visualisation packages, SQL and database programming, distributed programming, the problem of numbers, precision and rounding, and open-source copyright and licensing. The Institute's James Perry and Mike Jackson presented talks on…

Continue Reading