The SSI Blog

Evaluating the software behind some of the world's large experimental facilities

The Software Sustainability Institute has made an assessment of an international Google Code project led by the Rutherford Appleton Laboratory in Oxfordshire. A report based on the findings of the assessment is now being discussed with the project’s senior collaborators in the US and France.

Ten years ago, the Science and Technology Facilities Council (STFC) started work on a catalogue for data collected from experiments at its laboratories at Harwell and Daresbury. The software, known as I-Cat (Information Catalogue), is now used by the Diamond Light Source, the Central Laser Facility, and the ISIS Neutron and Muon Source (which is related to the MAUS project).

Every day I-Cat is used to catalogue millions of files. Other large experimental facilities have adopted the software, such as ILL in Grenoble, France, SNS in Tennessee, USA, and Elettra in Trieste, Italy. Facilities in Australia, Spain, Switzerland and Germany are considering I-Cat for use with their work.

"It is good to know that what we have been doing has some merit; we know that we must focus on doing the right things, and this report will really help us", says Alistair Mills, the ICAT Project Manager. "I am very pleased with the evaluation. Now of course, we have to act on the findings of the assessment. We have a lot of work to do in 2012 to ensure that we meet the needs of the users of the software."

The full version of the report will be available soon on the Institute’s website. The following is a summary:

Both the ICAT collaboration and the ICAT software are at a critical point in the ICAT life cycle. To date, the mechanisms in place for the management of ICAT have met the needs of the partners in the collaboration. However, ICAT is facing the problems associated with the successful and increasing uptake of the software. There is general agreement, and specific requests from stakeholders, that the mechanisms expand to support a larger community of users. The driving force for change comes from two European projects which are evaluating ICAT as a metadata cataloguing solution for widespread use.

The principal source of data for this assessment was a set of fifteen interviews. These interviews led to the identification of a set of observations, leading to a set of recommendations for action.
The key factor for the future success of ICAT is that the collaboration provides assurance to the stakeholders that ICAT will continue to meet their needs.

The assessment provides a set of recommendations of which the following are the most important: define the governance of the collaboration; create a well-defined induction process for new collaborators; improve and streamline the requirements ingest process; expand, improve and promote the road-mapping activity; develop policies for deprecation, code contribution and software release; allocate effort to assurance and promotion; create a group of external product testers; engage actively in facility roll-out plans; create a capability to provide mentors for new collaborators.

What makes good code good and peer software reviewing at Dev8D

On Wednesday 15th February at Dev8D 2012 I ran a session on peer software reviewing, which I called Enter the bear-pit - peer software reviewing. This included the latest in our occasional discussions on what makes good code good followed by one-on-one sessions where attendees paired up to review each other's code. After I had sat in an empty room for ten minutes, looking at the clock and getting ever more twitchy, ten attendees arrived to participate.

The attendees, who were software developers, with a couple of researcher-developers, put together a list of the qualities of good code, namely:

  • Documented with usage examples so it's clear not just how to run it but how to use it in your own work.
  • Readable using meaningful naming, using conventions and a consistent style.
  • Optimised.
  • Deodorised to avoid code smells, that is, code that just feels wrong e.g. large classes or methods, methods with many parameters, duplicated statements in both if and else blocks of conditionals etc. I'd never heard this term before – it has its origins in Chapter 3 Bad smells in code by Kent Beck and Martin Fowler in Fowler, Martin (1999). Refactoring. Improving the Design of Existing Code. Addison-Wesley. ISBN 0-201-48567-2. For an overview, see code smells at Wikipedia.
  • Concise and modular avoiding large classes or methods or methods with many parameters.
  • Tested and testable via a test suite.
  • Clear and well-designed so that when you look at it you can understand it and believe it does indeed work.
  • Consistent with the conventions and patterns of its language.
  • Reuses and recycles where appropriate and doesn’t reinvent the wheel.
  • Has copyright and a licence.
  • Well-commented with concise, accurate and up-to-date comments that explain why the code is as it is, and commented in the recommended style for that language e.g. JavaDoc or Doxygen.
  • Has no commented-out code, since it’s unclear whether such code is redundant, deprecated or should be uncommented in future.
  • Peer reviewed.
  • Usable.
  • Works.
  • Doesn’t silently or cryptically fail.
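
To make one of the smells above concrete, here is a minimal sketch of "duplicated statements in both if and else blocks" and one way to deodorise it. The function names, rates and base charge are invented for illustration and did not come from the session.

```python
def shipping_cost_smelly(weight_kg, express):
    """Smelly version: the same statements are repeated in both branches."""
    if express:
        base = 5.0
        cost = base + weight_kg * 2.0
        cost = round(cost, 2)
        return cost
    else:
        base = 5.0
        cost = base + weight_kg * 1.0
        cost = round(cost, 2)
        return cost


def shipping_cost(weight_kg, express):
    """Deodorised version: only the part that differs stays in the conditional."""
    rate_per_kg = 2.0 if express else 1.0
    return round(5.0 + weight_kg * rate_per_kg, 2)
```

Both functions give the same answers, but in the second one a change to the base charge or the rounding only has to be made in one place.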

The attendees were then arranged into twos and threes and spent over an hour discussing each other's software in terms of readability, documentation, tests and general project openness, according to the interests of the individual attendees.

There was discussion at the end on the potentially thorny issue of solo developers who may not want others looking at their code, and might be sensitive, defensive, or aggressive at the suggestion that their code should be reviewed (or changed, or even fixed!). This relates to possible differences between the code's perceived ownership (the individual developer) and its actual ownership (which may be a project or organisation). How is an environment fostered in which code is viewed as being under collective ownership with collective responsibility as to its quality, maintenance and improvement? How best can developers, project leaders or PIs encourage collective ownership and peer review, and promote these as benefits in terms of encouraging developer growth and learning, improving code quality and maintainability and reducing risks (such as if a developer is hit by a bus!)? A subject for a future blog post!

From the comments at the end of the session it seemed that the attendees found the session a valuable experience and we hope to run this again at future events.

A big thank you from the Institute to the Dev8D organisers for allowing us to run this session!

Usability for sustainability

I spent a couple of days this week at the wrap-up workshop for the JISC Usability Programme. Here in Edinburgh we've just completed a project under this programme to enhance a visual workbench tool for data-intensive research, and the meeting brought us together with a dozen or so similar projects to swap war stories and thoughts.

I have to confess to being fairly new to many of the ideas, even though they've been around for a while (just ask Russell Beale of the University of Birmingham!). However, our project had excellent support from Mike Jackson here at the Institute who has a background in HCI, and a combination of his expertise in heuristic evaluation and the Institute's own software evaluation guide meant that the usability perspective really drove the project development agenda. You can follow the project's progress on the blog over at SourceForge.

Can you code better than your friends? Find out at Dev8D.

Last year, I went to Dev8D and was extremely impressed by the enthusiasm shown by the three hundred developers who attended. This year, I'm happy to say that the Institute has been invited to present a workshop about sustainability, but there was a proviso: it has to be exciting. Now there's a challenge.

Developing sustainable software is mainly about good software engineering, and good software engineering is like regular exercise, or eating your five daily portions of fruit and veg. It pays off in the long run, but it's not always edge-of-the-seat kind of stuff. Fortunately, Mike Jackson came to the rescue with an excellent idea: competitive coding.

The workshop will take place on Wednesday 15 February in room 3E at 10.00-12.00 - see the programme for details.

At the workshop, we're going to pair people up and get them to assess each other's software - with a focus on the things that make software sustainable. We'll start off with some of the simple stuff, like can you find the project's website armed only with the name of the software? And then we will build up to more complex questions about how the software is written: is it readable, installable, designed well? There will also be an open session at the start of the workshop, where we will discuss what makes good code good.

It's easy to listen to us talk about good software engineering, but it's quite a different experience to have a friend look at your code and tell you what they think. And it's not just about learning a few home truths, it's about a fresh insight which can lead to new and fruitful changes and improvements.

How easy is it to teach software skills?

Regular readers of this blog will know that the Software Sustainability Institute has been collaborating with the Software Carpentry initiative to develop and deliver courses. Greg Wilson from Software Carpentry has set up a Peer2Peer University course on "How to Teach Webcraft and Programming to Free Range Students". One of the things that the SSI has become aware of as it has undertaken projects is that the experience and programming skills of researchers vary greatly, even within a research domain or group.

As part of the first exercise, members of the course have been considering the recommendations published in 2007 by the US Department of Education’s Institute of Education Sciences in a 60-page report: Organizing Instruction and Study to Improve Student Learning. The seven recommendations are summarised below, but the full report is worth a read as it contains a great deal of evidence to back up the validity of the recommendations and other claims.

  1. Space learning over time. Arrange to review key elements of course content after a delay of several weeks to several months after initial presentation.
  2. Interleave worked example solutions with problem-solving exercises. Have students alternate between reading already worked solutions and trying to solve problems on their own.
  3. Combine graphics with verbal descriptions. Combine graphical presentations (e.g., graphs, figures) that illustrate key processes and procedures with verbal descriptions.
  4. Connect and integrate abstract and concrete representations of concepts. Connect and integrate abstract representations of a concept with concrete representations of the same concept. 
  5. Use quizzing to promote learning. Use quizzing with active retrieval of information at all phases of the learning process to exploit the ability of retrieval directly to facilitate long-lasting memory traces. 
  6. Help students allocate study time efficiently. Assist students in identifying what material they know well, and what needs further study, by teaching children how to judge what they have learned.
  7. Ask deep explanatory questions. Use instructional prompts that encourage students to pose and answer “deep-level” questions on course material. These questions enable students to respond with explanations and support deep understanding of taught material.

Many of these recommendations are aimed at more traditional notions of students - the SSI is training those who have already undertaken university degrees: typically PhD students and early-career researchers, though also all the way up to established professors.

Scala: fewer lines of code and better pay?

By Joanna Leng, independent computational scientist.

I was recently thinking about Java accreditation, but a friend suggested that I should learn Scala instead. I did a quick web search and found that Scala was more compact than Java (it required fewer lines of code to produce the same outcome) and that Scala programmers tend to be paid more than Java programmers. So when I saw a talk called Scala kickstart I decided to give it a go.

The talk was given by Jan Machacek from Cake Solutions (the slides are also available).

So what is Scala?

  • It is a fusion language that combines object-oriented with functional programming (it is not a purely functional programming language though).
  • It is statically typed.
  • It is easy to adopt, because it works with existing Java byte code and produces Java byte code. Scala is normally used in combination with Java either because you are building upon legacy codes or because you need the Java Swing UI.

It also has some cool features:

  • It can convert text to speech.
  • It has the Akka libraries, which handle parallel programming and multithreading well - better than Java.

The talk went through six elements that you need to be proficient in Scala, and it was clearly and well presented. The official Scala web site also looks to be a great resource and, of course, you can download Scala to try it out.

How many funding aggregators do we need?

By Simon Hettrick.

Last year, I found myself talking about funding aggregators and whether we could link to them on the Institute’s website. At the time, I thought that it was a little strange: why the plural? Why would anyone need more than one aggregator? At first, there might be an aggregator for UK funding, one for Europe and one for industry, but surely that wouldn’t last for long, because someone would just write an uber-aggregator that combined all three. It turns out that this might have happened.

I’ve just added Research Professional to our list of useful resources. It cites itself as “the world's leading provider of news and funding information for research professionals” and covers “academia to politics, technology to the arts”. The people who run the service are called Research (not a good name if you wish to prevent confusion in the research market, but let’s overlook that problem) and produce a number of publications from Research Europe to Research Caribbean. In fact, they say that they provide the “largest database of research funding opportunities available worldwide”.

I took a quick look at the Research Professional website, and it certainly seems comprehensive. After logging in with my university email address, I was presented with a handy interface into which I typed as many random research-related words as I could think of. I was quickly presented with funding opportunities from all round the world. The search can be focussed by selecting specific countries, funding agencies, closing dates and many, many more variables. And a lot of information is presented for each call, such as the closing date, the award type and the frequency with which the call is made.

What makes good code good? A digital social research view

By Mike Jackson.

Last week, we ran a sustainability training workshop for Digital Social Research, where we asked "what makes good code good?". The attendees, who were research programmers and software developers, put together a list of necessary qualities, which we've copied here.

Good code should be...

  • Correct. Code must be correct and it should also be possible to demonstrate that it's correct, e.g. through provision of associated tests or mathematical models of requirements.
     
  • Well-designed. Code should be modular with well-defined interfaces, inputs and outputs and with code and data encapsulation. It should be elegant and no more complex than necessary. There should be minimal inter-dependencies, no hidden dependencies and limited platform-specific dependencies. Together, these help ensure that the code is easily understandable by other developers; can promote reuse, so reducing the need to reinvent the wheel in subsequent projects; and ensure software can be configured, adapted and extended easily.
     
  • Readable. Code should be commented and indented and use sensible naming. Comments should describe why the code is as it is, since the code itself describes what it does and how it does it. Care should be taken that comments reflect the current code, because code evolves through time.
     
  • Appropriate. The languages, technologies and tools should be suitable for the intended application area, and also take into account the skills and knowledge of the current and future developers.
     
  • Robust. The code must not break anything and it should fail gracefully. Ideally, it should support configurable logging or other ways to help users and developers identify and diagnose errors. Errors must not be swallowed by the code without a good, and commented, reason.
     
  • Efficient. Code must run in a timely way, for the specific applications area.
     
  • Available. Software should be available to those who need it! If it's not available, how will anyone be able to use it?
     
  • Usable. Software should be usable, buildable, deployable and runnable. Difficult-to-use software can discourage its uptake by users. Software that can't be built, deployed or run is highly unusable!
     
  • Copyrighted and licensed. These protect intellectual property and let others know how they can use, modify and redistribute the code.
     
  • Under revision control. The revision control should be backed up, and supported by sensible commit messages.
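
As a rough sketch of the "Robust" point above, the following shows code that fails gracefully and logs what went wrong instead of swallowing the error. It uses Python's standard logging module; the logger name and the parse_age function are hypothetical examples, not something produced at the workshop.

```python
import logging

# Hypothetical logger name, chosen only for this illustration.
logger = logging.getLogger("survey_tools")


def parse_age(text):
    """Return the age as an int, or None if the text is not a valid age."""
    try:
        age = int(text)
    except ValueError:
        # The error is not swallowed silently: it is logged so that users
        # and developers can diagnose the bad input later.
        logger.warning("Could not parse age from %r", text)
        return None
    if age < 0:
        logger.warning("Ignoring negative age %d", age)
        return None
    return age


if __name__ == "__main__":
    # The level is configurable, e.g. logging.DEBUG while developing.
    logging.basicConfig(level=logging.INFO)
    print(parse_age("42"))
    print(parse_age("forty-two"))
```

Returning None for bad input is only one possible design; raising a more descriptive exception would also meet the requirement, as long as the failure is visible and explained.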

How BIG is big data?

By Chris Morris, STFC.

GenBank now contains 100,000,000,000 base pairs. That's big, in the sense that each similarity search visits every record, and there are millions of searches a day. But it's not BIG, in the sense that it fits on one disk, and only takes 200 s to transfer at 1 Gb/s.
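
The 200-second figure can be checked with a back-of-envelope calculation. This sketch assumes each base is packed into 2 bits (enough to distinguish A, C, G and T) and reads 1 Gb/s as one gigabit per second; neither assumption is stated in the post, but together they reproduce the quoted number.

```python
BASE_PAIRS = 100_000_000_000   # GenBank size quoted in the post
BITS_PER_BASE = 2              # assumption: A, C, G, T packed into 2 bits each
LINK_BITS_PER_SECOND = 1e9     # assumption: 1 Gb/s means one gigabit per second

total_bits = BASE_PAIRS * BITS_PER_BASE
size_gigabytes = total_bits / 8 / 1e9
transfer_seconds = total_bits / LINK_BITS_PER_SECOND

print(f"{size_gigabytes:.0f} GB, {transfer_seconds:.0f} s")  # prints "25 GB, 200 s"
```

Real sequence files carry annotations and are usually stored as text, so they are larger than this packed estimate.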

An electron microscope tomography image may contain 8 billion pixels. That's big, in the sense that the noise reduction algorithms take polynomial time in image size. But it fits on a USB stick.

The data stream from the LHC is big, but most of it is of no significance, and it can be reduced to a one-bit answer, such as "Does the Higgs boson exist?"

The Sloan Digital Sky Survey contains several terabytes of data. That's big, since it is all unique data of potential interest. But it would fit on a medium size RAID.

A Next Generation Sequencing instrument can record 100TB of data in a day. It will quickly be reduced ten-fold, and then later reduced to a consensus sequence - a whole human genome is 1.5GB.

Whether data is big or not depends on what you want to do with it.

I am not a light bulb!

By Simon Hettrick.

Most people turn apoplectic when faced with someone who “thinks outside the box” or attempts to harvest “low hanging fruit”. And rightfully so. We’ve learned to vilify management speak, because it’s wasteful and verbose, but what about its visual equivalent? It’s time that we start saying “NO!” to meaningless images.

The world of software is a grim place if you need an image for a website. This is down to a fundamental problem: you can’t see software. This leads a lot of people to think “you can see computers!”. But there’s only so many times that you can use that data-centre image – with its banks of cold, emotionless circuitry – before things start to get depressing. And it is this tortuous path that causes some people to embrace stock photography with an incredible level of enthusiasm.

There’s nothing wrong with using stock images. It's difficult not to, unless you have your own photography department. I just advise some caution on the images you choose. Take the image on this page, with its clever subtext of being the illuminated one amongst dowdy colleagues. It’s a good image, but it’s also completely generic. Anyone could find a concept in their business that this image could represent, so it will end up being used by everyone from management consultancies to electricians, and everyone – absolutely everyone – in between. This genericide is infectious: if you use a generic image, you will add nothing but blandness to your publicity.