Research Software Group

By Mike Jackson, Software Architect

In “Using Excel for data storage and analysis in LUX-ZEPLIN”, I summarised how Excel is used and managed within the LUX-ZEPLIN (LZ) project and made recommendations for improvements. In this second of two blog posts, I describe how LZ could migrate the data they hold in Excel to MongoDB, with supporting software written in Python for computation and presentation. I also describe a proof-of-concept which extracts data from Excel, populates MongoDB with this data, and computes the radiogenic backgrounds expected from a subset of the possible sources of contamination.

As a reminder, the BG table is an Excel spreadsheet, with 43 sheets, used by LZ to calculate radiogenic backgrounds, and the WS Backgrounds Table is a sheet within the BG table which summarises the radiogenic backgrounds expected during the lifetime of the experiment from each source of contamination.

Migrating from Excel to MongoDB and Python

Excel combines data, computation and presentation. For example, a cell with a formula in Excel is a combination of data and computation, in effect a tiny program. The migration plan was based around migrating from the BG table into a solution…
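To make the "tiny program" point concrete, here is a minimal sketch in Python. It is not taken from the LZ proof-of-concept: the record fields (`mass_kg`, `activity_mbq_per_kg`), the values, and the function names are all hypothetical, chosen only to show how a formula cell such as `=B2*C2` can be split into a plain data record, a separate computation, and a separate presentation step.

```python
# Hypothetical record: in a spreadsheet these values would sit in input cells.
component = {
    "name": "example component",
    "mass_kg": 2.5,
    "activity_mbq_per_kg": 4.0,
}


def total_activity_mbq(record):
    """Computation that a formula cell such as `=B2*C2` would hold."""
    return record["mass_kg"] * record["activity_mbq_per_kg"]


def present(record):
    """Presentation, kept separate from both the data and the computation."""
    return f"{record['name']}: {total_activity_mbq(record):.1f} mBq"


print(present(component))
```

With this split, the record could be stored in a database, while the two functions live in version-controlled code and can be tested on their own.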

By Mike Jackson, Software Architect

The LUX-ZEPLIN (LZ) project are building one of the largest and most sensitive dark matter detectors ever constructed. I’ve been providing consultancy, as part of an Institute open call project, on how LZ can migrate their data storage and analysis software from Microsoft Excel to a solution centred on a database management system. In this first of two blog posts, I summarise how Excel is used and managed within LZ and make recommendations for improvements.

As described in my blog post at the outset of the consultancy, “Shining a light on dark matter”, LZ partners at University College London and the University of Coimbra maintain LZ's backgrounds control software. At the heart of the backgrounds control software is a Microsoft Excel spreadsheet (termed the “BG table”). While fit for purpose in the experiment’s early design and procurement stage, Excel is now reaching its limits in terms of sustainability, its ability to interface with other software in the experiment (for example, analysis software that interprets dark matter data), and the interface with…

Porting formulae

By Mike Jackson, Software Architect

As part of my open call consultancy for LUX-ZEPLIN (LZ), I looked at how LZ could migrate their data and computation from Excel to MongoDB and Python. There are many resources with valuable advice on cleaning data in Excel into a form suitable for analysis using Python, R or other data analysis packages. Unfortunately, how to handle formulae and cross-references is little discussed.

Based on my experiences, I have written a guide, “Tips for porting formulae from Excel into code”, in which I provide some (hopefully) helpful hints. These cover how to identify and highlight formulae and cross-references, which can help when porting them to Python or R, and how to restructure tables so that raw data is contiguous and so is easy to read with data analysis packages or to export into a database or files. Feedback, suggestions and additional advice are more than welcome.

Feel free to add these as comments!
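As an illustration of what identifying formulae and cross-references can look like once cell contents are available as text, here is a minimal sketch in Python. It is not taken from the guide: the function `find_formulae`, the pattern `CELL_REF` and the example cells are hypothetical. It is also deliberately simplistic: it ignores quoted sheet names and would misread function names that end in digits, such as `LOG10`, as cell references.

```python
import re

# Matches A1-style references, optionally qualified by a sheet name,
# for example B2, $C$3 or Summary!D4. Quoted sheet names are not handled.
CELL_REF = re.compile(r"(?:([A-Za-z_][A-Za-z0-9_]*)!)?(\$?[A-Z]{1,3}\$?[0-9]+)")


def find_formulae(cells):
    """Return {address: [(sheet name or None, reference), ...]} for formula cells."""
    found = {}
    for address, content in cells.items():
        # Treat any string starting with "=" as a formula.
        if isinstance(content, str) and content.startswith("="):
            found[address] = [
                (sheet_name or None, ref)
                for sheet_name, ref in CELL_REF.findall(content)
            ]
    return found


# Hypothetical sheet contents keyed by cell address.
example_cells = {
    "A1": "mass",
    "B1": 2.5,
    "C1": "=B1*Rates!$D$4",
    "D1": "=SUM(B1:B3)",
}

for address, refs in find_formulae(example_cells).items():
    print(address, refs)
```

Listing the references per formula cell makes cross-sheet dependencies (here, the reference into a sheet named `Rates`) visible before any porting begins.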


Research Software Engineer

James is a Research Software Engineer at the Software Sustainability Institute. He received an MChem in Chemistry for Drug Discovery from the University of Bath, before joining the Institute for Complex Systems Simulation PhD programme in 2013. He joined the SSI in September 2017.

During his PhD studies, James worked on a number of software projects, including the Monte Carlo simulation package ProtoMS. ProtoMS was successful in an SSI Open Call and received assistance in developing a test suite to ensure correctness of the core Fortran component. His role in this project was to develop the test suite further, both code and infrastructure, and to ensure the reproducibility of simulations across a range of platforms and compilers.

MONC

By Selina Aragon, Communications Officer, in conversation with Adrian Hill, Met Office

This article is part of our series: Breaking Software Barriers, in which we investigate how our Research Software Group has helped projects improve their research software. If you would like help with your software, get in touch.

Adrian Hill, the project’s primary contact, talked to us about the usefulness of the Institute’s collaboration with the Met Office and EPCC to promote the uptake and development of MONC. Adrian especially highlighted the invaluable help he received from Mike Jackson, Research Software Engineer, in setting up the basis for what has progressed into successful software with unexpected benefits and long-term value, used by researchers as well as PhD and master's students.

Collaborative efforts

In collaboration with EPCC (Edinburgh Parallel Computing Centre) and the Met Office, the Institute provided help to rewrite the Large Eddy simulation model (LEM) as its successor, the Met Office NERC Cloud model (MONC). MONC is a complete re-engineering of LEM, which preserves LEM's underlying science. MONC has been developed to provide a flexible community model that can exploit modern supercomputers…

The Research Software Group and the Software Sustainability Institute have organised a Data Carpentry workshop, which will take place on 1st & 2nd August 2017 at the University of Southampton.

The course will cover data organisation in spreadsheets and OpenRefine, SQL for data management, and an introduction to R for data analysis. By the end of the workshop, learners will be able to analyse and manage their data more effectively, to aid reproducibility and to increase their chances of furthering their research.

For further information and registration, please visit the event page.

Data Carpentry is an international movement to teach researchers better software skills. For more information about Data Carpentry, visit their website.

Constructive Code Critique

By Nicolas Gruel, University of Manchester, Andrew Walker, University of Leeds, Vince Knight, Cardiff University, and Mike Jackson, University of Edinburgh.

This post is part of the Collaborations Workshop 2017 speed blogging series.

Code review is accepted as a key process in the creation of maintainable software with a low defect rate. Wilson et al. recommend code reviews as one of their Best Practices for Scientific Computing. The value of code review is beginning to be recognised in the development of research software, for example, in…

Newcastle University are seeking to recruit a researcher with experience in developing optimised high-performance computing software to join a large multi-disciplinary team of researchers on an ambitious research project. The researcher will be expected to extend and develop a large-scale biological simulation model built around LAMMPS. The initial focus for this work will be on parallelising extensions to the LAMMPS codebase which simulate microbial cells using an individual-based model.

This post will be based within the Large Scale Modelling Team of the NUFEB project at Newcastle University. For more information about the project, please see the NUFEB website.

The closing date for applications is 26 April 2017. For further details, contact Steve McGough.

Weather forecasting

By Malcolm Illingworth, Software Consultant, Software Sustainability Institute

The Software Sustainability Institute have been working with the Institute of Climate and Atmospheric Science (ICAS) at the University of Leeds to help improve the sustainability of their GLOMAP software suite. Kirsty Pringle of ICAS applied for consultancy from the Institute via the Open Call.

One of the biggest challenges in our ability to understand and predict climate change is learning about the role played by tiny particles, such as dust or soot. These aerosol particles are known to influence our climate in complex ways, but how this interaction works is an open area of research.

The Institute of Climate and Atmospheric Science (ICAS) at the University of Leeds seeks to improve our understanding of how these aerosol particles affect our climate. Their research uses both computer-based climate models and uncertainty analysis to quantify the role that natural aerosols play in climate change. As part of this research, ICAS have developed the GLOMAP model, a flexible…

Everyone at the Software Sustainability Institute would like to wish our friends and colleagues all the best for the holiday season.

After a busy year, including the first Conference of Research Software Engineers, the announcement of a wonderful new set of Fellows, and even more events, Software and Data Carpentry workshops, and Open Call projects, we need a little break to get ready for everything we've planned for 2017. So please excuse us while we switch off our email from 23rd December to 2nd January, and enjoy the festive season (responsibly)!
