Research Software Group


By Gillian Law, technology writer

The LUX-ZEPLIN project is building the largest and most sensitive dark matter detector of its type ever constructed. The detector will be built a mile underground in the Sanford Underground Research Facility (SURF) in Lead, South Dakota, and is due to go live in 2020.

Potential detector materials are currently being screened before their use in the experiment, and the results are collated and analysed using a 43-sheet Microsoft Excel spreadsheet. The spreadsheet has worked well to date, allowing researchers to share and view data, but moving to a more versatile and robust database solution will be very useful once the experiment begins, says Dr Alex Lindote, LZ Background Simulations project lead, who is based at the Laboratory of Instrumentation and Experimental Particle Physics (LIP) in Coimbra, Portugal.

Lindote set up the spreadsheet in late 2015, bringing in data from a Google spreadsheet that researchers had been using to share their data.

“It was getting hard to track who was making changes and what was happening, so I was asked to start taking care of it. I decided to move it to an Excel file that I could control more easily,” Lindote says.

Once it became clear…


Everyone at the Software Sustainability Institute wishes our friends and colleagues all the best for the holiday season.

In a nutshell, this year has seen the announcement of a wonderful new set of Fellows, a second edition of the RSE conference, even more events, Software and Data Carpentry workshops, and Open Call projects.

After a busy year, we need a little break to get ready for everything we plan to do in 2018, like our Collaborations Workshop 2018. So please excuse us while we switch off our email and social media from the 23rd December to the 3rd January.

Enjoy the festivities!

By Mike Jackson, Software Architect.

As part of my open call consultancy for LUX-ZEPLIN (LZ), I was asked to review web frameworks for Python, in particular those that could be used with MongoDB, the database management system used by LZ. In this blog post, I survey four frameworks for implementing web applications: Django, TurboGears, Flask and Pyramid.

These four web frameworks were selected from the many available because each meets LZ's requirements: they can be deployed under the popular Apache web server, they support authentication and authorisation, and they support, directly or via third-party libraries, the use of MongoDB for holding application-specific data. Additionally, the four web frameworks are popular, with large user communities, and each has a permissive open source licence. These latter selection criteria follow our guide on Choosing the right open source software for your project, which summarises factors to consider when choosing open source software for use on projects.
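Django, TurboGears, Flask and Pyramid are all built on WSGI, the standard Python interface between web servers and applications, which is what allows them to run under Apache (typically via the mod_wsgi module). As a rough illustration of what each framework ultimately hands to Apache, here is a minimal sketch of a bare WSGI application using only the standard library. The `/status` route, its responses and the `call` helper are hypothetical, and a real framework would layer routing, authentication and MongoDB access on top of this interface.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """A bare WSGI callable; mod_wsgi looks for one named 'application' by default."""
    path = environ.get("PATH_INFO", "/")
    if path == "/status":
        # Hypothetical endpoint, for illustration only.
        status, body = "200 OK", b"ok\n"
    else:
        status, body = "404 Not Found", b"not found\n"
    headers = [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))]
    start_response(status, headers)
    return [body]


def call(app, path):
    """Hypothetical helper: invoke a WSGI app directly, without a server."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the keys WSGI requires
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body


if __name__ == "__main__":
    print(call(application, "/status"))
    print(call(application, "/missing"))
```

Because every framework exposes the same kind of callable, the Apache-side deployment is broadly similar whichever of the four is chosen; the differences lie in what each framework provides above this layer.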

The key details of each web framework are as follows:…


By Mike Jackson, Software Architect

As part of my open call consultancy for LUX-ZEPLIN (LZ), I was asked about the feasibility of developing a web service that accepted Python code from users and executed their code server-side within a Linux environment. In this blog post I give a brief overview of a number of approaches that could be taken to implement such a service, focusing on those that protect the web service, and its underlying server, from code that is, whether by accident or design, malicious.

First things first: developing a web service that accepts Python code from users and runs it server-side is not, in itself, technically challenging. Any developer could knock up a proof of concept quite rapidly. The challenges lie in ensuring that the web service can successfully run a user’s code, and in protecting the web service from the user’s code.
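One common first layer of protection against harmful code is to run each submission in a separate, short-lived process with limits on CPU time, memory and elapsed time, so that an infinite loop or a runaway allocation cannot take down the service. The sketch below assumes a Linux server and uses only the standard library (`subprocess` and the POSIX-only `resource` module); the limit values and the `run_user_code` name are placeholders. On its own this is not a sandbox: the child process can still read files and open network connections with the service's privileges, so in practice it would be combined with measures such as an unprivileged user account, containers or virtual machines. Note also that the Python documentation warns that `preexec_fn` is not safe in the presence of threads, which matters inside a multi-threaded web server.

```python
import resource
import subprocess
import sys

CPU_SECONDS = 2                    # placeholder limit on CPU time
MEMORY_BYTES = 512 * 1024 * 1024   # placeholder limit on address space (512 MiB)
WALL_SECONDS = 5                   # placeholder limit on elapsed time


def _apply_limits():
    """Runs in the child process just before it starts Python (POSIX only)."""
    resource.setrlimit(resource.RLIMIT_CPU, (CPU_SECONDS, CPU_SECONDS))
    resource.setrlimit(resource.RLIMIT_AS, (MEMORY_BYTES, MEMORY_BYTES))


def run_user_code(source):
    """Run untrusted source in a fresh interpreter; return (returncode, stdout, stderr).

    returncode is None if the elapsed-time limit was hit, and negative if the
    child was killed by a signal (e.g. SIGXCPU once the CPU limit is exceeded).
    """
    try:
        proc = subprocess.run(
            # -I: isolated mode, ignoring PYTHON* environment variables and user site-packages
            [sys.executable, "-I", "-c", source],
            capture_output=True,
            text=True,
            timeout=WALL_SECONDS,
            preexec_fn=_apply_limits,
        )
    except subprocess.TimeoutExpired:
        return None, "", "timed out"
    return proc.returncode, proc.stdout, proc.stderr
```

For example, `run_user_code("print(6 * 7)")` returns a zero exit code with `42` on standard output, while `run_user_code("while True: pass")` is stopped once its CPU allowance is used up rather than running forever.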

The first challenge, how to ensure that the server is able to successfully run a user’s code, can be restated as how to ensure that users only submit code that can successfully run on the server. At its simplest, this can be handled by publishing information about the environment within which the server will run the user’s code (e.g. operating system version, Python interpreter and…


By Mike Jackson, Software Architect

In Using Excel for data storage and analysis in LUX-ZEPLIN, I summarised how Excel is both used and managed within the LUX-ZEPLIN (LZ) project and made recommendations for improvements. In this second of two blog posts, I describe how LZ could migrate their data from Excel to MongoDB, with supporting software, in Python, for computation and presentation. I also describe a proof of concept which extracts data from Excel, populates MongoDB with this data, and computes the radiogenic backgrounds expected from a subset of the possible sources of contamination.

As a reminder, the BG table is an Excel spreadsheet, with 43 sheets, used by LZ to calculate radiogenic backgrounds, and the WS Backgrounds Table is a sheet within the BG table which summarises the radiogenic backgrounds expected during the lifetime of the experiment from each source of contamination.
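A summary such as the WS Backgrounds Table is essentially a group-and-sum over per-component results. As a sketch, assuming hypothetical MongoDB documents with `source`, `isotope` and `counts` fields (these names and numbers are illustrative, not LZ's schema or data), the same summary could be expressed as a MongoDB aggregation pipeline, written here as the plain Python list that would be passed to PyMongo's `Collection.aggregate`, alongside a pure-Python equivalent that runs without a database.

```python
from collections import defaultdict

# Hypothetical per-component results; field names and values are illustrative only.
documents = [
    {"source": "component-A", "isotope": "U-238", "counts": 0.12},
    {"source": "component-A", "isotope": "Th-232", "counts": 0.08},
    {"source": "component-B", "isotope": "U-238", "counts": 0.05},
    {"source": "component-B", "isotope": "K-40", "counts": 0.02},
]

# Equivalent MongoDB aggregation pipeline, e.g. collection.aggregate(PIPELINE) in PyMongo:
# group documents by source, sum their counts, then sort by source.
PIPELINE = [
    {"$group": {"_id": "$source", "total": {"$sum": "$counts"}}},
    {"$sort": {"_id": 1}},
]


def summarise_by_source(docs):
    """Pure-Python equivalent of PIPELINE: total counts per source, sorted by source."""
    totals = defaultdict(float)
    for doc in docs:
        totals[doc["source"]] += doc["counts"]
    return [{"_id": source, "total": totals[source]} for source in sorted(totals)]


for row in summarise_by_source(documents):
    print(f"{row['_id']:>12}: {row['total']:.2f}")
```

Pushing the grouping into the database keeps the summary consistent with whatever data is currently stored, rather than depending on cross-sheet formulas being kept in step.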

Migrating from Excel to MongoDB and Python

Excel combines data, computation and presentation. For example, a cell with a formula in Excel is a combination of data and computation, in effect a tiny program. The migration plan was based around migrating from the BG table into a solution…
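To make the idea of a formula cell as "a tiny program" concrete, suppose a hypothetical sheet holds a component's mass in column B, a measured specific activity in column C, the formula `=B2*C2` in column D, and a total row containing `=SUM(D2:D4)`. Separating data from computation means keeping columns B and C as raw data (for example, as database documents) and re-expressing the formulas as functions. The column meanings, names and values below are illustrative assumptions, not LZ's actual BG table.

```python
from dataclasses import dataclass


@dataclass
class Row:
    """One data row of the hypothetical sheet: raw inputs only, no formulas."""
    component: str
    mass_kg: float          # column B
    activity_mbq_kg: float  # column C


def activity_mbq(row: Row) -> float:
    """Port of the per-row formula D2 = B2*C2 (kg x mBq/kg = mBq)."""
    return row.mass_kg * row.activity_mbq_kg


def total_activity_mbq(rows) -> float:
    """Port of the total-row formula =SUM(D2:D4)."""
    return sum(activity_mbq(r) for r in rows)


rows = [
    Row("component-A", 2.0, 1.5),
    Row("component-B", 10.0, 0.3),
    Row("component-C", 0.5, 4.0),
]
print(total_activity_mbq(rows))
```

Once inputs and computation are separate, the functions can be unit-tested and re-run whenever the stored data changes, which is harder to do reliably with formulas embedded in cells.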


By Mike Jackson, Software Architect

The LUX-ZEPLIN (LZ) project are building one of the largest and most sensitive dark matter detectors ever constructed. I’ve been providing consultancy, as part of an Institute open call project, on how LZ can migrate their data storage and analysis software from Microsoft Excel to a database management system-centred solution. In the first of two blog posts, I summarise how Excel is both used and managed within LZ and make recommendations for improvements.

As described in my blog post at the outset of the consultancy, Shining a light on dark matter, LZ partners at University College London and the University of Coimbra maintain LZ's backgrounds control software. At the heart of the backgrounds control software is a Microsoft Excel spreadsheet (termed the “BG table”). While fit for purpose in the experiment’s early design and procurement stage, Excel is now reaching its limits in terms of sustainability, its ability to interface with other software in the experiment (for example, analysis software that interprets dark matter data), and the interface with…


By Mike Jackson, Software Architect

As part of my open call consultancy for LUX-ZEPLIN (LZ), I looked at how LZ could migrate their data and computation from Excel to MongoDB and Python. There are many resources with valuable advice on cleaning data in Excel into a form suitable for analysis using Python, R or other data analysis packages. Unfortunately, how to handle formulae and cross-references is little discussed.

Based on my experiences, I have written a guide on “Tips for porting formulae from Excel into code”, in which I provide some (hopefully) helpful hints on how to identify and highlight formulae and cross-references, which can help when porting these to Python or R, and on how to restructure tables so that raw data is contiguous, and so is easy to read into data analysis packages or to export into a database or files. Feedback, suggestions and additional advice are more than welcome.

Feel free to add these as comments!
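As an illustration of identifying formulae and cross-references, the sketch below scans cell contents for formulas (strings beginning with `=`) and lists the cells each formula refers to, including references to other sheets. It assumes the cells have already been read into a dictionary keyed by cell address; a library such as openpyxl, for instance, returns formula text rather than computed values when a workbook is loaded without `data_only=True`. The `find_references` name and the sample cells are hypothetical. The regular expression handles simple references such as `B2`, `$B$2`, `Sheet2!$C$3` and ranges like `D2:D4`, but not quoted sheet names, named ranges, whole-column references or text inside string literals.

```python
import re

# A cell reference or range, optionally prefixed by an unquoted sheet name.
REF = re.compile(
    r"(?<![A-Za-z0-9_$])"                                      # not inside a longer name
    r"(?:(?P<sheet>[A-Za-z_][A-Za-z0-9_]*)!)?"                 # optional sheet name
    r"(?P<ref>\$?[A-Z]{1,3}\$?\d+(?::\$?[A-Z]{1,3}\$?\d+)?)"   # cell or range
    r"(?![A-Za-z0-9_(])"                                       # not a function name such as LOG10(
)


def find_references(cells):
    """Map each formula cell's address to the list of references in its formula."""
    refs = {}
    for address, content in cells.items():
        if isinstance(content, str) and content.startswith("="):
            refs[address] = [
                f"{m['sheet']}!{m['ref']}" if m["sheet"] else m["ref"]
                for m in REF.finditer(content)
            ]
    return refs


# Hypothetical cells from one sheet: plain values and formulas.
cells = {
    "B2": 2.0,
    "C2": 1.5,
    "D2": "=B2*C2",
    "D5": "=SUM(D2:D4)",
    "E2": "=D2*Sheet2!$C$3",
    "F2": "=LOG10(B2)",
}
for address, refs in find_references(cells).items():
    print(address, "->", ", ".join(refs))
```

A listing like this makes it easier to see which cells hold raw data and which are derived, and which dependencies on other sheets must be ported along with a formula.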


Research Software Engineer

James is a Research Software Engineer at the Software Sustainability Institute. He received an MChem in Chemistry for Drug Discovery from the University of Bath, before joining the Institute for Complex Systems Simulation PhD programme in 2013. He joined the SSI in September of 2017.

During his PhD studies James worked on a number of software projects, including the Monte Carlo simulation package ProtoMS. ProtoMS was successful in an SSI Open Call and received assistance in developing a test suite to ensure the correctness of its core Fortran component. His role in this project was to develop the test suite further, both code and infrastructure, and to ensure the reproducibility of simulations across a range of platforms and compilers.

By Selina Aragon, Communications Officer, in conversation with Adrian Hill, Met Office

This article is part of our series: Breaking Software Barriers, in which we investigate how our Research Software Group has helped projects improve their research software. If you would like help with your software, get in touch.

Adrian Hill, the project’s primary contact, talked to us about the usefulness of the Institute’s collaboration with the Met Office and EPCC to promote the uptake and development of MONC. Adrian especially highlighted the invaluable help he received from Mike Jackson, Research Software Engineer, in setting up the basis for what has progressed into successful software with unexpected benefits and long-term value, used by researchers as well as PhD and master's students.

Collaborative efforts

In collaboration with EPCC (Edinburgh Parallel Computing Centre) and the Met Office, the Institute helped to rewrite the Large Eddy simulation model (LEM) as its successor, the Met Office NERC cloud (MONC). MONC is a complete re-engineering of LEM, which preserves LEM's underlying science. MONC has been developed to provide a flexible community model that can exploit modern supercomputers…
