Developing a database to handle dark matter experiment data

Posted by s.aragon on 2 February 2018 - 8:00am

By Gillian Law, technology writer

The LUX-ZEPLIN (LZ) project is building the largest and most sensitive dark matter detector of its type ever constructed. The detector will be built a mile underground in the Sanford Underground Research Facility (SURF) in Lead, South Dakota, and is due to go live in 2020.

Potential detector materials are currently being screened prior to their use in the experiment, and the results are collated and analysed using a 43-sheet Microsoft Excel spreadsheet. The spreadsheet has worked well to date, allowing researchers to share and view data, but moving to a more versatile and robust database solution will be very useful once the experiment begins, says Dr Alex Lindote, LZ Background Simulations project lead, who is based at the Laboratory of Instrumentation and Experimental Particle Physics (LIP)-Coimbra, Portugal.

Lindote set up the spreadsheet in late 2015, bringing in data from a Google spreadsheet that had been set up by researchers to share their data.

“It was getting hard to track who was making changes and what was happening, so I was asked to start taking care of it. I decided to move it to an Excel file that I could control more easily,” Lindote says.

When it became clear that continuing to use the spreadsheet after the experiment goes live would be cumbersome, Lindote and Dr Jim Dobson, the LZ Simulations Working Group Convenor based at University College London, responded to an open call from the Software Sustainability Institute to see if someone could help.

“The spreadsheet has become quite complex, and it had been in the back of my mind for a while that we needed to move to something more maintainable, where you could control backups and differences between versions, and have a record of who is contributing. The open call seemed like a really good opportunity to get some expert advice,” Dobson says.

The Institute’s Mike Jackson stepped in, and looked at how to create a database to manage the project’s data.

“Mike just dove straight in and essentially reverse engineered the whole spreadsheet,” says Dobson.

“It’s one thing for us to use the spreadsheet when we know a lot of the background and the physics behind what we’re doing in there, but for someone external to list and figure out all the many, many interdependencies between all the inputs and outputs and calculations … I was really impressed,” he says.

Jackson has helped the team to create a test database, a full version of which will go live in March 2017.

“And one of the great things is that he has written a comprehensive set of prototype code, with concrete examples of the solutions that we might work towards – so we have a skeleton set of functionality that we can now build on,” Dobson says.

A working group is currently extending the test database and implementing Jackson’s recommendations.

Jackson has also given advice on how to handle the switchover from the spreadsheet to the database, Dobson says.

“The tricky thing is that this is a really core element of the experiment, to collate and analyse the screening results from all these materials, and it’s being used almost constantly. So we need to be sure that we’re ready to switch, before we do. Mike has written a document laying out how you do that switch in a reliable way, where you have both the new solution and the old one running in parallel for a while and show that they give the same results.

“It’s been very interesting to see how you do that sort of transition, from a software consultant’s point of view,” he says.
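The parallel-running check Dobson describes can be pictured with a minimal sketch. This is an illustration only, not Jackson's code or the project's actual procedure: the function name `compare_results`, the material names and the numbers are all hypothetical, and it assumes that both the spreadsheet and the database can export one numeric result per material.

```python
import math


def compare_results(legacy, candidate, rel_tol=1e-9):
    """Return (key, legacy_value, candidate_value) for every disagreement.

    Both arguments are dicts mapping a material name to a numeric result.
    A key that is missing from one side is reported with None for that side.
    """
    mismatches = []
    for key in sorted(set(legacy) | set(candidate)):
        old = legacy.get(key)
        new = candidate.get(key)
        if old is None or new is None:
            mismatches.append((key, old, new))
        elif not math.isclose(old, new, rel_tol=rel_tol):
            mismatches.append((key, old, new))
    return mismatches


# Hypothetical exports from the old spreadsheet and the new database.
spreadsheet_totals = {"material_a": 1.25, "material_b": 0.40, "material_c": 3.10}
database_totals = {"material_a": 1.25, "material_b": 0.41, "material_c": 3.10}

mismatches = compare_results(spreadsheet_totals, database_totals)
for key, old, new in mismatches:
    print(f"{key}: spreadsheet={old} database={new}")
```

An empty list of mismatches over a sustained period of parallel running would be the kind of evidence that the two systems agree; any entry points to a calculation that still needs to be reconciled before the switch.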

The new database, once in place, will be able to keep up with the vast amounts of data that will be generated and the processing that needs to be done, Dobson says.

“It’s going to be very competitive once we’re searching for dark matter, and the new database will take us to a completely different level in terms of the complexity of operations that we can do. We can automate a lot of the interfaces to other elements of the experiment that we currently manage manually, and we will be able to deploy changes on a very short timescale,” he says.

Dobson and Lindote also hope that they can use the lessons they have learned from the consultation in future.

“Just the general coding style is very useful to see, in terms of how often to commit changes and the level of detail in commit messages, documenting code and even the capturing of requirements beforehand – all that stuff is just very useful,” Dobson says.

“Mike was very good at explaining concepts, too – whenever there was something we didn’t understand, he would come back and find new ways to make things clear. That iterative process was very useful in terms of homing in on the solution that we ended up with. His willingness to engage and be patient with us and iterate was one of the reasons it was so useful to us,” he says.

Longer term, the team hopes to release a subset of the code as a tool for the physics community.

“There are a few areas of physics that need low-background radio assay tracking, so you could see a tool like this, that has been properly thought through and designed with professional input, being very valuable,” Dobson says.

“But that’s probably quite far off at the moment – we have to finish implementing it for ourselves, first!”