Software and research: the Institute's Blog

Latest version published on 6 November, 2017.

By Mike Jackson, Software Architect

On 18th October I attended GRADnet's "Moving Forward for 2nd Year PGRs" day in London for physics post-graduates, and ran two sessions on "Writing better software for research".

SEPnet, the South East Physics Network, is a consortium of universities in the south east of England, promoting excellence in physics in both academia and industry, via research, collaboration, training, and outreach. GRADnet is SEPnet's collaborative graduate school which provides professional skills training to PhD students.

GRADnet's "Moving Forward for 2nd Year PGRs" day offered attendees a choice of five sessions in both the morning and the afternoon: Creating impact, How to write a successful Fellowship Application, Research data management, Unconscious Bias, and Writing better software for research. 66 students attended the event, held at the Park Crescent Conference Centre, London.

My 2.5 hour session on Writing better software for research provided students with a hands-on code review to get them thinking about the qualities of good, and bad, code. I gave an introduction to a selection of best practices from Wilson et al.'s highly recommended 2014 paper Best Practices for…


Latest version published on 2 November, 2017.

By Mike Jackson, Software Architect

In Using Excel for data storage and analysis in LUX-ZEPLIN, I summarised how Excel is both used and managed within the LUX-ZEPLIN (LZ) project and made recommendations for improvements. In this second of two blog posts, I describe how LZ could migrate their data from Excel to MongoDB, with supporting software in Python for computation and presentation. I also describe a proof-of-concept which extracts data from Excel, populates MongoDB with this data, and computes the radiogenic backgrounds expected from a subset of the possible sources of contamination.

As a reminder, the BG table is an Excel spreadsheet, with 43 sheets, used by LZ to calculate radiogenic backgrounds, and the WS Backgrounds Table is a sheet within the BG table which summarises the radiogenic backgrounds expected during the lifetime of the experiment from each source of contamination.

Migrating from Excel to MongoDB and Python

Excel combines data, computation and presentation. For example, a cell with a formula in Excel is a combination of data and computation, in effect a tiny program. The migration plan was based around migrating from the BG table into a solution…
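To make the idea of a formula cell as "a tiny program" concrete, here is a minimal sketch of how data, computation and presentation might be pulled apart in Python. It is illustrative only: the field names, the values and the formula `=B2*C2` are hypothetical and are not taken from the BG table, and plain dictionaries stand in for the documents that a MongoDB collection would hold.

```python
# Hypothetical illustration: field names and values are made up,
# not taken from the LZ BG table.

# Data: what a spreadsheet row would hold, stored as plain documents
# (the same shape could be inserted into a MongoDB collection).
components = [
    {"name": "component_a", "mass_kg": 2.0, "activity_mbq_per_kg": 1.5},
    {"name": "component_b", "mass_kg": 0.5, "activity_mbq_per_kg": 4.0},
]


def total_activity_mbq(component):
    """Computation: the equivalent of a cell formula such as =B2*C2."""
    return component["mass_kg"] * component["activity_mbq_per_kg"]


def summarise(docs):
    """Presentation: one (name, computed total) pair per component."""
    return [(doc["name"], total_activity_mbq(doc)) for doc in docs]


if __name__ == "__main__":
    for name, total in summarise(components):
        print(f"{name}: {total} mBq")
```

Once the three concerns are separate, the stored data can change without touching the formula, and the formula can be tested on its own.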


Latest version published on 1 November, 2017.

By Mike Jackson, Software Architect

The LUX-ZEPLIN (LZ) project are building one of the largest and most sensitive dark matter detectors ever constructed. I’ve been providing consultancy, as part of an Institute open call project, on how LZ can migrate their data storage and analysis software from Microsoft Excel to a solution centred on a database management system. In this first of two blog posts, I summarise how Excel is both used and managed within LZ and give recommendations for improvements.

As described in my blog post at the outset of the consultancy, Shining a light on dark matter, LZ partners at University College London and the University of Coimbra maintain LZ's backgrounds control software. At the heart of the backgrounds control software is a Microsoft Excel spreadsheet (termed the “BG table”). While fit for purpose in the experiment’s early design and procurement stage, Excel is now reaching its limits in terms of sustainability, its ability to interface with other software in the experiment (for example, analysis software that interprets dark matter data), and the interface with…


Latest version published on 16 October, 2017.

By Raniere Silva, Software Sustainability Institute, and David Pérez-Suárez, University College London.

Last year Raniere found out that the GNOME User and Developer European Conference (GUADEC) 2017 would be hosted in Manchester and decided that he should attend. Early this year, during Science Together, Raniere mentioned GUADEC to David Pérez-Suárez, and we agreed to go to the conference to find out what we could learn from GNOME about onboarding newcomers and software development best practices.

Onboarding Newcomers


All open source projects struggle with onboarding newcomers. And, most of the time, driving yourself to the first contribution is a journey that will have old source code, out-of-date documentation, undocumented culture and other rocks on the way. Thankfully, many contributors to open source are working collaboratively with other…


Latest version published on 17 October, 2017.

By Giacomo Peru

On 26th and 27th September, Oxford held one of the first Data Carpentry workshops for Humanities*. The workshop is the fruit of a collaboration between Reproducible Research Oxford and the Software Sustainability Institute. Iain Emsley undertook the task of porting the Ecology lessons to a Humanities version, using Early English Books Online Text Creation Partnership texts as the dataset. The Python lessons were ported first; R will come next. The team of instructors was Iain (Python); Pip Willcox, from the Bodleian Libraries’ Centre for Digital Scholarship (Spreadsheets); and Lucia Michielin, from the University of Edinburgh (Open Refine and SQL).

According to the instructors, the dataset needs more cleaning (for example, multiple authors appear in the same column!). The lessons need further revision, but the team hopes to submit them to Data Carpentry for consideration by the end of the year.

Contributions are therefore welcome!

Dataset

Spreadsheets

Open Refine
