• Scaffolding
    Image by Mark McNestry

    By Mike Jackson, Research Software Engineer.

    When developing research software, we need to know what we are going to write, who it is for (even if this is just us), how we will get it to them, how it will help them, and how we will evaluate whether it has helped them. A Software Management Plan (SMP) can help us think about these and decide upon the processes we will use when developing our software. To help write SMPs, we have now published version 1.0 of our Checklist for a Software Management Plan.

    Why write a Software Management Plan?

    Research software can take many guises. It can be a 50-line Bash shell script for manipulating and filtering files, a collection of 100-line R scripts for running a bioinformatics analysis, 10,000 lines of Java for medical image analysis or 100,000 lines of Fortran for computational fluid dynamics. It may be written in scripting languages such as Unix shell, Python, R or MATLAB or in "traditional" programming languages such as C, C++, Fortran or Java. But, whatever guise it takes, research software is an integral part of the modern research ecosystem.

    When developing research software, it is easy to focus only on goals and activities such as collaborating with other researchers, writing papers, attending conferences…

  • Cells being analysed within PickCells
    Image courtesy of Sally Lowell

    By Mike Jackson, Software Architect, The Software Sustainability Institute.

    PickCells is an image analysis platform developed by the Centre for Regenerative Medicine (CRM) at The University of Edinburgh. PickCells combines generic image analysis algorithms, visualisation modules and data mining functionality within a stand-alone Java application. PickCells provides a graphical environment within which biologists can study multidimensional biological images and explore 3D spatial relationships between objects within complex biological systems such as stem cell niches, organoids, and embryos. Since January, EPCC has been working with CRM on the development of PickCells and its supporting resources.

    My EPCC colleagues Elena Breitmoser and Arno Proeme and I worked with CRM's Sally Lowell and Guillaume Blin to bring PickCells into a state suitable for more widespread promotion, with the intent of encouraging deeper community engagement by both users and developers. Our work focused on building and populating a web site for users, developers and contributors, and on providing consultancy on developing and supporting open source software.

    PickCells is highly modular and the documentation for each component is…

  • The Software Sustainability Institute website will be down for scheduled maintenance on Thursday 26th April from 7:30am-8:00am. We apologise for any inconvenience this may cause.

  • By Gillian Law, technology writer

    The LUX-ZEPLIN project is building the largest and most sensitive dark matter detector of its type ever constructed. The detector will be built a mile underground in the Sanford Underground Research Facility (SURF) in Lead, South Dakota and is due to go live in 2020.

    Potential detector materials are currently being screened prior to their use in the experiment, and the results are collated and analysed using a 43-sheet Microsoft Excel spreadsheet. The spreadsheet has worked well to date, allowing researchers to share and view data, but moving to a more versatile and robust database solution will be very useful once the experiment begins, says Dr Alex Lindote, LZ Background Simulations project lead, who is based at the Laboratory of Instrumentation and Experimental Particle Physics (LIP)-Coimbra, Portugal.

    Lindote set up the spreadsheet in late 2015, bringing in data from a Google spreadsheet that had been set up by researchers to share their data.

    “It was getting hard to track who was making changes and what was happening, so I was asked to start taking care of it. I decided to move it to an Excel file that I could control more easily,” Lindote says.

  • Everyone at the Software Sustainability Institute wishes our friends and colleagues all the best for the holiday season.

    In a nutshell, this year has seen the announcement of a wonderful new set of Fellows, a second edition of the RSE conference, even more events, Software and Data Carpentry workshops, and Open Call projects.

    After a busy year, we need a little break to get ready for everything we plan to do in 2018, like our Collaborations Workshop 2018. So please excuse us while we switch off our email and social media from the 23rd December to the 3rd January.

    Enjoy the festivities!

  • Python web frameworks

    By Mike Jackson, Software Architect.

    As part of my open call consultancy for LUX-ZEPLIN (LZ), I was asked to review web frameworks for Python, in particular those that could be used with MongoDB, the database management system used by LZ. In this blog post, I survey four frameworks for implementing web applications: Django, TurboGears, Flask and Pyramid.

    These four web frameworks were selected from the many available because they each meet LZ’s requirements: they can be deployed under the popular Apache web server, they support authentication and authorisation, and they support, directly or via third-party libraries, the use of MongoDB for holding application-specific data. Additionally, the four web frameworks are popular, with large user communities, and each has a permissive open source licence. These latter selection criteria follow our guide on Choosing the right open source software for your project, which summarises factors to be considered when choosing open source software for use on projects.

    The key details of each web framework are as follows…

  • Python code

    By Mike Jackson, Software Architect

    As part of my open call consultancy for LUX-ZEPLIN (LZ), I was asked about the feasibility of developing a web service that accepted Python code from users and executed their code server-side within a Linux environment. In this blog post I give a brief overview of a number of approaches that could be taken to implement such a service, focusing on those that protect the web service, and its underlying server, from code that is, whether by accident or design, malicious.

    First things first, developing a web service that accepts Python code from users and runs this server-side is not, in itself, technically challenging. Any developer could knock up a proof-of-concept quite rapidly. The challenges are how to ensure that the web service is able to successfully run a user’s code, and how to protect the web service from the user’s code.
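
    To illustrate how little is needed for the "run it server-side" part, here is a minimal sketch, assuming the submitted code arrives as a string. The function name `run_user_code` and the result dictionary are hypothetical and are not taken from the LZ work. The sketch only captures output and enforces a time limit; it does nothing to protect the server from malicious code, which is exactly the harder challenge described above.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_user_code(source: str, timeout_s: float = 5.0) -> dict:
    """Run submitted Python source in a child interpreter and capture its output.

    Hypothetical helper for illustration only: a timeout is NOT a sandbox.
    """
    with tempfile.TemporaryDirectory() as workdir:
        # Write the submission to a throwaway directory.
        script = Path(workdir) / "submission.py"
        script.write_text(source, encoding="utf-8")
        try:
            completed = subprocess.run(
                # -I runs the child interpreter in isolated mode.
                [sys.executable, "-I", str(script)],
                cwd=workdir,
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            # The child process is killed when the time limit is exceeded.
            return {"status": "timeout", "stdout": "", "stderr": ""}
    status = "ok" if completed.returncode == 0 else "error"
    return {"status": status, "stdout": completed.stdout, "stderr": completed.stderr}
```

    A web framework handler could call `run_user_code(request_body)` and return the dictionary as JSON. The submitted code still runs with the full privileges of the server process, so this is a starting point for discussion rather than something to deploy.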

    The first challenge, how to ensure that the server is able to successfully run a user’s code, can be restated as how to ensure that users only submit code that can successfully run on the server. At its simplest, this can be handled by publishing information about the environment within which the server will run the user’s code (e.g. operating system version,…

  • LUX-ZEPLIN water tank

    By Mike Jackson, Software Architect

    In Using Excel for data storage and analysis in LUX-ZEPLIN, I summarised how Excel is both used and managed within the LUX-ZEPLIN (LZ) project and made recommendations for improvements. In this second of two blog posts, I describe how LZ could migrate their data within Excel to MongoDB with supporting software, in Python, for computation and presentation. I also describe a proof-of-concept which extracts data from Excel, populates MongoDB with this data, and computes the radiogenic backgrounds expected from a subset of the possible sources of contamination.

    As a reminder, the BG table is an Excel spreadsheet, with 43 sheets, used by LZ to calculate radiogenic backgrounds, and the WS Backgrounds Table is a sheet within the BG table which summarises the radiogenic backgrounds expected during the lifetime of the experiment from each source of contamination.

    Migrating from Excel to MongoDB and Python

    Excel combines data, computation and presentation. For example, a cell with a formula in Excel is a combination of data and computation, in effect a tiny program. The migration plan was based around migrating from the BG…

  • LUX-ZEPLIN water tank

    By Mike Jackson, Software Architect

    The LUX-ZEPLIN (LZ) project are building one of the largest and most sensitive dark matter detectors ever constructed. I’ve been providing consultancy, as part of an Institute open call project, on how LZ can migrate their data storage and analysis software from Microsoft Excel to a database management system-centred solution. In the first of two blog posts, I summarise how Excel is both used and managed within LZ and offer recommendations for improvements.

    As described in my blog post at the outset of the consultancy, Shining a light on dark matter, LZ partners at University College London and University of Coimbra maintain LZ's backgrounds control software. At the heart of the backgrounds control software is a Microsoft Excel spreadsheet (termed the “BG table”). While fit for purpose in the experiment’s early design and procurement stage, Excel is now reaching its limits in terms of sustainability, its ability to interface with other software in the experiment (for example, analysis software that interprets dark matter data),…

  • Porting formulae

    By Mike Jackson, Software Architect

    As part of my open call consultancy for LUX-ZEPLIN (LZ), I looked at how LZ could migrate their data and computation from Excel to MongoDB and Python. There are many resources with valuable advice on cleaning data in Excel into a form suitable for analysis using Python, R or other data analysis packages. Unfortunately, how to handle formulae and cross-references is little discussed.

    Based on my experiences, I have written a guide on “Tips for porting formulae from Excel into code” in which I provide some (hopefully) helpful hints on how to identify and highlight formulae and cross-references, which can help when porting these to Python or R, and on how to restructure tables so that raw data is contiguous, and so is easy to read by data analysis packages or to export into a database or files. Feedback, suggestions and additional advice are more than welcome.

    Feel free to add these as comments!
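
    As a small illustration of what porting a formula into code can look like, here is a sketch in Python. Everything in it is hypothetical: the column names, the values and the two Excel formulae mentioned in the comments are invented for this example and are not taken from the guide or from any LZ spreadsheet. The idea shown is simply that raw data stays as plain records, while each formula becomes a named function.

```python
# Hypothetical raw data: one dict per spreadsheet row, holding only values
# that were typed in by hand (no formulae). Column names are invented.
ROWS = [
    {"component": "part A", "mass_kg": 2.0, "activity_mbq_per_kg": 1.5},
    {"component": "part B", "mass_kg": 0.5, "activity_mbq_per_kg": 4.0},
]


def total_activity_mbq(row):
    """Port of a hypothetical per-row formula such as `=B2*C2`."""
    return row["mass_kg"] * row["activity_mbq_per_kg"]


def summed_activity_mbq(rows):
    """Port of a hypothetical cross-reference such as `=SUM(Sheet1!D2:D3)`."""
    return sum(total_activity_mbq(row) for row in rows)


if __name__ == "__main__":
    for row in ROWS:
        print(row["component"], total_activity_mbq(row))
    print("total", summed_activity_mbq(ROWS))
```

    Keeping the data free of formulae means the same records could be loaded from a CSV file or a database without changing the functions.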