Thoughts on the RSE Conference, Birmingham 2019
By Magnus Hagdorn
This blog post was first published on Marsupium Random Musings.
The conference started with a keynote by Andy Stanford-Clark, IBM UK Chief Technology Officer, on IoT, AI and quantum computing. The talk was great fun, although I must say I am still not convinced by IoT and I am somewhat worried by AI. The issue of needing to train neural networks properly to reduce bias was brought up, but all in all the talk was unsurprisingly optimistic. The third part, on quantum computing, was very interesting. IBM provides access to their quantum computers online.
After the coffee break I joined the session on reproducible software. Anna Krystalli from the University of Sheffield gave an excellent introduction to rrtools, an R package that helps scientists bundle a paper with all its associated data and R scripts into a package so that the paper can be regenerated. R seems to come with a lot of very handy tools that make this sort of workflow easy by providing templates and autogenerating a lot of the required infrastructure. R Markdown and bookdown were featured as tools for generating high-quality print outputs. drake, a make-like system for R, was also mentioned. It would be very nice to get similar tools for Python.
The other talk in that session, on BioSimSpace, given by Lester Hedges from the University of Bristol, was also very interesting. BioSimSpace provides an abstract interface to various computational chemistry packages, which is quite neat. The thing that really impressed me was that it also provides Jupyter notebook GUI elements that handle the inputs for a script with drop-downs, file uploads, etc. The really nifty thing is that these notebooks can be downloaded as Python scripts. The inputs are then handled by the argparse module and the scripts can be run from the command line. I think this is a really cool approach to bridging the notebook/command line gap and could be used for all sorts of applications.
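To illustrate the command-line half of that idea, here is a minimal sketch using only the standard argparse module. It is not BioSimSpace's actual export format: the parameter names (--input, --steps, --ensemble) and the toy main function are made up for illustration.

```python
import argparse


def build_parser():
    """Declare the script inputs once.

    A notebook front end could render the same declarations as widgets;
    the parameters here are hypothetical, not taken from BioSimSpace.
    """
    parser = argparse.ArgumentParser(description="Run a toy simulation")
    parser.add_argument("--input", required=True, help="path to the input file")
    parser.add_argument("--steps", type=int, default=1000, help="number of steps")
    parser.add_argument("--ensemble", choices=["NVT", "NPT"], default="NVT",
                        help="a drop-down in a notebook, a choice on the command line")
    return parser


def main(argv=None):
    # argv=None makes argparse read sys.argv, i.e. the real command line.
    args = build_parser().parse_args(argv)
    # Stand-in for the real work: just report what would be run.
    return f"{args.input}: {args.steps} steps, {args.ensemble}"


# Called with an explicit argument list here so the example is self-contained.
print(main(["--input", "mol.pdb"]))
```

Saved as a script and called with main() instead of the explicit list, the same code would pick its inputs up from the command line.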
For the third talk I nipped over to the useful tools and libraries session to see Declan Valters' (now BGS) talk on GeoPandas. Declan presented a nice Python notebook demonstrating the features of GeoPandas.
The remainder of the afternoon was about the RSE society, lightning talks introducing the posters and a panel session on sharing RSE work across boundaries. It is clear that the RSE movement is very collaborative and a large aspect is about training.
I spent most of the morning in the Citation and Software Discovery session. Olexandr Konovalov from St Andrews developed templates (code4ref.github.io) to help register software in PURE. PURE can import data from ORCID. The other talk I saw in this session, by Stephan Druskat, DLR, was very theoretical and involved constructing graphs of the relationships between authors, software revisions, dependencies and institutions. It was quite complicated and I am not entirely sure how useful it is, apart from making the point that dependencies should be cited properly.
In between the two talks I went to see a talk on the limitations of machine learning by Camilla Longden from Microsoft Research. This was a recurring theme. Bias in the training sets was discussed, as was the difficulty of interpreting the results. The tank story cropped up in a number of presentations in two versions. One version has it that the American military was training an AI to distinguish between American and Russian tanks. It turned out that the AI had learned to tell more grainy (Russian) from less grainy (American) pictures. The other variation also involved the American military. This time they wanted to find camouflaged tanks in a wood. They used a training set of pictures of a wood with and without tanks. The system worked well for the training data. When tried with another data set it failed. The AI had successfully figured out that it was nice and sunny when the tanks were present but overcast when they were absent. AI is a bit of a buzz technology - in many instances linear regression or decision trees are sufficient and easier to understand.
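The second version of the tank story can be reproduced in a few lines. The sketch below is a toy with made-up numbers, not anything shown in the talks: each photo is reduced to a single mean brightness value, and the "model" is a simple brightness threshold. It scores perfectly on the training photos and gets every new photo wrong once the weather no longer matches the label.

```python
def fit_threshold(samples):
    """Learn a brightness cut-off halfway between the two class means."""
    tank = [b for b, has_tank in samples if has_tank]
    empty = [b for b, has_tank in samples if not has_tank]
    return (sum(tank) / len(tank) + sum(empty) / len(empty)) / 2


def accuracy(samples, threshold):
    """Fraction of samples where 'brighter than threshold' matches the label."""
    hits = sum((b > threshold) == has_tank for b, has_tank in samples)
    return hits / len(samples)


# (mean brightness, tank present) - invented values for illustration.
# Training photos: tanks on sunny days, empty wood on overcast days.
train = [(0.80, True), (0.90, True), (0.85, True),
         (0.30, False), (0.20, False), (0.35, False)]
# New photos: tanks on overcast days, empty wood on sunny days.
test = [(0.25, True), (0.30, True), (0.90, False), (0.80, False)]

threshold = fit_threshold(train)
train_acc = accuracy(train, threshold)
test_acc = accuracy(test, threshold)
print(f"train accuracy {train_acc:.2f}, test accuracy {test_acc:.2f}")
```

The model never saw anything about tanks; brightness happened to separate the training set, which is exactly the failure described in the story.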
Next followed the keynote given by Ben Goldacre. The presentation was excellent, enthusiastic, entertaining and shocking. It was mostly about pharmaceutical trials and the fact that, more often than not, only the successful trials are reported. He also discussed sampling error and abuses of visualisation.
After lunch I attended the demonstration of NBfancy by Jack Betteridge and James Grant, both from the University of Bath. NBfancy can be used to annotate Jupyter Notebooks to produce teaching materials in a style similar to the Software Carpentry lessons. Submitty, a tool for automatically marking submissions, was demonstrated by Anastasis Georgoulas and David Perez-Suarez, both from UCL. Submitty is a rather nifty web application that allows students to upload their solutions to programming tasks. These get automatically tested using predefined tests. The system allows for anonymous submissions, multiple markers, extra manual marks and penalising late submissions. One drawback of automatic marking is that assignments have to be designed slightly differently so that they can be checked by a program.
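The general idea of marking against predefined tests can be sketched as follows. This is a generic illustration and has nothing to do with how Submitty is actually configured: the mark function, the penalty of 10% per day and the student_add submission are all assumptions.

```python
def mark(submission, cases, days_late=0, penalty_per_day=0.1):
    """Return a mark in [0, 1]: share of passed cases minus a late penalty."""
    passed = 0
    for args, expected in cases:
        try:
            if submission(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crashing submission simply fails that case
    score = passed / len(cases)
    return max(0.0, score - days_late * penalty_per_day)


def student_add(a, b):
    """A hypothetical student submission."""
    return a + b


# Predefined tests: (arguments, expected result).
cases = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0), ((2, 2), 4)]

on_time = mark(student_add, cases)
late = mark(student_add, cases, days_late=2)
print(on_time, late)
```

It also shows the drawback: the assignment has to be phrased so that a correct answer can be recognised by comparing outputs.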
After the break I attended a demonstration of autograd and the automatic differentiation tools used by PyTorch. Douglas Finch from the School of GeoSciences, Edinburgh presented his work on scraping DEFRA air quality data and displaying it as interactive graphs using Django and Plotly. Finally, Mike Simpson from the University of Newcastle presented his work on visualising uncertainty. They use Blender and its Python API to automatically generate high-quality 3D visualisations from a 3D model of Newcastle and sensor data. Data is presented as glyphs (green, amber, red) and uncertainty as a sinusoidal border of the glyph - a higher-frequency sinusoid indicates more uncertainty. I was slightly irritated - the glyphs looked a bit like flowers. The Blender visualisation (on YouTube) was very nice though.
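The glyph border can be sketched as a circle whose radius is modulated by a sine wave. This is my own reconstruction of the idea, not the presenter's Blender code: the function name, the mapping from uncertainty to the number of waves and all default values are assumptions.

```python
import math


def glyph_outline(uncertainty, radius=1.0, amplitude=0.1, max_lobes=12, points=360):
    """Return (x, y) points of a circle with a sinusoidal border.

    uncertainty in [0, 1] selects how many waves run around the circle:
    0 gives a plain circle, 1 gives max_lobes waves.
    """
    lobes = round(uncertainty * max_lobes)  # integer so the outline closes
    outline = []
    for i in range(points):
        theta = 2 * math.pi * i / points
        r = radius + amplitude * math.sin(lobes * theta)
        outline.append((r * math.cos(theta), r * math.sin(theta)))
    return outline


certain = glyph_outline(0.0)    # plain circle
uncertain = glyph_outline(1.0)  # twelve waves around the border
print(len(certain), len(uncertain))
```

With a handful of waves the outline has petals, which would explain why the glyphs looked a bit like flowers.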
Maggie Aderin-Pocock gave the after-dinner presentation on having crazy dreams and being a space scientist. The presentation was very entertaining. I was particularly amused by the Clangers being the gateway drug to harder stuff - Star Trek.
The last day of the conference was dedicated to workshops. I attended the Binder workshop in the morning. During the workshop we created a Binder cluster on the Microsoft Azure cloud using Kubernetes. I was interested both in how Kubernetes and Azure work and in what Binder looks like. Binder is a way of packaging Jupyter Notebooks in a Docker container and running them in the cloud. The notebook, its dependencies and any data files are described in a file that is stored on GitHub. Binder will build the image and deploy it on the cluster. The user gets a URL that can be shared. When someone connects to the URL a new container instance is started so that every user gets their own. I presume there is a way of limiting resource usage. Binder looks quite useful for people who want to share live notebooks with others. It might be possible to extend the EDINA Noteable JupyterHub service to include a Binder service. There is also a public mybinder service that you can use for small notebooks.
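The shareable URL follows a simple pattern on the public mybinder service. The helper below is a small sketch of how such a link could be put together for a GitHub repository; the function name and the example user and repository are hypothetical, and the filepath query parameter is the form I believe mybinder accepted at the time, so treat it as an assumption.

```python
from urllib.parse import quote


def binder_url(user, repo, ref="master", notebook=None, host="https://mybinder.org"):
    """Build a launch link for a GitHub repository on a Binder service."""
    url = f"{host}/v2/gh/{quote(user)}/{quote(repo)}/{quote(ref, safe='')}"
    if notebook:
        # Open a specific notebook instead of the file browser.
        url += "?filepath=" + quote(notebook)
    return url


# Hypothetical repository, for illustration only.
link = binder_url("example-user", "example-repo", notebook="demo.ipynb")
print(link)
```

Pointing host at a private cluster, such as the one built in the workshop, would give links for that deployment instead.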
In the afternoon I attended the modern C++ workshop introducing the latest features of the C++ standard library. There is some really cool stuff that is worth looking into once the features become supported by the compilers (it'll take years for g++ in Scientific Linux to catch up). For anyone interested in C++ the website cppreference.com was highly recommended.