PyData London 2016

By Olivia Guest, Postdoctoral Researcher, Oxford BabyLab, Department of Experimental Psychology, University of Oxford and Software Sustainability Institute Fellow.

I signed up to go to PyData London for three reasons. Firstly, looking over the talks I noticed that a lot of them were about specific machine learning algorithms and libraries that we (I and/or my lab) use in our research, e.g., gensim and theano. Specific emphasis was placed on artificial neural networks, a type of computational model I both teach to undergraduate students (part of a movement called connectionism) and use daily in my research. So I assumed that it would be a good opportunity to ask questions and meet the developers of some of the libraries and codebases we use.

Secondly, having worked in experimental psychology departments since 2009, I find that it often takes a little more effort to stay in the loop, so to speak, when it comes to programming tools and trends. So while I know how to write a journal article and how to design experiments because I practise these in one way or another every day, I very rarely discuss programming with other experienced coders. In my lab, in fact, I am the only person who programs primarily in Python. So I was expecting, given the name of the conference, to meet many more Python programmers and compare notes.

Thirdly, I was curious to see what people in industry are doing with the same kinds of algorithms that I use, e.g., topic modelling, deep learning networks, etc. On this front, I was expecting to see less theory- and more engineering-driven modelling and thus fewer or no toy models, to see more analysis of big (truly huge) data, and so on.

On all three counts, PyData London exceeded my expectations; it more than delivered! I had an incredibly fun time and I was inspired in various ways to do some (non-“data science”) science of my own (the science I do involves data, but it is most definitely not “data science”). Pretty much everybody that I met was extremely nice, and very interested in what computational cognitive modellers like me do.

In fact, sometimes I felt like I was being interviewed, which might sound a little tiring but was definitely more amusing than anything else, especially since, unlike me, many people there were, or were just about to be, on the job market. And again unlike me, many were looking for industry jobs involving “data science”. Because of this, PyData is a useful conference for a fledgling “data scientist”, helping them to obtain contacts for their next job. Companies such as Channel 4 and Bloomberg’s lyst, for example, were present. The latter demonstrated what fun or frustration one can have with a deep learning network trained on ginormous amounts of data for clustering and categorising items of clothing. A prominent input to their model, at least in terms of blue-ness, was this incredibly blue shoe.

Because there were three or four concurrent talks/tutorials at any one time, I did not manage to go to every single one that I wanted to, but I was lucky enough to be able to attend most of the artificial neural network-related tutorials and talks. I especially enjoyed and found useful the following: Deep learning tutorial - advanced techniques presented by Geoffrey French and Calvin Giles, which was a great primer in what deep networks can do, and resulted in some great chats in the pub afterwards; the keynote on Laser ranging in a new dimension by Andreas Freise, who has TED-talk level presentation abilities, with content of course of an astronomically higher calibre; Working with Fashion Models by Eddie Bell and Finding needles in haystacks with Deep Neural Networks by Calvin Giles, both of whom taught their networks shoes, including the aforementioned incredibly blue shoe, if I recall correctly; Deep Learning for QSAR by Rich Lewis, who explained how he used deep learning networks to fry an egg on his GPU, as well as to learn the similarity spaces of molecules (impressive, despite being from the other place); Word Embeddings for fun and profit in Gensim by Lev Konstantinovskiy (topic modelling and related methods are used to extract similarity spaces in cognitive science, although I’m pretty certain our pull requests are a nightmare to tidy up, which is another reason why we need help from the Software Sustainability Institute); and last but not least, Modelling a text corpus using Deep Boltzmann Machines in python by Ricardo Pio Monti, a talk I enjoyed a lot even though it reminded me of programming unrestricted Boltzmann machines (not recommended) in C (not recommended) during my master’s (not recommended).

Three talks I especially regret missing, even though it was unavoidable, are: Assessing the quality of a clustering by Christian Hennig, which looks like it would be very useful to me, especially for visualising clusters in the input and output of my models; Cross-modal Representation Learning by Tanmoy Mukherjee and Maryam Abdollahyan, which, unless I am very much mistaken, appears to be about a more Bayesian (distribution-based) take on inputs to models; and finally, Python and Johnny Cash by James Powell, who in the closing remarks of the conference demonstrated this Python library to, I suppose, the horror and amusement of a lot of people. All of the talks were filmed, so they are available online.

All in all, I had a really great time and made some good friends and contacts. I not only learned a lot but was also inspired, which is rare. I would recommend PyData to anybody who uses machine learning, especially deep learning methods, and Python, because it has not only some useful talks but also an especially friendly atmosphere. If I have the time to go to PyData London 2017, I will definitely be there.

Image courtesy of @ConradHo on Twitter.
Posted by s.sufi on 16 May 2016 - 12:09pm