By Colin Sauze, RSE at Aberystwyth University
At Carpentry Connect Manchester 2019 I ran a session on developing a new Carpentry-style lesson on machine learning.
This is something I’d been wanting to do for a while, as I’m seeing demand for machine learning across a range of disciplines. Until very recently, the problem has been that making effective use of machine learning required the help of a machine learning expert. In the last few years, tools like scikit-learn, TensorFlow, Keras and Caffe have made things a lot easier. What I’ve found is still lacking is good training material, especially for researchers without a machine learning background.
I’d already done some preliminary work on this after I was asked to run a spring school session for DProf students (research students embedded in industry) at Aberystwyth University. I used this as the basis for the session and explained what I’d done so far, why I’d done it, and what teaching it had been like.
It was the first time I’d run a session like this, so I was a little unsure of what to expect or what was expected of me. The session on Carpentries Curriculum Development that François Michonneau and Tracy Teal ran on day one was really useful and helped me to structure my plans.
What we did
At Carpentry Connect Manchester we discussed additional material that might be included. This focused particularly on unsupervised learning methods such as PCA and t-SNE, which hadn’t been included in my original lesson. We then discussed who might take such a course and wrote some personas. These included a remote sensing geographer, a psychologist, a physicist and two biologists. Finally, we looked at some potential data sets that we could use as examples for this lesson. Ideally, the data set should be usable with all the different techniques we’d like to teach, including regression, neural networks, clustering and dimensionality reduction. A few suggestions were made, and the one that showed the most potential was data on the number of cyclists using cycle routes in Edinburgh, possibly coupled with weather or climate data.
We also moved the repository from my personal GitHub account to a new machinelearningcarpentry organisation.
All our notes, whiteboard drawings, etc. are on the ccmcr19 branch of the repository.
It was great to get together with a group of like-minded people who share the desire to create a Carpentry lesson on machine learning. I’m really looking forward to the point where we have a lesson good enough that people around the world are teaching it independently.
From the group at Carpentry Connect, I think there are a few potential contributors/maintainers, but additional volunteers are welcome.
Since then, I’ve sent an email to the Carpentries Discuss mailing list and created a #machine_learning channel on the Carpentries Slack. If anybody reading this post would like to contribute, please head over to the channel.
One thing that’s generated quite a lot of discussion on Slack since then is the inclusion of an ethics section. I don’t think I’ve ever seen another Carpentries lesson do this, but I felt it was an important thing for a machine learning course to include. In the last two years we’ve seen biased data sets leading to biased machine learning systems, facial recognition technology being used in public by the police, and even an autonomous vehicle hitting and killing a pedestrian. Given the range of things that people taking this course might end up doing, I think it’s definitely worth giving ethics a mention. Suggestions from the discussion have included not leaving ethics until the end, so that it is guaranteed to be covered and isn’t rushed, and including an example dataset that is biased in some way, so that we can create exercises on identifying and compensating for that bias.
Want to discuss this post with us? Send us an email or contact us on Twitter @SoftwareSaved.