Mentorship programme: Using Python to analyse keywords in company reports
By Angela Cheng.
This blog post reflects on our Learning to Code mentorship programme as part of a Research Software Camp.
During my time at university, I really enjoyed the introductory data analytics course. My prior experience was mainly with the R language and with quantitative data. However, I have always wanted to continue studying data science and to learn another popular language: Python. The Learning to Code mentorship programme was an excellent opportunity to do this, and I worked on a project under the supervision of my mentor, Raphaela Heil.
The aim of my project was to develop a tool that lets people search company reports for a predefined list of Environmental, Social and Governance (ESG) keywords. Sustainability has been a key focus across different industries, and companies often publish ESG reports to highlight their goals and achievements. However, these reports sometimes run to hundreds of pages, which makes it time-consuming for users to read through them all. The tool helps users narrow down the specific pages they should focus on (those with the highest frequency of the keywords they are interested in), which is almost equivalent to personalising the table of contents. I also created stacked bar charts and word clouds to help users analyse these keywords.
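To illustrate the page-ranking idea described above, here is a minimal sketch rather than the project's actual code. It assumes the text of each page has already been extracted from the PDF (for example with pdfminer) into a list of strings. The keyword list, the function names and the sample pages are all made up for illustration.

```python
import re
from collections import Counter

# Hypothetical keyword list; the project's real list is not given in this post.
ESG_KEYWORDS = ["emissions", "diversity", "governance"]


def count_keywords(page_text, keywords):
    """Count whole-word, case-insensitive occurrences of each keyword on one page."""
    text = page_text.lower()
    counts = Counter()
    for keyword in keywords:
        pattern = r"\b" + re.escape(keyword.lower()) + r"\b"
        counts[keyword] = len(re.findall(pattern, text))
    return counts


def rank_pages(pages, keywords, top_n=3):
    """Return (page_number, total_hits, per_keyword_counts) for the pages with the most hits."""
    results = []
    for page_number, page_text in enumerate(pages, start=1):
        counts = count_keywords(page_text, keywords)
        total = sum(counts.values())
        if total > 0:  # skip pages with no keyword hits at all
            results.append((page_number, total, dict(counts)))
    # Most hits first; ties are broken by the earlier page number.
    results.sort(key=lambda result: (-result[1], result[0]))
    return results[:top_n]


# Made-up page texts standing in for an extracted report.
sample_pages = [
    "Our board strengthened governance and oversight this year.",
    "We track emissions across all sites. Emissions targets are reviewed annually.",
    "Financial highlights and outlook.",
]

ranking = rank_pages(sample_pages, ESG_KEYWORDS)
for page_number, total, counts in ranking:
    print(f"Page {page_number}: {total} keyword hit(s) {counts}")
```

The per-keyword counts kept for each page are the kind of data that could then feed a stacked bar chart, with one bar per page and one segment per keyword.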
Benefits of the mentorship
This project has been beneficial for me in several ways. First, in terms of technical skills, I have improved my working knowledge of Python. I learned more about basic data structures such as lists and dicts, as well as the pandas Series and DataFrame. I got exposure to four different Python libraries: pandas, numpy, wordcloud and pdfminer. This will be useful because I plan to volunteer for data-science-related projects in the future (such as DataDive weekends at DataKind), and data analysis/visualisation is often a key skill they look for in volunteer applications.
I have also learned how to write functions and loops, which provide useful building blocks for me to continue learning Python (or any other programming language). My mentor Raphaela also introduced me to different tools and environments for sharing code and collaborating in real time (GitHub, PyCharm, Google Colab and HackMD). We also did some code review during our project, which helped me to understand how coders, data scientists and software engineers work together in practice.
Finally, and most importantly, I’m glad that I met my supervisor Raphaela through this project. She is a current PhD student in computer science. Since I am a recent university graduate, having a PhD student as my supervisor made the project experience more relatable. Raphaela is well-organised and has great project management skills - she came to each meeting prepared with code, ideas, and action items. Her virtual door was always open (just an email away) and she provided constructive feedback promptly. It was a delightful experience to work with her because I didn’t need to worry about anything apart from coding! One of my favourite quotes that Raphaela introduced me to is, “Code is read much more often than it is written” (original quote from Robert C. Martin). This changed my perspective on coding from a lonely activity to a fun piece of collaborative work where we can inspire and help each other to grow. I will remember that when I write my documentation.
To sum up, I would highly recommend a mentorship programme to anyone who is interested in learning more about coding, data science or software engineering in general. While it would be ideal to have a project idea before you apply, I would still recommend applying anyway because your project will naturally evolve over time and your supervisor can help to point you in the right direction. Plus, the Learning to Code programme is free! You just need to dedicate some time to it each week, and you will get the opportunity to learn something new and have an amazing supervisor to support you along the way.
If you’d like to get in touch with Angela, you can email email@example.com.