Data Carpentry for Humanities

Posted by g.peru on 12 October 2017 - 3:30pm

By Giacomo Peru

On 26 and 27 September, Oxford held one of the first Data Carpentry workshops for Humanities*. The workshop is the fruit of a collaboration between Reproducible Research Oxford and the Software Sustainability Institute. Iain Emsley took on the task of porting the Ecology lessons to a Humanities version, using Early English Books Online Text Creation Partnership texts as the dataset. The Python lesson was ported first; R will come next. The instructors were Iain (Python); Pip Willcox, from the Bodleian Libraries’ Centre for Digital Scholarship (Spreadsheets); and Lucia Michielin, from the University of Edinburgh (OpenRefine and SQL).

According to the instructors, the dataset needs more cleaning (for example, multiple authors appear in the same column!). The lessons also need further revision, but the team hopes to submit them to Data Carpentry for consideration by the end of the year.

Contributions are therefore welcome!
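For anyone wondering what that cleaning might involve, here is a minimal sketch of splitting a multi-author cell into one row per author, using only the Python standard library. The column names, the ";" separator and the sample records are invented for illustration; they are assumptions and do not reflect the actual layout of the workshop dataset.

```python
import csv
import io

# Invented sample data: the column names and the ";" separator are
# assumptions, not the real structure of the workshop dataset.
SAMPLE = """TCP,Author,Title
A00001,"Smith, John; Doe, Jane",An example title
A00002,"Roe, Richard",Another example title
"""


def split_authors(rows, column="Author", sep=";"):
    """Yield one row per author, copying the other fields unchanged."""
    for row in rows:
        names = [n.strip() for n in row[column].split(sep) if n.strip()]
        # Keep rows with an empty author cell instead of dropping them.
        for name in names or [""]:
            tidy = dict(row)
            tidy[column] = name
            yield tidy


tidy_rows = list(split_authors(csv.DictReader(io.StringIO(SAMPLE))))
for row in tidy_rows:
    print(row["TCP"], "|", row["Author"])
```

The same reshaping could equally be done in OpenRefine or a spreadsheet; the point is only that each author ends up in a row of their own, which makes later filtering and counting simpler.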

*We are aware of https://dh-southernafrica.github.io/2017-08-01-Potch/.