Analysing texts in digital archives

The impact

Humanities researchers can now easily install and deploy the TextVRE software, allowing them to process and analyse research texts that are held in metadata-rich digital archives.

Easy to use, stable and with a comprehensive instruction set, the software broadens the research possibilities for text-based data. Access and analysis of the data are supported by annotation and retrieval technology – TextVRE will provide services for every step in the digital research life cycle.

The problem

Digital Humanities researchers at King's College London were looking for software to improve their analysis of text-based data. The only suitable product available was TextGrid, an internet-based collaborative environment that offered tools and services for the analysis of texts stored in digital archives. TextGrid, however, had been written specifically to run on the German national grid service, D-Grid. It was therefore decided to create a new, easy-to-use version specifically for humanities researchers across the UK.

A TextVRE team was put together, including developers from the Software Sustainability Institute, KCL and the University of Edinburgh, and began work to port TextGrid to the Fedora Object Repository.

The solution

Having been gradually developed by German researchers since the project’s inception in 2006, the original TextGrid software was “very complex and not intended to be installed by anyone other than the TextGrid developers themselves”, says Malcolm Illingworth, Applications Consultant at the University of Edinburgh’s EPCC.

Over seven months the TextVRE developers worked hard to tidy up the software and created a version that worked – just. Basic and full of bugs, it was enough to show what could be done. At this point the Software Sustainability Institute seconded Malcolm Illingworth and provided help and advice to take TextVRE to the next level.

Over the following six months the TextVRE software was thoroughly tested and stabilised to create a properly usable product. A complete set of installation instructions was put together and presented on the project’s website, allowing any researcher to install TextVRE and get it to work.

A virtual machine image was also created, to allow a TextVRE installation to run out of the box, with minimal reconfiguration. In addition, the heavily layered architecture, designed specifically for D-Grid, was modularised to let it work well with Fedora.

“We had fulfilled our contract after seven months, but what was supplied was a bit rough and ready, a top-level project,” says Illingworth. “The Software Sustainability Institute helped us complete our work.”

The TextVRE product is now fit for purpose and in a position to help humanities researchers across the UK.