By Adam Crymble, Institute Fellow 2013
This is the first in a series of articles by the Institute's Fellows, each covering an area of interest that relates directly both to their own work and the wider issue of software's role in research.
If the Internet went down, all historical software would cease to function, except for Microsoft Word. For an academic historian, a grant to build a high-profile web-based project is likely the biggest pot of money they will ever receive during their career. That is, if they ever receive it, as few historians will even apply. Instead, most are content to work in a fashion relatively similar to the way they did before the Internet came along. They go to the archives, read books and manuscripts, and write up their findings. This is their tried and tested mode of research, with costs limited to a few new books now and again, a train ticket or two to get to the archives, and refreshments while they're there.
Historical research is still largely a solo intellectual pursuit rather than a technical team-based one. There is nothing wrong with that. Not all discovery needs to be expensive, and as a taxpayer, I find it refreshing that there are still corners of the academic world in which spending more money isn't the easiest way to career progression. For the ambitious few who rise to the challenge and put in a proposal, meanwhile, the website that results, and in some cases the hundreds of thousands of pounds of funding that come with it, have made project leaders celebrities within the field. This celebrity brings with it all the accolades and resentment one might expect from fame.
These websites have done a great service to the historical community and have normalised the idea of using software in the research process. What they provide varies tremendously, but typically it's either access to data, or a specific functionality.
For historians, data usually means digitised sources. Large-scale digitisation projects have been underway since the turn of the twenty-first century. Millions if not billions of historical sources that range from newspapers to census materials are now online, which means that in many cases people can freely keyword search through documents stored in archives on the other side of the world. This investment in digitisation means less unnecessary travel, and greater access to our cultural heritage. The website interfaces provide a familiar access point that anyone from the twenty-first century can navigate comfortably.
The other type of site is one that offers functionality. There are many examples, but they usually offer services that aid in specific types of analysis. That might be as simple as providing a word-cloud of keywords found in a text that has been pasted into a web form, or as complicated as a network analysis of thousands of individuals found in a series of correspondence. These functional websites also have the advantage of offering an online experience in a way that most people recognise.
The web-based user interface means the barrier to entry is low, even for users who may be unaccustomed to using software in their research beyond an occasional word processor. Though these websites are expensive, they do tend to be able to demonstrate a high impact, and in some cases feed the creative juices of talented individuals who base novels and television series on the materials found on the sites, in addition to the mountains of academic research they facilitate. But what happens if the Internet goes down, or if you're on the road and don't have Wi-Fi, or if the website crashes?
When something does go wrong we quickly realise it wasn't the website we needed. It was the data, or it was the functionality. The online element, which we so often see as an asset, has become a liability. It's like a hammer that only works on Tuesday, and as luck would have it, it's Wednesday morning. The headaches extend to those responsible for these websites as well. If there is a malfunction, someone has to be there to fix it. That person costs money, and since nearly all sites built by academics are free access, universities are left with a choice between a large bill or an offline site. Chances are, a modest-sized site that goes down will simply stay down permanently, because more often than not the temporary staff who were hired to create the project and who understood it have moved on to other contracts at other institutions.
I love these websites. But they're the ready-meals of academic software. They've made us reliant, and they mean so many of us no longer know how to cook or never learn how to cook in the first place. The functional sites offer experiences not unlike the apps we're now used to on our phones, but which could work equally well offline on our desktops if they were made available that way. These data-driven websites feed us search results in a way that mirrors the search engine giants, because we're used to it and we like it, and not because it is the best or most economical way to transmit the data. In many cases the best format is probably a spreadsheet, or a relational database, or a series of marked-up XML files that could be downloaded from any number of online repositories. Sure, they still need to be downloaded using the Internet, but an active connection isn't needed to make use of what's out there. Once it's downloaded you can use it anywhere and whenever you like. Just like a book.
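As a rough illustration of working with downloaded data offline, the sketch below reads a small set of marked-up XML records and rewrites them as a spreadsheet-friendly CSV, using only Python's standard library. The sample records and the element names (record, year, name, place) are invented for the example and do not come from any real project; a real dataset would have its own structure.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Invented sample standing in for a downloaded XML file; the element and
# attribute names (record, year, name, place) are hypothetical.
SAMPLE_XML = """<records>
  <record year="1780"><name>Mary Smith</name><place>London</place></record>
  <record year="1795"><name>John Brown</name><place>Leeds</place></record>
</records>"""


def records_from_xml(xml_text):
    """Turn each <record> element into a plain dictionary."""
    root = ET.fromstring(xml_text)
    return [
        {
            "year": record.get("year"),
            "name": record.findtext("name"),
            "place": record.findtext("place"),
        }
        for record in root.iter("record")
    ]


def records_to_csv(records):
    """Write the records out as CSV text that any spreadsheet can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["year", "name", "place"])
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


records = records_from_xml(SAMPLE_XML)
csv_text = records_to_csv(records)
```

Nothing here touches the network: once the file is on disk, the same few lines work on a train, in an archive reading room, or anywhere else.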
I think it's important to provide people with the skills to work effectively and efficiently with raw data in a range of formats, or to know how to build the tool they'd like to use if one does not already exist. I'm not sure history departments in general do a good job of that yet. They teach their students how to navigate the archives, how to handle documents properly, and how to form strong arguments about their findings. Yet only the lucky ones get a chance to work with a large set of XML files, or a relational database. Even fewer are taught how to write software of their own to aid their research process. I could go on and on about why I think these skills are important and how they will make students better researchers. I think that's true, but more importantly, we can save a fortune if we learn these skills because it means we can stop building expensive websites. We have become over-reliant on the Internet yet do not realise the sustainability issues that this brings. Apart from the subscription-based genealogical websites run by corporate giants, there are no great financial sustainability models yet in place for these projects. A very small number have turned to ad revenue to pay for annual updates or regular maintenance, but for the most part the only solution is more grant funding.
On the other hand, a repository to store a dataset - even a large dataset - is surprisingly inexpensive. In fact, there are lots of places that will happily host and disseminate it for free. All you need to do is post it in a few locations, perhaps archive a copy in your institutional repository, and then tell people where they can find it. The same is true for the functionality websites - if it can be done online, it can be done offline. Rather than turn this functionality into a website that needs to be hosted, maintained, and protected from attack by people with too much time on their hands, why not write it as a program that people can download and use on their computer? If they feel so inclined, they might even contribute to the project, adding new functionality that they can in turn share with the world.
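To give a sense of how small such a downloadable program can be, here is a minimal sketch of the keyword counting that sits behind a typical word-cloud website, written in Python with the standard library only. The function names and the very short stop-word list are assumptions made for illustration, not part of any existing tool.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only;
# a real tool would use a much fuller one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "that"}


def keyword_counts(text, top_n=10):
    """Return the top_n most frequent words in text, ignoring stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(word for word in words if word not in STOP_WORDS)
    return counts.most_common(top_n)


def keyword_counts_from_file(path, top_n=10):
    """Read a plain-text file from disk and count its keywords."""
    with open(path, encoding="utf-8") as handle:
        return keyword_counts(handle.read(), top_n)
```

Calling keyword_counts on a pasted passage, or keyword_counts_from_file on a transcription saved locally, gives the same ranked list a web form would, with no server to host, maintain, or defend.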
Why have we not already seen this in historical research? Well, we have, to some extent. It just doesn't tend to come with coverage in The New York Times as do some of the major web-based projects, but it happens all the time on academic blogs and on websites like GitHub that make sharing code easy. Some brave souls have also tried to teach other historians how to code in a project called The Programming Historian, in which I am involved. This project provides free access to peer-reviewed tutorials that teach historians with no prior programming experience how to write their own software to do the types of tasks historians undertake in real research. With these skills in hand, historians will become more confident users of new types of software that do what they want without needing the flashiness of a website interface. The barrier to access will lower when the skillsets rise. We have begun to make baby steps, particularly for those self-motivated enough to get out there and try.
Yet I think the real reason we have not seen more of this inexpensive approach to sharing is that, while incredibly useful, it doesn't translate into career advancement for those involved in the production of these resources. For a historian, sharing software on your blog may gain you a following, but it's unlikely to get you a pay rise, or make the difference at a job interview.
The unfortunate fact is that even history, one of the last bastions of quiet, inexpensive scholarship, is about to follow the way of the core sciences, where people who win big grants get noticed, and where coverage in The New York Times is a better way to get food on the table than the promotion of a low-cost, economically sustainable alternative.
Yet instead of spending a fortune on a single complicated website that serves up data on an ongoing basis, why not spend that same money to fund dozens of high-quality datasets that are freely shared and disseminated online, and without any central maintenance costs? Sure, it's less sexy than a high-profile website that gets picked up in the mainstream media, but it is more responsible and sustainable. Finally, if we get this right, we can all save ourselves a lot of money, and get a lot more done.