Towards cultural change in data management - data stewardship in practice

By Martin Donnelly, University of Edinburgh.

Late last month, I took a day trip to the Netherlands to attend an event at TU Delft entitled “Towards cultural change in data management – data stewardship in practice”. My Software Sustainability Institute Fellowship application “pitch” last year had been based around building bridges and sharing strategies and lessons between advocacy approaches for data and software management, and encouraging more holistic approaches to managing (and simply thinking about) research outputs in general. When I signed up for the event I expected it to focus exclusively on research data, but upon arrival at the venue (after a distressingly early start, and a power-walk from the train station along the canal) I was pleasantly surprised to find that one of the post-lunch breakout sessions was on the topic of software reproducibility, so I quickly signed up for that one.

I made it into the main auditorium just in time to hear TU Delft’s Head of Research Data Services, Alastair Dunning, welcome us to the event. Alastair is a well-known face in the UK, hailing originally from Scotland and having worked at Jisc prior to his move across the North Sea. He noted the difference between managed and Open research data, a distinction that translates to research software too, and highlighted the risk of geographic imbalance between countries which are able to leverage openness to their advantage while simultaneously coping with the costs involved, and those which are not – we should not assume that our northern European privilege is mirrored all around the globe.

The first keynote came from Danny Kingsley, Deputy Director of Scholarly Communication and Research Services at the University of Cambridge, whom I also know from a Research Data Management Forum event I organised last year in London. Danny’s theme was the role of research data management in demonstrating academic integrity, quality and credibility in an echo-chamber/social media world where deep, scholarly expertise itself is becoming (largely baselessly) distrusted. Obviously, as more and more research depends upon software-driven processing, what’s good for data is just as important for code when it comes to being able to reproduce or replicate research conclusions, an area currently in crisis according to at least one high-profile survey. One of Danny’s proposed solutions to this problem is to distribute and reward dissemination across the whole research lifecycle, attaching credit and recognition/respect not only to traditional publications, but also to datasets, code and other types of outputs.

After a much-appreciated coffee break, Marta Teperek introduced TU Delft’s Vision for data stewardship, which, again, has repercussions and relevance beyond just data. The broad theme of “Openness”, for example, is one of the four major principles in the current TU Delft strategic plan, indicating the degree of institutional support it has as an underpinning philosophy. Marta was keen to emphasise that the cohort of data stewards which Delft have recently hired are intended to be consultants, not police! Their aim is to shift scholarly culture, not to check or enforce compliance, and the effectiveness of their approach is being measured by regular surveys. It will be interesting to see how they have got on in a year or two years’ time: already they are looking to expand from one data steward per faculty to one per department.

There followed a number of case studies from the Delft data stewards themselves. My main takeaways from these were the importance of mixing top-down and bottom-up approaches (culture change has to be driven from the grassroots, but via initiatives funded by the budget holders at the top), and the importance of driving up engagement and making people care about these issues.

After lunch we heard from a couple of other European universities. From Martine Pronk, we learned that Utrecht University stripes its research support across multiple units and services, including the library and the academic departments themselves, in order to address institutional, departmental, and operational needs and priorities. In common with the majority of UK universities, Utrecht’s library is the main driving and coordinating force, with specific responsibility for research data management being part of the Research IT programme. From Stockholm University’s Joakim Philipson we heard about the Swedish context, which again seemed similar to the UK’s development path and indeed my own home institution’s. Sweden now has a national data services consortium (the SND), analogous to the DCC in the UK, and Stockholm, like Edinburgh, was the first university in its country to have a dedicated RDM policy.

We then moved into our breakout groups, in my case the one titled “Software reproducibility – how to put it into practice?”, which had a strange gender distribution, with the coordinators all female but the other participants all male. One of the coordinators noted that this reminded her of being an Engineering undergraduate again. We began by exploring our own roles and levels of experience/understanding of research software. The group comprised a mixture of researchers, software engineers, data stewards and ‘other’ (I fell into this last category), and in terms of hands-on experience with research software roughly two thirds of participants were actively developing software, and another third used it. The researchers came from a broad range of research backgrounds, alongside a smaller number of research support people such as myself. We then voted on how serious we felt the aforementioned reproducibility crisis actually was, with a two-thirds/one-third split between “crisis” and “what-crisis?” We explored the types of issues that come to mind when we think about software preservation, with the most popular responses being terms such as “open source”, “GitHub” and “workflows”. We then moved on to the main business of the group, which was to consider a recent article by Hut, van de Giesen and Drost. In a nutshell, this says that archiving code and data is not sufficient to enable reproducibility, and that collaboration with dedicated Research Software Engineers (RSEs) should therefore be encouraged and facilitated. We broke into smaller groups to discuss this from our various standpoints, and reported back to the room. The various notes and pitches are more detailed than this blog post requires, but those interested can check out the collaboratively-authored Google Doc to see what we came up with. The breakout session will also be written up as a blog post and an IEEE proposal, so keep an eye out for that.

After returning to the main auditorium for reports from each of the groups, including an interesting-looking one from my friend and colleague Marjan Grootveld on “Why Is This A Good Data Management Plan?”, the afternoon concluded with two more keynote presentations. First up, Kim Huijpen from VSNU (the Association of Universities in the Netherlands) spoke about “Giving scientists more of the recognition they deserve”, followed by Ingeborg Verheul of LCRDM (the Dutch national coordination point for research data management), whose presentation was titled “Data Stewardship? Meet your peers!” Both of these national viewpoints were very interesting from my current perspective as a member of a nationally-oriented organisation. From my coming perspective as manager of an institutional support service – I’m in the process of changing roles at the moment – Kim’s emphasis on Team Science struck a chord, and relates to what we’re always saying about research data: it’s a hybrid activity, it takes a village to raise a child, etc. Ingeborg spoke about the dynamics involved between institutional and national level initiatives, and emphasised the importance of feeling like part of a community network, with resources and support which can be drawn upon as needed.

Closing the event, TU Delft Library Director Wilma van Wezenbeek underlined the necessity of good data management in enabling reproducible research, just as the breakout group had emphasised the necessity of software preservation. In effect, this confirmed a view of mine that has been developing recently: that the boundaries between managing data and managing software (or other types of research output) are often artificially created, and not always helpful. We need to enable and support more holistic approaches to this, acting in sympathy and harmony with actual research practices. (We also need to put our money where our mouth is, and fund it!)

After all that there was just enough time for a quick beer in downtown Delft before catching the train and plane back to Edinburgh. Many thanks to TU Delft for hosting a most enjoyable and interesting event, and to the Software Sustainability Institute whose support covered the costs of my attendance.


Posted by s.aragon on 8 June 2018 - 10:24am