CW22 - Mini-workshops and demo sessions

Mini-workshops and demo sessions will give an in-depth look at a particular tool or approach, and a chance to ask developers and experts how it might apply to participants’ areas of work.

Here is the list of mini-workshops and demo sessions that will take place at CW22.

 

Back to the CW22 agenda

Session 1 CW22 Day 2: Tuesday, 5 April 2022 from 14:25 - 15:25 BST (13:25 - 14:25 UTC, check your time zone)
1.1 Code Review During Development (60 minutes)

Facilitator(s): Thibault Lestang (Imperial College London), Matthew Bluteau (UKAEA), Jeffrey C. Carver (University of Alabama), Fergus Cooper (University of Oxford), Barry Demchak (Torrey Pines Software), Hollydawn Murray (Health Data Research UK), David A. Nicholson (Emory University), and Miguel Xochicale (King's College London)

Abstract: Researchers benefit greatly from small and regular code reviews during development, in addition to larger end-of-work reviews. Code review also plays a role in producing sustainable research software. However, establishing a culture of code review in research has been made difficult by a lack of guidance and awareness of its benefits. Furthermore, while good practices for code review have been established in the software industry, it is still unclear what processes and workflows work best for code review in a research setting.

In this demo session, we will present a guide to code review during development, designed for researchers, by researchers. Drafting of this guide has been driven by the Code Review Community, an international and multi-sectoral group interested in advancing code review in research. The primary result of this work is a website (https://researchcodereviewcommunity.github.io/dev-review/) describing processes that researchers can follow to integrate code review into their daily research routine. The guide introduces researchers to code review, provides practical advice on how to get started with it, and brings together existing resources for reference. Although the guidelines are built on existing research and the experience of Code Review Community members, they are the result of discussion within the community and have not been extensively tested in the field. Our goal is to get feedback and suggestions on the guidance currently offered by the Code Review Community, and to develop our understanding of code review within research more broadly. In particular, we seek to understand whether our guidelines are accessible to all, including under-represented groups in research.

The intended audience for this demo session is any researcher writing code, from short analysis scripts to full-fledged applications. After presenting the website we have developed, we will split into breakout rooms for discussions with workshop participants. After the session, participants will be better equipped to foster and sustain code review within their own local research organizations, and they will have contributed to the improvement of the guidelines. We will also invite session participants to become regular contributors to the Code Review Community.

 
1.2 Developing an Inclusive Research Leadership Training Curriculum (60 minutes)

Facilitator(s): Tracy Teal (RStudio) and Neil Chue Hong (Software Sustainability Institute)

Abstract: Researchers and research software developers have built experience and expertise in their areas of work. They have spent time learning how to code and analyze data, and are experts in their domains. As people become leaders in their field, whether of a small group or a large team, they also need to develop leadership and management skills, but they often haven’t had the opportunity to learn them, or are self-taught. This lack of knowledge around team leadership negatively impacts the person in the leadership position as well as those in their team. There is therefore an opportunity for short-format, practical, hands-on training for people in, or transitioning to, research leadership roles.

Research on leadership now tells us a lot about what makes leadership effective, meaning leadership that allows a team to do its best work, both for the individuals on the team and for the team itself. That includes creating psychological safety, providing opportunities for mastery, autonomy and purpose, and centering inclusiveness, accessibility and culturally responsive practices. However, what we know makes for effective leadership is not always how we see leadership practiced. So not only are learning opportunities lacking, but what people learn by ‘watching’ is often not effective practice.

We are developing a course on Inclusive Research Leadership, modeled on The Carpentries two-day workshop format, which aims to provide participants with opportunities to learn about leadership based on what we know works, and that values people: both the leaders themselves and the people they lead.

This session will introduce the key topics covered by the course and ask participants to give feedback on what they would like to see in it. The workshop will be structured around answering and discussing structured questions as a group, as well as discussing and sharing responses in pairs.

We will also pitch a sprint to further develop the course as part of the Collaborations Workshop Hackday.

Prerequisites for participants: This session will be most relevant for people who are leading, or soon will be leading, research or research software groups. We’ll be thinking about the things that group leaders need to know and practice in their group to create inclusive and effective spaces for research and research software development.

 

1.3a Designing an interactive tool to help researchers make their software reproducible (30 minutes)

Facilitator(s): Sam Harrison (UK Centre for Ecology & Hydrology)

Abstract: Research software must be reproducible, but how reproducible it needs to be varies vastly amongst different software used by different people for different purposes. To decide how reproducible your software needs to be, it is useful to use structured levels of reproducibility (https://doi.org/10.5281/zenodo.4761867), which classify software from "barely repeatable" one-time scripts to software used as infrastructure. We can then outline the best practices needed to meet each of these levels, ensuring that simpler software isn’t burdened with onerous requirements that are only really needed for infrastructure-level software.

We are creating an interactive tool to help researchers decide which level of reproducibility their software should meet, and to offer targeted guidance on reaching that level. Through a series of simple questions about their software, researchers will be provided with guidance on reproducibility that is tailored to their needs. The goal of this mini-workshop is to get your help to structure this tool. Together, we will refine these levels of reproducibility, structure the questions the tool asks about your software, discuss options for software to implement the tool, decide on the best guidance currently available, and identify gaps in current guidance that we can help fill.
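
To make the idea concrete, here is a purely hypothetical sketch in Python of how such a question-driven tool might work; the questions, level names and guidance text are invented for illustration and are not taken from the tool being designed.

    # Hypothetical sketch only: the questions, level names and guidance below
    # are invented for illustration, not taken from the actual tool.
    QUESTIONS = [
        "Will anyone other than you run this code?",
        "Will the code be reused after the current project ends?",
        "Do other tools or pipelines depend on this code?",
    ]

    LEVELS = [
        ("Barely repeatable", "Record your dependencies and write a short README."),
        ("Reusable by others", "Add a licence, an environment file and basic tests."),
        ("Infrastructure", "Add continuous integration, versioned releases and full documentation."),
    ]

    def recommend(answers):
        """Map yes/no answers to a reproducibility level and targeted guidance."""
        score = sum(answers)  # more 'yes' answers imply higher reproducibility needs
        level, guidance = LEVELS[min(score, len(LEVELS) - 1)]
        return level, guidance

    if __name__ == "__main__":
        answers = [input(q + " [y/N] ").strip().lower() == "y" for q in QUESTIONS]
        level, guidance = recommend(answers)
        print("Suggested level:", level)
        print("Guidance:", guidance)

In the real tool the mapping from answers to guidance would be richer than a simple count; working out that structure is exactly what the workshop aims to do with participants.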

 

1.3b Introducing Best Practice and Overcoming Barriers for Code Sharing and Review in the Life Sciences (30 minutes)

Facilitator(s): Annie Jeffery, Sophie Eastwood and Alejandro Arguelles Bullon (University College London)

Abstract: The sharing and review of code is standard practice in many software communities; however, it is often feared and avoided by life sciences researchers! Sharing research code has many benefits: while data management code is often written from scratch according to the preferences of the individual researcher, sharing code would prevent weeks wasted on duplication and help explain variation in findings that may be due to differences in data management decisions. Similarly, sharing variable codelists and statistical programming code can enable the research community to understand, validate and build on research findings. Furthermore, providing researchers with constructive and supportive feedback on their code has been shown to improve the quality of research coding, and to be an efficient way to share knowledge.

There are, however, a number of barriers that life sciences researchers face when sharing code and/or submitting code for review. Most researchers have no formal coding training and so often lack the confidence to share their code. Researchers may also be reluctant to share weeks or months of hard work with the expectation of little or no credit for its use. In the life sciences specifically, the use of sensitive human data requiring heightened data security has at times led to a culture that is wary of open science practices.

The Pharmaco-Epi Data Collaborative is a network of multidisciplinary researchers in the pharmacoepidemiology domain. We aim to promote open science, share learning and facilitate collaborations. As part of this, we are building an Online Knowledge Hub, two key features of which will be a library of shared code and a space to submit code for constructive and supportive review.

The aim of this workshop is to consult researchers, stakeholders, and open science or coding enthusiasts from the life sciences and beyond, to identify best practice for code sharing and review and solutions to potential barriers in the life sciences. The outcome of the session will be a published list of requirements that will be used as the basis for the Pharmaco-Epi Data Collaborative’s code sharing and review platform.

 

1.4a ResearchEquals.com - an open source publishing platform (30 minutes)

Facilitator(s): Chris Hartgerink (Liberate Science GmbH)

Abstract: Open science brings many changes, yet publishing remains the same. As a result, many improvements in the research and education process can't fulfill their promises. In order to facilitate a rapidly changing research ecosystem, ResearchEquals allows researchers to publish whatever outputs their work creates, instead of working to create outputs that can be published. Building on open infrastructures, ResearchEquals allows you to publish over 20 different types of research modules, with more being added based on your needs. Example modules include theory, study materials, data, or software. However, other outputs, such as presentations, figures, or educational resources, can also be published. All of these research steps are linked together to create a research journey, recognizing that educational materials, research design and analysis are all part of our learning journeys.

In this workshop you will get an introduction to ResearchEquals, learn about its technology stack and financial model, and find out how you can join the efforts to evolve the platform (both code and no-code).

 

1.4b Ersilia, a Hub of AI/ML models for infectious and neglected disease research (30 minutes)

Facilitator(s): Gemma Turon and Miquel Duran (Ersilia Open Source Initiative)

Abstract: The Ersilia Model Hub is an open-source platform that incorporates pre-trained, ready-to-use AI/ML models related to drug discovery. The overarching goals of the Hub are, on the one hand, to lower the barrier to access for non-experts, facilitating the application of these models in day-to-day experimental pipelines, and, on the other hand, to provide a platform where data scientists and developers can offer their models to the wider scientific community in a user-friendly manner, instead of simply releasing the code in repositories or having to develop their own deployment solutions. The Hub is developed and maintained by the Ersilia Open Source Initiative, a recently founded non-profit organisation.

In this workshop, we aim to give an introduction to the tool and discuss the current landscape for interaction between RSEs, data scientists and experimental researchers in the UK. The goal is to understand which features are essential for improving sustainability and reproducibility, discuss implementation strategies, identify needs in user and contribution guidelines and, overall, gather first-hand feedback from all of the participants as we prepare to launch the Hub.

The outcomes of this session will be summarised in a blogpost where participants' input will be acknowledged.

 

Session 2 CW22 Day 3: Wednesday, 6 April 2022 from 10:45 - 11:45 BST (9:45 - 10:45 UTC, check your time zone)

 

2.1 ReproHack Hub: A Guided Tour (60 minutes)

Facilitator(s): Anna Krystalli (University of Sheffield)

Abstract: ReproHacks are one-day hackathons providing a sandbox environment for practicing reproducible research. Authors submit papers with associated code and data for review. During events, participants attempt to reproduce submitted papers in teams, and feed back their experiences to authors by completing a review form.

Recently, the N8 CIR supported the development of a custom-built online ReproHack Hub to facilitate easy and efficient delivery of events; the Hub was successfully launched in November 2021. We would love to see more people using the Hub to organise their own ReproHack events, and are pleased to invite prospective ReproHack event organisers to this train-the-trainer session.

Join us to learn how to prepare for and deliver successful and engaging ReproHack events using our new Hub to administer all activities!

 

2.2 Crowdsourcing Community Practices for Reproducible Computational Environments in the Cloud (60 minutes)

Facilitator(s): Sarah Gibson (The International Interactive Computing Collaboration - 2i2c), Meag Doherty (NIH), Min Ragan-Kelley (Simula Research Laboratory), and Achintya Rao (The Alan Turing Institute)

Abstract: As the scale of data-intensive research grows, various scientific domains are experiencing a shift towards cloud-based workflows. This comes with a need to reliably and reproducibly define a computational environment that can be transported between local and cloud systems. Many tools providing this service already exist, broadly falling into two camps: providing a “kitchen-sink” environment with many of the data-science tools a researcher expects to have available, or enabling bespoke, analysis-specific environments to be built on the fly. For tools in the second camp, understanding how various domains working across a range of programming languages expect to define their environments is critical to their success. If these expectations do not align with the best practices of a community, that community may avoid using the tool, or even develop its own bespoke solution. This breaks the ethos that open science is built upon: that sharing knowledge, interoperable tools and decentralised infrastructure can accelerate the progress of research, since work is no longer repeated in silos.

However, the broader the range of communities you aim to support, the broader the range of usage patterns and best practices becomes. Simply keeping up to date with multiple community practices is in itself a huge undertaking. And while a tool may be interoperable and support multiple languages, the tool itself is still a codebase in a given language and hence a scenario develops where the maintainers of the tool are not experts in all the communities they support. To address these challenges, we need continuous dialogue with communities and feedback to guide development and identify awareness gaps in how the tool is being used and is expected to work.

For this mini-workshop, we would like to invite a diverse range of people working across various languages to participate in an interactive survey pilot and a facilitated discussion around the best practices they currently use for defining reproducible computational environments. The aim of the session is to better understand the pain points of using a cloud-based environment service, and how the assumptions such a service makes may or may not align with various community best practices. We will use mybinder.org, a popular tool in this area, as an example of such a service, though prior knowledge is not a requirement, and we hope to build strong connections with active users across communities to provide specific and useful feedback to the Binder team.

 

2.3a Foreseeing and Alleviating Technical Debt in Software (30 minutes)

Facilitator(s): James Byrne (British Antarctic Survey)

Abstract: The concept of technical debt is present in all arenas of technology and is never fully overcome. In the software realm it is all too often incurred unnecessarily, and very often overlooked as a concept. In this talk I will introduce software developers to technical debt: its manifestations, the difficulties in addressing it, and some approaches for alleviating it, and I will describe the importance of adopting sustainable development practices that keep debt in check from day one of a project.

During the presentation we'll look at real-world examples, both avoidable and unavoidable, from "hypothetical" industry and academic projects. We'll use these examples to consider how designing debt out at the start of a project doesn't have to be onerous, and how it can be part of a holistic, sustainable approach to software projects. During the demonstration, I'll point to resources that attendees can draw on, and how to view them in the light of reducing the long-term burden of software maintenance.

 

2.3b Writing tests to save time (30 minutes)

Facilitator(s): Alison Clarke (Durham University)

Abstract: Writing automated tests for research software can sometimes feel like something that will slow you down. This demo will show how you can use tests as a tool to save time as you write your code, as well as giving you greater confidence in the results. If you’ve never written a unit test before, or have tried in the past but given up, this demo aims to get you going and to help you use automated tests in a way that suits you and your project.

The demo will show:

  • how to extract a section of code into a function to enable easier testing (as well as making your code clearer!)
  • how to add unit tests to quickly check that the function works correctly with different input values
  • how to use continuous integration to run your tests on every push, to ensure your software still works even when other changes are made

The demo will use Python and pytest (though the principles could be applied to other programming languages), and will use GitHub Actions for continuous integration.
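
As a minimal sketch of the workflow the demo describes (the normalise function and its tests are invented here for illustration, not taken from the demo materials), extracting a small piece of analysis code into a function makes it easy to check with pytest:

    # test_analysis.py - hypothetical example; run with `pytest test_analysis.py`
    import pytest

    def normalise(values):
        """Scale a list of numbers so they sum to 1 (code extracted into a function)."""
        total = sum(values)
        if total == 0:
            raise ValueError("cannot normalise values that sum to zero")
        return [v / total for v in values]

    def test_normalise_sums_to_one():
        # The function should behave correctly for a typical input...
        assert sum(normalise([2, 3, 5])) == pytest.approx(1.0)

    def test_normalise_rejects_all_zero_input():
        # ...and fail loudly on input it cannot handle.
        with pytest.raises(ValueError):
            normalise([0, 0, 0])

In a real project the function would live in its own module and be imported by the test file, and a continuous integration workflow (for example on GitHub Actions) would run pytest on every push.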

Prerequisites for participants: It would be helpful for participants to have some knowledge of Python (at least to the level of being able to understand simple code).

 

2.4a Good mental health, good research software (30 minutes)

Facilitator(s): Dave Horsfall (Newcastle University) and Anika Cawthorn (University College London)

Abstract: Mental health is important, but talking about it at work can be a bit scary. It's difficult to know where to start, or what the response might be. After two years of a global pandemic our routines have been turned upside down, and the research software community is trying to adapt to hybrid working models that work wonderfully for some, and not so well for others.

In this interactive workshop we'll talk about mental health freely, without judgement. We'll look at what mental health is, and investigate common stressors for research software engineers. We'll explore how our routines have changed and reflect on what impact this might have on our health and wellbeing. Finally, we'll introduce suggestions to improve support for mental health within our teams, with plenty of opportunities to provide feedback and opinions anonymously throughout the presentation.

 

2.4b Common Workflow Language Novice Tutorial (30 minutes)

Facilitator(s): Douglas Lowe (University of Manchester), Gerard Capes (University of Manchester), Melissa Black (Curii Corporation), and Michael R. Crusoe (Stichting DTL Projects)

Abstract: Orchestration workflows are widely used in computational data analysis, enabling innovation and decision-making. Often the analysis components are numerous and written by third parties, without an eye on interoperability. In addition, many competing workflow systems exist, potentially limiting the portability of workflows written for any one specific system. This hinders the transfer of workflows between different systems and projects, limiting their re-usability. The Common Workflow Language (CWL) project (https://www.commonwl.org/) was established to produce free and open standards for describing workflows based on command-line tools. The CWL language is declarative and provides a focused set of common abstractions enabling the expression of computational workflows constructed from diverse software tools. Explicit declaration of requirements for runtime environments and software containers enables portability and reuse. Workflows written according to the CWL standards are a reusable description of that analysis, runnable on a diverse set of computing environments.

A number of workflow engines, listed on the project webpage above, have implemented support for the CWL language, enabling use of CWL workflows on a wide range of platforms. Libraries of CWL tool descriptions are available (e.g. https://github.com/common-workflow-library), and CWL workflows can be published on WorkflowHub (https://workflowhub.eu/), facilitating the sharing and reuse of these tools and workflows. Training material for introducing researchers to workflow design and construction using the CWL language is being developed, based on Software Carpentry principles (https://carpentries-incubator.github.io/cwl-novice-tutorial/). In this mini-workshop session participants will be introduced to the principles of CWL, learning how to write and run CWL workflows on their own computer. 

Prerequisites for participants: To take part in this session, participants will need Docker installed on their computer, as well as Python and a Unix command-line interface. We ask participants to follow the setup instructions (https://carpentries-incubator.github.io/cwl-novice-tutorial/setup.html) before the session.

 

Back to the CW22 agenda