Session 1.1 - Research Software Sharing, Publication, & Distribution Checklists
Abstract
These research software sharing, publication, and distribution checklists were inspired by similar checklists produced by the imaging community and are intended to fill a niche not addressed by existing resources such as guidance for creating research software management plans. They take a tiered approach: each item can be completed at one of four levels, Bronze, Silver, Gold, or Platinum, where Bronze is highly attainable and Platinum goes well above and beyond. This approach aims to gamify the process somewhat and to provide aspirational goals, not just to set a minimum floor for compliance.
The checklists are tailored based on a simple taxonomy of research software output types: 'records of specific analyses', 'web based services', 'software packages', and 'pipelines or workflows'. Each of these types of software output has overlapping but slightly different considerations for how best it can be shared, published, and distributed, so each has its own checklist. Each checklist comprises items addressing 11 common themes: Source Control; Licensing; Documentation; Making Citable; Testing; Automation; Peer Review / Code Review; Distribution; Environment Management / Portability; Energy Efficiency; and Governance, Conduct, and Continuity.
The checklists are provided as simple markdown files, making them easy to include in a project repository alongside standard licenses and codes of conduct. Each theme includes its four tiered checkbox items and an expandable section with advice and links to additional resources on how to complete them. This keeps the checklists succinct and approachable to prospective users whilst still providing depth. To extend the gamification, there are repo badges displaying the checklist type, an overall score, and the project's medal based on a points system.
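As a sketch of the format described above, a single theme's entry in one of these checklists might look something like the following (the theme and item wording here are illustrative guesses, not taken from the actual checklists):

```markdown
## Licensing

- [ ] Bronze: a LICENSE file is present in the repository root
- [ ] Silver: the licence is a standard OSI-approved licence
- [ ] Gold: licence information is machine-readable (e.g. in package metadata)
- [ ] Platinum: the licences of all dependencies are checked for compatibility

<details>
<summary>How to complete these items</summary>

Advice and links to additional resources go here, keeping the
checklist itself succinct.
</details>
```

The `- [ ]` task-list syntax and `<details>` element render as interactive checkboxes and a collapsible section on GitLab and GitHub, which is what makes this format easy to drop into a project repository.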
I am now seeking feedback on, critique of, and input to this early draft of the checklists from researchers and research software engineers who might use them, with the goal of releasing an initial version and finding collaborators with whom to continue maintaining the checklists and potentially to publish a piece promoting them.
Project Repository: https://gitlab.com/HDBI/data-management/checklists
Audience
If you are publishing a research paper with associated analysis code, developing a software tool or analysis pipeline that will be used by researchers, or deploying a web service for researchers, I'd like your input.
An account on **gitlab.com** and basic git skills would be helpful for participants wanting to directly propose edits, but are not necessary. There will be a collaborative doc/pad for anyone not familiar with git to share their comments.
Session 1.2 - How do you ensure your science is seen, used and sustained?
Bethan Iley, OLS / Queen’s University Belfast
Debs Udoh, OLS
Yo Yehudi, OLS
Joyce Kao, Digital Research Academy
Heidi Seibold, Digital Research Academy
Abstract
Open software and research projects often operate under the FAIR principles—findability, accessibility, interoperability and reusability—but human findability can be neglected. Without effective communication and strategic outreach, even the most impactful research software can go unnoticed, limiting adoption, collaboration, and long-term sustainability. This can be prevented by developing the marketing and communications skills of "multipliers" who effectively promote their projects, activities and services.
In this interactive workshop, you will be introduced to evidence-based digital marketing and communications techniques and how they can apply to research software projects. Examples of promotional activities within and beyond traditional outreach activities will be highlighted, with a focus on what works for different target audiences. You will be supported to apply this knowledge to your own project using a short worksheet.
The workshop will then shift to a facilitated discussion model with the intention of developing open resources to support marketing and communications in open software and research projects. This will seek to identify (1) examples of best promotional practices that you are aware of; (2) your barriers to engaging in marketing and communication activities, including (but not limited to) diversity, equity and inclusion considerations; and (3) potential ways for communities to overcome these barriers. Your inputs will be used to develop tailored resources for research infrastructure communities.
Audience
This session will not require prerequisite knowledge or skills. These skills are becoming increasingly relevant for professional development, even if you do not (yet) have formal experience in open software or research projects.
Session 1.3 - Navigating trade-offs between maintainability and contextualisation
Abstract
The software sustainability movement has matured and become successfully embedded in many domains, but is not yet widely recognised or established in others. Biodiversity informatics is one such area of concern: modern biodiversity research depends on the development of software tools for data management, mobilisation and analysis, and on the customisation of hardware for digitisation and field monitoring. These tasks must be balanced alongside existing responsibilities around the long-term curation of physical specimen objects. Collaborative development of solutions is necessary to ensure reproducibility and avoid technical debt and single points of failure. The application of open science principles is particularly relevant in biodiversity research, especially for specimen-based studies, where they can help address historical biases in scientific collections—often collected from biodiversity-rich regions in the Global South but housed in institutions in the Global North.
Programs like the Carpentries and the Turing Way have been successful in developing skills in research personnel, but there are still research communities where these are not yet embedded. We would like to examine how we can present such relevant initiatives to communities where they are less well-known, and how we should balance the trade-off between contextualising training resources—making them accessible across geographies, languages, and research domains—and maintaining resources that are scalable and sustainable. Contextualisation ensures relevance but demands significant effort and maintenance. Maintaining standardised training materials is easier but may lack local relevance.
The session will use the Carpentries as a case study to explore these trade-offs, examining the potential benefits of creating a contextualised domain carpentry (e.g. a “specimen carpentry” for biodiversity informatics) and the costs, in terms of both creation and ongoing maintenance. The goal of the session is to make connections so that we can learn from others' experience, and to develop practical guidelines for deciding when and how to contextualise training resources whilst balancing the need for sustainability.
Audience
This session will be of interest to participants who are curious about tailoring training resources or documentation to specific research domains, languages or local contexts. We particularly encourage participation from maintainers and contributors of training initiatives, like The Carpentries or The Turing Way, as well as participants with previous experience in collaborative translation or localisation, and learners who have benefited from domain-specific resources. No prerequisites are needed.
Session 1.4 - Computational Abilities Knowledge Exchange: Everyone likes CAKE - How do we make it better?
Marion Weinzierl, University of Cambridge, ICCS
Oscar Seip, SSI, University of Manchester
Nick Brown, EPCC
Eleanor Broadway, EPCC
Tobias Weinzierl, University of Durham
Andrew Gait, University of Manchester, Research Software Engineer (Research IT)
Stef Piatek, Society of RSE and University College London Hospitals Biomedical Research Centre
Abstract
Digital Research Infrastructure for computational sciences is a vast field, with a large number of participants, stakeholders and specialists, as well as many subfields and applications. Not everyone in this field is aware of everyone else, or has the time to attend every event or keep up with every blog post and paper. Similarly, the general public might not be aware of the career paths and potential impact of the work of research technical professionals. The key question, therefore, is: how can we, as a community, ensure that knowledge is exchanged as efficiently as possible? The current situation risks people reinventing the wheel and missing out on opportunities.
In this workshop, we will invite the audience to reflect and provide feedback on existing and proposed methods of knowledge exchange and outreach. The goal is to create a knowledge exchange expert community and framework. In particular, we are also interested in hearing how to be more inclusive and accessible in facilitating knowledge exchange.
Join us and share your knowledge - there might even be cake!
Audience
We aim to attract a diverse audience for this workshop, so that we can collect many different perspectives, ideas and opinions. There are no special requirements or prerequisites for the session, but we are planning to use an online polling tool, such as Mentimeter or sli.do, for part of the session, and therefore it would be useful if participants brought a device such as a phone or a laptop.
Session 1.5 - The Research Software Quality Kit (RSQKit): introduction & ways to get involved
- Shoaib Sufi, University of Manchester
- Aleksandra Nenadic, University of Manchester
Abstract
The Research Software Quality Kit (RSQKit - https://everse.software/RSQKit/) lists curated best practices for improving the quality of your research software. It is intended for use by researchers, research software engineers, those managing and procuring funding for projects with a large research software component, those running research infrastructures involving software, and those developing research software related policy at organisations and in projects.
These practices are informed by software excellence and quality in the context of research, with a focus on FAIR software, Open Research, community development and software engineering practices at different tiers of research software (analysis scripts, prototype tools and research software infrastructure).
RSQKit links to tools and resources which support best practices. It includes software quality dimensions and links to indicators and tasks to guide your use of each best practice.
Research community use cases and applications of software practices are highlighted across the European Open Science Cloud (EOSC) Science Clusters (https://science-clusters.eu/) to inform and inspire.
Research software roles (e.g. the Research Software Engineer (RSE) and Researchers who code) are included to document the sources and practitioners of research software. This brings attention to the need for credit and career paths for research software related roles.
Best practices include links to training resources and existing guides and materials, giving a conceptual overview where material already exists, in order to highlight existing best practice resources rather than duplicate effort.
The workshop will introduce RSQKit, its aims and architecture, optional engagement with the EVERSE Network (https://everse.software/network/), and how to make contributions. All levels of experience and expertise are welcome.
Ways of contributing include (but are not limited to):
- Suggesting tools
- Suggesting existing best practices
- Suggesting what is missing
- Writing about research software use
- Reviewing content
- Writing guidance on research software quality topics
- Suggesting related standards, training and communities of practice
You can find RSQKit on GitHub at - https://github.com/EVERSE-ResearchSoftware/RSQKit (issues and pull requests always welcome).
Audience
Anyone with an interest in research software is welcome. Some knowledge of software best practices is useful, as is being able to use GitHub to raise issues and/or pull requests, but neither is essential. Even if you are simply interested in the topic, we believe you will find value and be able to contribute.
Session 2.1 - Here, use mine: how to make software that others want to use
Sangeeta Bhatia, Imperial College London
Dr Sabine van Elsland, Imperial College London
Abstract
All of us interact with various software tools in our professional and daily lives, encountering frustration and, less often, delight in the process. While some tools seamlessly integrate into our workflows, others prove to be cumbersome, unintuitive, or poorly documented. If alternatives exist, we will even abandon a tool rather than persist in fighting it. But what makes a good software tool, and what will get even the non-tech inclined excited to use it?
This workshop will explore the attributes that determine whether a tool is adopted or abandoned, with a special focus on non-technical users. By non-technical users, we mean anyone using a tool outside their core expertise, who may lack familiarity with workarounds or specialised knowledge. By understanding the barriers these users face, we can develop insights into the attributes that make software tools more accessible, efficient, and user-friendly. We are particularly interested in tools that facilitate data collection, but are very keen to hear from developers and users of all software tools.
We will begin by sharing our own experiences of navigating software challenges when performing a common task—extracting data from a scientific paper. This exercise will serve as a relatable starting point. We will then break up into smaller groups for discussions, guided by open-ended prompts designed to elicit different perspectives on usability, accessibility, and design. This will be your chance to vent and channel your frustrations about all the awful software tools that you have been forced to use. Following these discussions, we will reconvene to synthesise all your good and bad experiences into key takeaways for the user and developer community.
To create a persistent resource, we will document the insights generated in this workshop in a blog post on the Software Sustainability Institute website. Additionally, we aim to develop a collaborative publication on this topic, inviting all participants to contribute as co-authors. Through this workshop, we hope to foster a broader conversation on designing software that meets the needs of time-poor, non-technical users, ultimately leading to more widely adopted and effective tools.
Audience
The workshop is for everyone who has used software in their daily or professional lives. No prior skills or knowledge is needed.
Session 2.2 - AI-Driven tools for Software Repository Analysis, Discovery, Reusability & Interaction
Abstract
Software repositories are fundamental to research and innovation, yet they remain difficult to navigate, evaluate, and reuse due to their complexity and heterogeneous content. This demo presents a suite of AI-driven tools—RepoGraph, RepoSnipy, inspect4py, and RepoSim—designed to enhance the findability, reusability, and understandability of software repositories.
1) inspect4py performs static code analysis, extracting metadata, dependencies, and execution details to facilitate repository comprehension.
2) RepoGraph provides an interactive knowledge graph-based interface, enabling users to explore repositories visually and perform semantic queries.
3) RepoSim leverages deep learning embeddings to generate multi-level representations of repositories, supporting similarity-based recommendations.
4) RepoSnipy functions as a semantic search engine, clustering repositories based on embeddings to improve discovery and reuse.
These tools collectively advance the FAIRness of software repositories by integrating ML, NLP, and information extraction techniques, making repositories more accessible, interoperable, and reusable.
In this demo, I will showcase their functionalities, provide interactive hands-on exploration (for some of the tools), and discuss future enhancements for repository-driven AI research.
Audience
In this demo session, I will showcase the functionalities of inspect4py, RepoSim, RepoSnipy, and RepoGraph, and discuss future enhancements for repository-driven AI research. Participants will have the opportunity to interact with some of the tools hands-on:
1) inspect4py and RepoSim will be accessible via interactive Google Colab Notebooks, meaning no installation is required to run them.
2) RepoSnipy will be available via HuggingFace, allowing participants to explore its semantic search capabilities.
3) RepoGraph will be demonstrated live, with a guided walkthrough showcasing its knowledge graph-based repository exploration features.
# Prerequisites & Expected Knowledge
No prior AI or NLP experience is required, but familiarity with software repositories (e.g., GitHub, GitLab) and basic code analysis concepts will be beneficial.
Those interested in FAIR principles, repository metadata extraction, and semantic search will find the session particularly relevant.
# Software & Installation Requirements
No pre-installation is required—tools will be accessible via web interfaces and Google Colab Notebooks.
Participants who wish to explore the tools beyond the session will receive links to documentation and repositories for further experimentation.
# Expected Number of Participants
There is no strict participant limit, but the session will include an interactive Q&A and discussion segment to engage attendees and gather feedback.
This session will provide an interactive, hands-on introduction to AI-powered software repository analysis, discovery, and reusability, offering practical tools and insights for researchers and software engineers.
Session 2.3 - Improving Carbon Literacy for Researchers
Andy Turner, EPCC, University of Edinburgh
Loïc Lannelongue, University of Cambridge
Kirsty Pringle, SSI
Abstract
Due to the urgency of the climate crisis, UK universities have committed to Net Zero targets. In line with this, various initiatives and projects aim to help researchers better understand the emissions arising from their research and identify actions to reduce their impact. However, many researchers, even those familiar with emissions sources and mitigation strategies, struggle to grasp how the emissions from their work (e.g., software use or DRI) compare to other sources.
While financial costs and benefits are intuitive (e.g., £1M = rough cost of two semi-detached houses), carbon costs and benefits such as 1,000 kgCO2e (kilograms of carbon dioxide equivalent) lack a similarly intuitive reference frame. In other words, most of us have intuitive financial literacy but lack a similar carbon literacy.
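To make this concrete, here is a minimal Python sketch of the kind of conversion such a reference frame enables. The conversion factors below are rough, illustrative assumptions chosen for this example only, not authoritative figures; real calculations should use published conversion factors such as those from tools like Green Algorithms.

```python
# Illustrative only: rough, assumed conversion factors for turning a
# kgCO2e figure into more relatable reference units.
CAR_KG_PER_KM = 0.17      # assumed: ~average petrol car, kgCO2e per km
SHORT_FLIGHT_KG = 250.0   # assumed: ~one short-haul return flight, kgCO2e


def in_relatable_units(kg_co2e: float) -> dict:
    """Express an emissions figure in more intuitive reference units."""
    return {
        "car_km": kg_co2e / CAR_KG_PER_KM,
        "short_haul_flights": kg_co2e / SHORT_FLIGHT_KG,
    }


# The 1,000 kgCO2e example from the text: roughly 5,900 km of driving,
# or about four short-haul return flights, under these assumptions.
print(in_relatable_units(1000.0))
```

The point is not the specific numbers but the habit of translation: once a researcher can map "this HPC job emitted X kgCO2e" onto everyday activities, proportionate budgeting and prioritising of "green" behaviours becomes possible.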
Carbon literacy is important in itself in the face of the climate emergency, but it is also key to making computational research more environmentally sustainable: researchers need it to understand what a proportionate carbon footprint for a project is before starting it, and how effective "green" behaviours are. Without a relatable context, it is difficult for researchers to assess the scale of emissions from their work and the impact of emissions reductions from their actions.
In this mini-workshop we will use experience gained from the Green Algorithms project and the ARCHER2 national supercomputing service to provide an interactive session where researchers, RSEs and RTPs can become more carbon literate by learning about the scale of emissions from different research activities and how they measure up against other sources of emissions. We will provide practical examples of how an individual can quantify emissions from different aspects of their work. Together with existing tools for calculating emissions from non-work activities, this gives attendees a framework that empowers individuals to make the changes, in both their work and personal lives, that will have the maximum impact on reducing and eliminating carbon emissions. In addition, using the Green DiSC framework as a guide, we will explore concrete actions researchers can take and investigate the potential carbon savings of each.
Audience
This session is aimed at all people involved in research, including both researchers and those enabling research (e.g. RSEs and RTPs). No prior knowledge or experience is required to participate.
A laptop or smartphone will be required to take part in the interactive quiz.
Session 2.4 - Skills and competencies framework for research software and data professionals
Please note that this workshop is 30 minutes - 14:40 - 15:10 BST
Aleksandra Nenadic, University of Manchester
Dave Horsfall, Newcastle University
Eli Chadwick, University of Manchester
Aman Goel, University of Manchester
Phil Reed, University of Manchester
Adrian D’Alessandro, Imperial College London
Jonathan Cooper, UCL ARC
Abstract
The development of the RSE skills and competencies framework began as a hack-day idea at Collaborations Workshop two years ago, aiming to define skills and workforce development pathways for Research Software Engineers (RSEs). Since then, the work has continued through the RSE Competencies Toolkit group (https://github.com/RSEToolkit/rse-competencies-toolkit), focusing on three key areas - definition of the skills and competencies framework, a web app for visualising the framework and different use cases for the framework identified through various consultations with the RSE community.
The group is now exploring rebranding and expanding the framework to encompass a broader range of digital technical skills. This includes competencies related to software, data, and digital research infrastructures used in research by professionals beyond RSEs—such as data librarians, archivists, stewards, and digital research technical professionals. The expansion also aims to cover various other disciplines, including GLAM (Galleries, Libraries, Archives, and Museums), where individuals may not typically identify as RSEs. The competencies covered include not just technological skills, but also communication, leadership, and other personal and interpersonal aspects.
At the workshop, we will present the progress made so far and engage the community in gathering insights on different aspects of the framework:
- Competency-specific input: seeking expert feedback on particular skill areas (e.g., AI or HPC), ensuring accuracy, and compiling relevant training materials to develop structured learning pathways.
- Domain- or role-specific adaptation: exploring customizations for specific domains and tailoring subsets of existing skills to different software, data, or infrastructure-related roles.
- Framework-wide review: conducting a high-level evaluation to identify potential gaps and opportunities for improvement.
This collaborative effort will help refine and expand the framework, making it more inclusive and applicable across diverse research and technical roles.
Audience
No prior knowledge is required, and participation is open to all. This session follows a "show and tell" format, where attendees will review the existing skills and competencies framework, share insights on the skills they use in their daily work and research disciplines, and provide feedback to help refine the framework.
Session 2.5 - Safeguarding Research & Culture: Now the data needs us!
Please note that this workshop is 30 minutes - 15:10 - 15:40 BST
Jez Cope, The British Library
Henrik Schönemann, HU Berlin
Abstract
“We must become undisciplined. The work we do now requires new modes and methods of research and teaching; new ways of entering and leaving the archives.” — Christina Sharpe, In the Wake
Our archives are vulnerable. No single archive is permanent, nor large enough to store all of our cultures at risk. Modern archival methods are robust, but no archive alone can withstand the multitude of threats we are currently facing.
The destruction of knowledge and cultural heritage has happened, and therefore it can happen again. We are in the middle of that happening, whether it is caused by human action or natural causes. However, digital information can be copied easily and quickly.
Safeguarding Research & Culture (SRC) is creating an alternative infrastructure for the archiving and dissemination of cultural heritage and scientific knowledge. We seek to preserve cultural memory in a way that traditional archives cannot. Together, we can ensure that our cultural, intellectual and scientific heritage exists in multiple copies, in multiple places, and that no single entity or group of entities can make it all disappear.
Our archive is built according to the principles of FAIR and CARE, based on open technologies and standards, and resilient against loss via meaningfully distributed storage.
We focus on publicly available material, like websites, datasets and other media, that are being altered or deleted. Our collective memory manifests in different amplitudes and digital-born artefacts — from large datasets, spanning decades of research on society, to personal wikis and blogs on and by marginalised people. No matter its scope or origin, all of this knowledge was valuable to someone, somewhen, and carries with it a potential to be so again. We are equally interested in archiving ‘big’ as well as ‘small’ data.
Everyone, from individuals to institutions, can participate by accessing, contributing, and supporting these archival infrastructures. Our work also includes documenting and providing resources and knowledge to enable participation in different aspects of this endeavour.
Audience
No specific experience needed. You'll need a laptop with an internet connection, and a desire to save some data!
(Some things will be easier if you have a BitTorrent client installed.)
Session 2.6 - Contributing to The Turing Way: an open community focused on best practices in data science
- Arielle Bennett, The Alan Turing Institute
- Alexandra Araujo Alvarez, The Alan Turing Institute
- Anne Lee Steele, The Alan Turing Institute
- Carlos Martinez, Netherlands eScience Center
- Emma Karoune, The Alan Turing Institute
- Esther Plomp, University of Aruba
- Léllé Demertzi, The Alan Turing Institute
- Malvika Sharan, The Alan Turing Institute
- Kirstie Whitaker, UC Berkeley
Abstract
The Turing Way workshop will inspire and equip you to contribute to our twice-yearly hackathon event, the Book Dash. This event is being held as part of the CW25 hack day, and we will host additional online contribution days (20-21 May).
The Turing Way is an open science, open collaboration, and community-driven project. We involve and support a diverse community of contributors to make data science accessible, comprehensible and effective for everyone. Our goal is to provide all the information that researchers, data scientists, software engineers, policymakers, and other practitioners in academia, industry, government and the public sector need to ensure that the projects they work on are easy to reproduce and reuse.
The Turing Way was launched in 2019 at the Collaborations Workshop as a guide to reproducibility, providing tools, methods, and practices to address the reproducibility crisis in science. Reflecting the community's views on wider issues and the need for more comprehensive skills for best practices in data science, we added guides for project design, ethics, communication and collaboration in 2020. What began as an open-source project building a handbook for data science has become a dynamic, global collaborative community with 450+ co-authors and 300+ chapters in our community-led handbook for reproducible, ethical and collaborative data science.
In this workshop, we will focus on the CW25 theme: Future-proofing research software: evolving together as a diverse community.
Come and find out about The Turing Way, our Book Dash, and the different ways you can contribute and start generating ideas for your own contributions.
Audience
Good for all present at CW25, especially anyone interested in learning about inclusive and reproducible data science and contributing to an open source community. Prior technical experience is not required.