New guide: Getting Started with ML and AI in Research Software

As part of our new series on Artificial Intelligence in research software, we have published Getting Started with ML and AI in Research Software, a guide written by Paul J. Wright and reviewed by Yo Yehudi.

Getting started with ML in research software means embracing a shift in how results are produced and reproduced. Instead of a fixed execution path, research software teams work with systems whose behaviour emerges from data, configuration, and training dynamics. Reproducibility becomes a matter of capturing the process rather than relying solely on the code. The tools and techniques outlined in the guide can be adopted incrementally into existing projects, and together they provide a practical foundation for reproducible ML research.

Aimed at researchers, research software engineers, developers and project teams beginning to use machine learning and artificial intelligence in research software, the guide introduces the shift from traditional deterministic software to experiment-focused ML systems. The guide also outlines why reproducibility, transparency and governance need to be considered differently when developing ML-enabled research software. It highlights the importance of recording the conditions under which training takes place, including data provenance, preprocessing steps, hyperparameters, evaluation metrics and model artefacts.
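As a rough illustration of what recording training conditions can look like in practice, the sketch below writes a small JSON record for one training run using only the Python standard library. It is not taken from the guide: the function names (sha256_of_file, build_run_record, write_run_record) and the field layout are hypothetical, and a real project would more likely use a dedicated experiment tracking tool.

```python
import hashlib
import json
import platform
from datetime import datetime, timezone
from pathlib import Path


def sha256_of_file(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_run_record(dataset_path, preprocessing, hyperparameters, metrics, model_path):
    """Collect the conditions of one training run in a plain dictionary.

    The field names are illustrative assumptions, not a standard schema.
    """
    dataset_path = Path(dataset_path)
    model_path = Path(model_path)
    return {
        "created_at": datetime.now(timezone.utc).isoformat(),
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        # Data provenance: where the data came from and a hash of its content.
        "dataset": {
            "path": str(dataset_path),
            "sha256": sha256_of_file(dataset_path),
        },
        # Ordered list of preprocessing step descriptions.
        "preprocessing": list(preprocessing),
        "hyperparameters": dict(hyperparameters),
        "metrics": dict(metrics),
        # Hash of the saved model so the artefact can be matched to this record.
        "model_artefact": {
            "path": str(model_path),
            "sha256": sha256_of_file(model_path),
        },
    }


def write_run_record(record, out_path):
    """Write the record as sorted, indented JSON so that diffs stay readable."""
    Path(out_path).write_text(
        json.dumps(record, indent=2, sort_keys=True), encoding="utf-8"
    )
```

Hashing the dataset and the model file ties the record to specific file contents rather than to file names, which can be reused for different data. Keeping one such record per run alongside the code is a low-cost first step that can be taken before adopting dedicated tooling.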

It also introduces key ML engineering and MLOps practices, including dataset versioning, pipeline orchestration, experiment tracking, model registries, environment reproducibility and production monitoring. Together, these approaches help make the experimental record a first-class artefact alongside the model itself.

The guide is designed to help research software teams apply the same rigour and traceability to ML systems that they already expect from other forms of scientific software.

Read the guide