
Flipping the script on machine learning – data first, models later


Author(s)

Dylan Adlard

SSI fellow

Posted on 23 October 2025

Estimated read time: 4 min


Applied machine learning is traditionally taught using either a model-first approach, where the focus begins with architectures and training pipelines, or a problem-first approach, where students start with a task and explore how ML can solve it. However, with the rise of powerful off-the-shelf tools and pretrained models, the technicalities of "doing machine learning" have become relatively straightforward. For many real-world problems in the applied sciences, modelling is no longer the bottleneck; what really matters is understanding the data, its biases, and its limitations. This is especially true for clinical diagnostics, where data is often scarce, drawn from diverse geographies, and collected using non-standardised methods, and where the signal can be subtle.

As part of the Diagnostics for Lower Middle-Income Countries (Dx4LMIC) Conference, and in collaboration with the George B. Moody PhysioNet Challenges, I led a hands-on machine learning workshop for graduate students, with no coding or prior ML experience required. In fact, not a single line of code was written, nor a single model architecture described. Instead, we introduced machine learning through three key data-centric considerations:

  1. What good performance actually means
  2. Why data quality matters
  3. How hidden biases can dictate real-world utility

Each theme was initiated through research-focused keynote talks by Professor Antonio Ribeiro (Federal University of Minas Gerais), Professor Matthew Reyna (Emory University) and Dr. Alissa Hummer (Stanford University).
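To make the first theme concrete for readers who do want to see some code, here is a minimal sketch of why accuracy alone can mislead when a condition is rare. It is not taken from the workshop notebook: the labels are synthetic, and the 5% prevalence and the "always negative" classifier are assumptions chosen purely for illustration.

```python
# Illustrative only: synthetic labels, not data from the workshop.

def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels where 1 = disease present."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(y_true, y_pred):
    """Fraction of all cases classified correctly."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)

def sensitivity(y_true, y_pred):
    """Fraction of true positive cases that the classifier detects."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0

# Hypothetical cohort: 100 patients, 5 of whom have the condition.
y_true = [1] * 5 + [0] * 95

# A "model" that predicts negative for everyone.
always_negative = [0] * len(y_true)

acc = accuracy(y_true, always_negative)
sens = sensitivity(y_true, always_negative)
print(f"accuracy = {acc:.2f}, sensitivity = {sens:.2f}")
```

In this toy setting the classifier is right for 95 of 100 patients while detecting none of the positive cases, which is why "good performance" has to be defined against the clinical question rather than a single headline number.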

The workshop was framed around a real and complex challenge: diagnosing Chagas disease directly from ECG data. Attendees used an annotated, interactive Google Colab notebook that trained and evaluated a model on real clinical 12-lead ECGs and demonstrated the impact of the core themes by subsetting the data and retraining the model in real time. Inspired by the Software Sustainability Institute’s collaborative events, however, much of the workshop was spent in breakout discussion groups with five experienced ML practitioners, exploring how each theme applies to participants’ own research domains.
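The hidden-bias theme can be sketched in a similar spirit. The example below is an assumption-laden illustration rather than the workshop's method: the records are synthetic, and the group names `site_A` and `site_B` are hypothetical stand-ins for any subgroup, such as a collection site or a recording device. It shows how a respectable overall sensitivity can conceal much weaker performance in an under-represented group.

```python
from collections import defaultdict

# Illustrative only: synthetic records, not data from the workshop.
# Each record is (group, y_true, y_pred) with 1 = disease present.

def sensitivity_by_group(records):
    """Return {group: sensitivity} computed over positive cases only."""
    positives = defaultdict(int)
    detected = defaultdict(int)
    for group, y_true, y_pred in records:
        if y_true == 1:
            positives[group] += 1
            if y_pred == 1:
                detected[group] += 1
    return {group: detected[group] / positives[group] for group in positives}

# Hypothetical positive cases: site_A contributes most of the data.
records = (
    [("site_A", 1, 1)] * 18 + [("site_A", 1, 0)] * 2
    + [("site_B", 1, 1)] * 2 + [("site_B", 1, 0)] * 3
)

# Pooled sensitivity across both sites.
n_positive = sum(1 for _, y_true, _ in records if y_true == 1)
n_detected = sum(1 for _, y_true, y_pred in records if y_true == 1 and y_pred == 1)
overall = n_detected / n_positive

per_group = sensitivity_by_group(records)
print(f"overall = {overall:.2f}")
for group, value in sorted(per_group.items()):
    print(f"{group} = {value:.2f}")
```

Because the larger group dominates the pooled figure, the weaker group only becomes visible once results are broken down, which is the kind of check that subsetting the data makes possible.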

The confidence with which attendees could engage in nuanced discussions, particularly around challenging topics like bias, demonstrates that core applied ML concepts can be taught effectively even in the absence of any modelling. This approach is harder to teach at first, but once attendees understand that what you put into the model shapes what you get out, ML becomes a logic problem, and they can focus on what really matters: the data.

The workshop turned out to be a success, with positive feedback from attendees. Many said the sessions gave them a clearer understanding of how to approach ML in their own research, especially when working with messy or limited datasets. The collaborative discussions were a highlight, with several participants noting how valuable it was to slow down and focus purely on the problem, without pressure to code or troubleshoot. For many, it was the first time they had critically engaged with issues like bias, data quality, or performance beyond accuracy, a shift in perspective we were hoping to achieve. Discussions around next year’s workshop are already underway.

A huge thank you to Professor Gari Clifford and Professor Matthew Reyna, who provided the clean data and model and were invaluable sounding boards throughout. I'm also grateful to Reuben College for hosting the workshop under their banner, and to the Software Sustainability Institute, of which I am a fellow, for their generous funding that made the event possible. 

 

 

 
