The Minimal Reproducible Example Paradox

puzzleBy Yihui Xie. With an introduction by Raniere Silva, Software Sustainability Institute.

Here at the Institute, we talk a lot about best practices when developing software but we rarely talk about best practices for issue/bug reports. Probably one of the most important best practices to follow when reporting an issue/bug is to provide minimal example. Yihui Xie, author of various R packages, wrote the blog post "The Minimal Reproducible Example Paradox" on his blog, which is a reflection of the importance of users dedicate some time to develop a minimal example. With Yihui's permission, we’ve reproduced his post below.

You can access the original post at Yihui Xie's blog.


How many times have I reminded a user of posting a minimal, self-contained, and reproducible example (reprex)? Probably 500 times. How many times do I think I will still need to remind users of this? Perhaps 5000 times.

I think there is a paradox, and there isn’t a clever solution. The paradox is that we software developers know much more about our own software than average users (especially beginners), and we naturally anticipate a reprex, but users are in the dark, not knowing what information to provide to help solve the problem. Sometimes it is obvious to know what is wrong without an example (e.g., by reading the error message), but this is relatively rare. When users report a problem, we need them to show the problem is real, and the most convincing way is to provide a reprex.

Innocent users often report a problem without telling developers how to reproduce it, and that is probably because they think the developers are smart enough to figure out everything. This trust is similar to how we trust doctors when we are sick, but we don’t just go to see doctors and tell them we are sick, then immediately ask for a medicine or treatment. Doctors have to ask several questions, so do software developers. The problems is that asking questions is a natural part of the job of doctors, however, for software developers, it is easy to lose patience when you have to repeat this 500 times to those who you offer free help (see note 1):

“Please provide a minimal, self-contained, and reproducible example.”

The particularly difficult requirement for users is “minimal”. It takes time to prepare a minimal example. In most cases, users run into software issues while working on real-world problems. I have seen a lot of bug reports with reproducible examples, but alas, these examples can contain thousands of lines of code (e.g., a giant Shiny app or R Markdown report). It is impractical for me to try to understand all the data and models, plus 50 add-on packages, before I can start investigating the issue.

look-hard.gif

I think it is much easier for developers to minimize the examples than users, because they often have the intuition about those lines of code that cannot possibly be the culprit. For example, I’d remove as many add-on packages in the example as possible (such as library(ggplot2)), and replace real data with simpler built-in datasets (such as iris or mtcars, or even simulated data), and see if the problem persists. For an R Markdown document, I’d delete as much content as possible (e.g., perhaps leave only one short paragraph). The more irrelevant content I have to look at, the slower it is for me to figure out the root issue. Deleting irrelevant content or code is something users can do and can also be tremendously helpful, but the problems are (1) they do not have the good intuition as developers; (2) they probably do not have time; (3) they may not even know it is so helpful to minimize a reprex.

shovel snow

When a user is able to create a minimal reprex, chances are that she is really close to the root problem (perhaps only one step away), so that she may actually be able to solve the problem by herself!

Lastly I want to show a few examples of minimal reproducible examples:

  • Bugs I reported to Pandoc on Github. These bugs are often discovered when other users report issues in the rmarkdown repo. After I investigate them, I try to minimize the examples to show the problems are only relevant to Pandoc (i.e., rmarkdown is innocent). My minimal reprex typically only contains a couple of lines of text.

  • rstudio/blogdown#197: this is a recent issue, and Justin told me carefully how to reproduce the issue. I like this type of issues, because I don’t need to ask more questions: just follow his steps, and I’m able to reproduce the issue.

In conclusion, it is helpful for a developer to receive a minimal reprex, but it may not be a trivial thing to do for beginners. I’m well prepared to repeat the boring sentence for another 5000 times. Let’s keep educating users the importance of a reprex together. Oh, I almost forgot to mention the “When in doubt, upgrade” strategy. Your issue may be automatically solved by simply upgrading.

Notes

1. Free as in beer. I don’t think it is completely “free”. For example, developers can often benefit from answering questions. Sometimes they reveal bugs, and sometimes they help developers improve their software. That is, developers do get something in return in this process.

Posted by s.aragon on 26 September 2017 - 4:00pm

Add new comment

The content of this field is kept private and will not be shown publicly.
By submitting this form, you accept the Mollom privacy policy.