Reproducibility in discrete event simulation

Posted by s.aragon on 28 November 2017 - 12:05pm

By Geraint Ian Palmer, Cardiff University.

A first version of this post was originally published on Geraint's blog.

There is a reproducibility crisis in science.

In computational science this is completely avoidable: if we can tell a computer what to do, we can tell it to do it again - and we can tell other people how to tell their computer to do it again.

In the domain of discrete event simulation, however, the software and methodologies that are traditionally used can make it difficult to reproduce the simulation results. In a recent preprint I have submitted with Dr. Vincent Knight, Prof. Paul Harper and Asyl Hawa, we compare simulation approaches in terms of reproducibility and best practice, and introduce Ciw, a Python library for discrete event simulations of queueing networks that is designed to facilitate reproducibility.

The problem

The current landscape of discrete event simulation practices is largely reliant on commercial standalone software, which is often GUI-heavy. There is also a small but significant number of simulation practitioners who work using spreadsheet software, such as Microsoft Office Excel and LibreOffice Calc.

Commercial software is expensive, which limits model sharing to those who can pay, while closed-source software restricts access to the source code, which makes models harder to understand and to adapt. Both properties run directly against the principles of open science.

Standalone packages also tend not to be modular or extendible, hindering model re-usability. Model testing is difficult, and the use of GUIs may encourage poor model validation and verification. Binary model files make version control troublesome, and models built in GUIs are not readable, with crucial parameters and behaviours buried in a hierarchy of menus.

A solution?

One solution to these problems is the Ciw library, used within the Python ecosystem.

Three properties are seen in the literature as the minimum requirements for reproducibility in simulations: readability, modularity, and extendibility. Well-written Python code that uses Ciw can have all three properties and follow best practices.

Ciw is open source under the MIT licence, so it is free to download, use, modify, and contribute to. All of Ciw’s source code is available, modifiable, and extendible. It is also fully tested.

A simulation model is a script, so is testable, shareable, and version controllable, and we believe Ciw has clear, readable syntax. Seeds of random number streams are clearly set, and data collection is transparent, which means everything can be saved.
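
To illustrate why an explicitly set seed matters, here is a minimal sketch that does not use Ciw at all. It relies only on Python's standard library and simulates a single-server queue with the Lindley recursion, which is an assumption made for brevity rather than the way Ciw works internally. The function name mean_wait_mm1 and its parameters are hypothetical.

```python
import random


def mean_wait_mm1(arrival_rate, service_rate, n_customers, seed):
    """Estimate the mean wait in an M/M/1 queue over n_customers customers."""
    rng = random.Random(seed)  # private random stream with an explicit seed
    wait = 0.0   # the first customer finds an empty system
    total = 0.0
    for _ in range(n_customers):
        total += wait
        service = rng.expovariate(service_rate)
        interarrival = rng.expovariate(arrival_rate)
        # Lindley recursion: W[n+1] = max(0, W[n] + S[n] - A[n+1])
        wait = max(0.0, wait + service - interarrival)
    return total / n_customers


first = mean_wait_mm1(4.0, 5.0, 1000, seed=0)
second = mean_wait_mm1(4.0, 5.0, 1000, seed=0)
other = mean_wait_mm1(4.0, 5.0, 1000, seed=1)

print(first == second)  # same seed, same random stream, same result
print(first == other)   # a different seed gives a different sample path
```

Because the whole model is a short script and the seed is an argument, anyone with the file can rerun it and compare their numbers with those reported.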

Python is renowned in the scientific community, as it has readable syntax and an abundant collection of other scientific libraries. Conducting simulations in this ecosystem would allow seamless integration with other methodologies, such as data analysis, machine learning, and other algorithms. Object-orientation, which Python is known for, and on which Ciw is built, agrees very well with discrete event simulation. This also forces modularity, and encourages extendibility.

A sample

The following obtains the average waiting time in an M/M/1 queue with arrival rate λ=4 and service rate μ=5, simulated for 20 time units with a warm-up time of two time units, averaged over five trials:

>>> import ciw

>>> N = ciw.create_network(
...     Arrival_distributions=[['Exponential', 4.0]],
...     Service_distributions=[['Exponential', 5.0]],
...     Number_of_servers=[1]
... )

>>> average_waits = []
>>> for trial in range(5):
...     ciw.seed(trial)
...     Q = ciw.Simulation(N)
...     Q.simulate_until_max_time(20)
...     recs = Q.get_all_records()
...     # discard customers who arrived during the warm-up (first two time units)
...     waits = [r.waiting_time for r in recs if r.arrival_date >= 2]
...     average_waits.append(sum(waits) / len(waits))

>>> sum(average_waits) / len(average_waits)
0.379274988243813
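
As a rough sanity check, which is not part of the original example, the simulated figure can be set against the standard steady-state result for an M/M/1 queue: the mean time spent waiting for service is ρ/(μ−λ), where ρ=λ/μ. The variable names below are illustrative only.

```python
arrival_rate = 4.0   # λ
service_rate = 5.0   # μ

utilisation = arrival_rate / service_rate                # ρ = λ/μ
mean_wait = utilisation / (service_rate - arrival_rate)  # Wq = ρ/(μ−λ)

print(utilisation, mean_wait)  # 0.8 0.8
```

The simulated average need not be close to this value: each trial starts from an empty system and runs for only 20 time units, so five short trials give a noisy estimate that is likely to sit below the steady-state figure.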

Final notes

The paper includes a summary table comparing six simulation frameworks (Ciw, SimPy, SIMUL8, AnyLogic, custom scripts written in Python and C++, and spreadsheet modelling in Excel) against the criteria described at the beginning of this post. In addition to comparing best practices and reproducibility, we have also included performance comparisons.

Current practices in discrete event simulation do not lend themselves well to reproducibility. By introducing the Ciw library we hope to foster reproducibility of simulations.