Site Reliability Engineer, Backend Infrastructure, Cisco Meraki
Configuration management, monitoring, and resource allocation for large-scale distributed systems.
My research centers on the effective use of distributed systems to accomplish computational science and engineering tasks. Most of my publications fall into two distinct areas: resource allocation and scheduling for virtualized HPC and cloud systems, and simulation of large-scale distributed systems.
In the area of resource allocation and scheduling, I have authored a number of papers on Dynamic Fractional Resource Scheduling (DFRS). A key feature of this approach is that it defines and optimizes a user-centric metric of performance and fairness. The methods employed for DFRS can be applied to both service-hosting and parallel computing environments, and the problem formulation supports a mix of best-effort and QoS scenarios.
In the area of simulation of large-scale distributed systems, I am the originator and a current developer of SMPI, a set of compiler front-ends and compatibility libraries for the online simulation of MPI applications. A key goal of the SMPI project is that unmodified or minimally modified source code should compile and run correctly in simulation, so that it can be used for performance prediction of large-scale computations and for exploring what-if scenarios. SMPI has been validated through an extensive set of experiments comparing it against popular MPI implementations to assess its accuracy, scalability, and speed.
In recent years I have also become interested in topics more closely related to computational science, particularly the design and automation of experiments to reduce workload, improve reproducibility, and allow results to be verified. I am also interested in improving the management of research data objects, including software, since in many cases software and experimental data files are more useful for communicating scientific findings and advancing the state of the art than more traditionally recognized outputs (e.g., conference or journal publications).
Check out contributions by and mentions of Mark Stillwell on www.software.ac.uk