C4RR Demo

Abstracts of demos sorted by last name of the first author.

The demos will take place in the morning of the second day, 28 June, between 9:00 and 12:00.

RosettaHUB, connecting the dots between clouds, containers and research software

Karim Chine, RosettaHUB Ltd.

The RosettaHUB platform connects the dots between clouds, containers, research software, real-time collaboration frameworks and social portals. It delivers a virtual environment of considerable flexibility and power that fosters usability, reproducibility, shareability and auditability at every layer of interaction between scientists and their research tools and infrastructures.

The workshop will give an overview of the new platform and hub for open data science and will highlight the essential role played by Docker in this new ecosystem.

RosettaHUB makes public and private clouds easy for everyone to use. RosettaHUB's federation platform allows higher education institutions and research laboratories to create virtual organizations within the hub. Members receive active AWS accounts that are supervised in terms of budget and cloud resource usage, and that are protected and monitored/managed centrally by the institution's administrator.

RosettaHUB allows users to work with Docker containers seamlessly. Simple web interfaces allow users to create those containers, connect them to data storage, snapshot them, share snapshots with collaborators and migrate them from one cloud to another. The RosettaHUB perspectives make it possible to use the containers to securely serve noVNC, RStudio, Jupyter, Zeppelin, Spark-notebook and Shiny apps, and to enable those tools for real-time collaboration.
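For readers more familiar with plain Docker, the snapshot-and-share step described above corresponds roughly to committing a container to an image and pushing that image to a registry. The following is a minimal Python sketch under that assumption; it only builds (and optionally runs) standard Docker CLI commands and does not use RosettaHUB's own web interfaces or APIs. The container name, repository and tag are hypothetical placeholders.

```python
import shlex
import subprocess


def snapshot_commands(container, repository, tag):
    """Return Docker CLI commands that snapshot a container and share it.

    `container`, `repository` and `tag` are hypothetical placeholders,
    not names used by RosettaHUB.
    """
    image = f"{repository}:{tag}"
    return [
        # Save the container's current filesystem state as a new image.
        ["docker", "commit", container, image],
        # Publish the image to a registry (assumes a prior `docker login`).
        ["docker", "push", image],
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(snapshot_commands("my-analysis", "example/analysis-snapshot", "2018-06-28"))
```

A collaborator could then retrieve the snapshot with `docker pull example/analysis-snapshot:2018-06-28`. Note that `docker commit` does not capture data held in mounted volumes, so any connected data storage has to be shared separately.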

The RosettaHUB real-time collaborative containerized workbench is a universal IDE for data scientists that breaks down the silos between data science environments. The IDE makes it possible to interact with containerized hybrid kernels that glue together R, Python, Scala, SQL clients, Java, Matlab, Mathematica, etc. in a single process and allow those different environments to share their workspaces. A user-friendly reactive programming framework makes it possible to create reactive data science microservices and interactive web applications. Containers, workbench checkpointing and the logging of every interaction users have with their environments make everything created within RosettaHUB reproducible and auditable.

RosettaHUB's APIs (700+ functions) cover the full spectrum of programmatic interaction between users and clouds, containers and R/Python/Scala kernels. Based on those APIs, RosettaHUB provides a CloudFormation-like service that makes it easy to create and manage, as templates, collections of related cloud resources, container images, R/Python/Scala scripts, macros and visual widgets, alongside optional cloud credentials. These templates are cloud-agnostic, and they make it possible for anyone to easily create and distribute complex data science applications and services. RosettaHUB's marketplace transforms those templates into products that can be shared or sold.

Video.

Requirements

Laptop with an internet connection.

Container adoption & reproducibility - the Training Trojan horse

Mark Fernandes, Quadram Institute Biosciences.

The main premise of this workshop is that training environments which exploit container technologies and reproducibility best practices offer a golden opportunity to raise awareness of these approaches and encourage their adoption. Learners attending courses aim to increase their knowledge and are ready to engage with new approaches and ideas. If they can see obvious benefits in the methodology used to achieve this goal, they may be more open to applying it in their own research. By containerising the training materials, we demonstrate the potential for worldwide deployment of a host-agnostic computational environment, thereby dramatically increasing the audience who can access, run and evaluate environments such as Bioinformatics pipelines. This is visible in the way the materials can be run under different operating systems. Publications today are augmented by published research data; container technologies can likewise provide a means to publish the analytical methods used, in a reproducible manner. Training can emphasise this by deploying materials from repositories such as GitHub and Docker Hub.

Some research software can be problematic to install and configure without expert help. Learners will begin to see containers as a form of commoditisation of 'ready-to-eat' software. Trainers will be interested in the benefits for maintaining training environments and the reduced setup overhead: each time the container is run, it starts in a known state.
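The "known state" point can be made concrete with plain Docker. The sketch below is a minimal Python illustration, not the presenter's actual course setup; it assumes the training materials are packaged as an image (the name `example/bioinformatics-course:1.0` is hypothetical) that serves a notebook-style interface on port 8888. Running with `--rm` means changes made during a session are discarded along with the container, so the next run starts again from the unchanged image.

```python
import shlex
import subprocess


def training_run_command(image, host_port=8888, container_port=8888):
    """Build a `docker run` command for a disposable training session.

    `image` is a hypothetical course image name; the port defaults assume
    a notebook-style server listening on 8888 inside the container.
    """
    return [
        "docker", "run",
        "--rm",  # delete the container on exit, discarding the learner's changes
        "-p", f"{host_port}:{container_port}",  # publish the server port to the host
        image,
    ]


def start_session(image, dry_run=True):
    cmd = training_run_command(image)
    print(shlex.join(cmd))  # show the exact command being used
    if not dry_run:
        # Each run creates a fresh container from the unchanged image,
        # i.e. from the same known state every time.
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    start_session("example/bioinformatics-course:1.0")
```

The same command works on any host with Docker installed, which is the host-agnostic property the abstract relies on.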

Examples of Bioinformatics training courses implemented in container environments will be presented and delegates will be encouraged to assess the "behind the curtain" message.

Slides, video.

Requirements

Laptop with Docker (if you use Linux) or Kitematic (if you use Windows or macOS) installed.