Community

Cloud computing has become a popular paradigm in computing generally and, increasingly, in the more demanding field of scientific computing. Over the last two years, the RCUK Cloud Working Group has initiated a series of community events and discussions to help researchers access and exploit cloud computing for their work. One early realisation is the need for practical advice on how to build an application that can be deployed across multiple clouds.

As part of the RCUK Cloud Working Group's series of community events, EMBL-EBI are pleased to offer an open workshop and training session, 'ResOps: Delivering Science Across Clouds'. The event will take place on Monday 3rd July 2017. The day will provide basic, domain-neutral practical experience of working with clouds, drawing on the work EMBL-EBI has been doing over the last year.

Attendees should be comfortable with technical scientific computing concepts and with operating in and administering a Linux environment (e.g. ssh, shell scripts). They will learn about using DevOps technologies (e.g. Ansible, Terraform, Puppet, Docker) to deploy onto OpenStack.

To register, please visit the event page.

Privacy and Trust in IoT & Open Data

By Sinan Shi, University College London, David De Roure, University of Oxford, Nikoleta Glynatsi, Cardiff University, Emma Tattershall, Science and Technology Facilities Council, Andrew Landells, University of Southampton, Chris Gutteridge, University of Southampton, Gary Leeming, University of Manchester.

This post is part of the Collaborations Workshop 2017 speed blogging series.

The privacy risks within a socially connected infrastructure are not well understood and are constantly changing. Personal information can be private yet still be accidentally shared by others and made more widely available. One of the largest challenges for privacy is the lack of understanding of what data could be used for now; as more data are collected and made available, future purposes become even more difficult to predict. Often, seemingly innocuous data sets can be used to derive more private data, such as the waking times and other habits of…


Sharing code and data neuroscience

By Stephen Eglen, Software Sustainability Institute's fellow, University of Cambridge.

Scientists are increasingly dependent on computational techniques to analyse large volumes of data. These computational methods are often tailored to the particular analysis in mind, and as such are valuable research outputs. Furthermore, unlike experimental techniques, computational methods can be easily shared. However, at least in neuroscience, computational methods are not routinely shared upon publication of associated manuscripts.

To improve this situation, we have worked with the editors of Nature Neuroscience to establish a pilot code review project. Once papers have been approved in principle for publication, authors can opt in to the code review. The code (and data) will be checked to see if independent reviewers can reproduce key findings of the paper. The details of the code review process are outlined in the editorial, and we have written a commentary to describe good practice for sharing of code and data. For example, we suggest the minimum requirement for sharing is that sufficient code and data be provided to regenerate a key figure/table of the paper. This follows the well-established requirements for…


Arab poetry, use of Solr

By Swithun Crowe, Research Computing, University Library, University of St Andrews

This article is part of our series: A day in the software life, in which researchers from all disciplines discuss the tools that make their or someone else’s research possible.

Most of the data I work with is in XML format: Text Encoding Initiative (TEI) documents, either handwritten or edited using XForms; XML exported from other programs such as Zotero; or XML taken from third-party web services, such as the Library of Congress authority files. To search these files, I use Apache's Solr document search engine, usually communicating with it via PHP's Solr extension. The source XML documents are transformed using XSLT into a form which Solr can ingest.
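As a rough illustration of that transformation step, here is a minimal sketch in Python rather than the XSLT and PHP used in the article. It turns a simplified TEI fragment into Solr's XML update format (an `add` element wrapping a `doc` of `field` elements). The sample document, the function name `tei_to_solr_add` and the field names `id`, `title`, `author` and `line` are assumptions made for this example, not the schema used in the article.

```python
import xml.etree.ElementTree as ET

# TEI elements live in this XML namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical, heavily simplified TEI fragment, invented for illustration.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Sample poem</title>
        <author>Sample author</author>
      </titleStmt>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <l>First line of verse</l>
      <l>Second line of verse</l>
    </body>
  </text>
</TEI>"""


def tei_to_solr_add(tei_xml, doc_id):
    """Build a Solr XML update message (<add><doc>...) from a TEI string.

    The field names are assumptions; a real Solr schema defines its own.
    """
    root = ET.fromstring(tei_xml)
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")

    def field(name, value):
        # Each indexed value becomes <field name="...">value</field>.
        element = ET.SubElement(doc, "field", {"name": name})
        element.text = value

    field("id", doc_id)
    field("title", root.findtext(".//tei:titleStmt/tei:title",
                                 default="", namespaces=TEI_NS))
    field("author", root.findtext(".//tei:titleStmt/tei:author",
                                  default="", namespaces=TEI_NS))
    # One multi-valued "line" field per TEI verse line (<l>).
    for verse_line in root.iterfind(".//tei:l", TEI_NS):
        field("line", "".join(verse_line.itertext()).strip())

    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    print(tei_to_solr_add(SAMPLE_TEI, "poem-1"))
```

The resulting string could then be posted to a Solr core's update handler. XSLT expresses the same mapping declaratively, which is convenient when the source documents are already being processed with XML tooling.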

The examples in this…


Docker Containers & Reproducible Research

By Raniere Silva, Community Officer.

The Docker Containers for Reproducible Research Workshop (C4RR) is only a month away: it takes place on 28-29th of June 2017 at the University of Cambridge. The workshop offers many talks on the use of containers to improve reproducibility in desktop, cloud and HPC environments, as well as some practical sessions.

For those interested in HPC, several talks will surely make the workshop worthwhile: Michael Bauer's on Singularity, Matthew Hartley's on ways to make the transition from the desktop to HPC smoother, and Jeroen Schot's describing how the Dutch National e-Infrastructure is empowering containers.

Meanwhile, the talks from Nick James, David Mawdsley and Matthew Upson are aimed at attendees who are more interested in reproducibility. Nick will talk about an open source data analysis pipeline from the European Bioinformatics Institute that employs containers. If you are an R user and are looking for ways to use Knitr with Docker to make it easy for your colleagues to reproduce your R Markdown documents, David's talk is for you. And Matthew will take the attendees through a journey…


This procedure has been adopted from the Ada Initiative's guide titled "workshop anti-harassment/Responding to Reports".

  1. Contact any of the following staff

The staff will also be prepared to handle the incident. All of our staff members are informed of the code of conduct policy and guide for handling harassment at the workshop. There will be a mandatory staff meeting just prior to the workshop when this will be reiterated as well.

  1. Report the harassment incident (preferably in writing, e.g. on paper or via email). All reports are confidential.

  2. When reporting the event to staff, try to gather as much information as available, but do not interview people about the incident. Staff will assist you in writing the report/collecting information.

  3. The important information consists of:

    • Identifying information (name/badge number) of the participant doing the harassing

    • The behavior that was in violation

    • The approximate time of the behavior (if different than the time the report was made)

    • The circumstances surrounding the incident

    • Other people involved in the incident


The Docker Containers for Reproducible Research Workshop brings together researchers, developers and educators to explore best practices when using containers, not only Docker, and the future of research software with containers.

We value the participation of each stakeholder and want all attendees to have an enjoyable and fulfilling experience. Accordingly, all attendees are expected to show respect and courtesy to other attendees throughout the workshop and at all workshop events, including online.

To make clear what is expected, all attendees, speakers, exhibitors, organisers and volunteers at Docker Containers for Reproducible Research Workshop 2017 are required to conform to the following Code of Conduct. Organisers will enforce this code throughout the workshop.

Summary

Docker Containers for Reproducible Research Workshop is dedicated to providing a harassment-free workshop experience for everyone. We do not tolerate harassment of workshop participants in any form.

All communication should be appropriate for a professional audience including people of many different backgrounds.

Be kind to others. Do not insult or put down other attendees.

Behave professionally. Remember that harassment and exclusionary jokes are not appropriate at the Docker Containers for Reproducible Research Workshop.

Attendees violating these rules may be asked to leave the workshop without a refund at the sole discretion of the workshop organisers.

Thank you for helping make this a welcoming, friendly event for all.

Clarifications

Harassment includes…


Abstracts of lightning talks sorted by last name of the first author.

The lightning talks will take place in the afternoon of the second day, 28 June, starting at 15:00.


Research Software Engineer

Joshua Heimbach, University of Cambridge, Department of Genetics

InterMine is an open source biological data warehouse, and we've just started using Docker. We're interested in making InterMine easier to install for novice users, and are using Docker to manage software dependencies.


Containing infrastructure models for testing and scale

Tom Russell, Oxford University, Environmental Change Institute, ITRC Mistral project

NISMOD (National Infrastructure Systems MODel) is a system-of-systems model which couples simulation models of the energy, water, waste water, transport and solid waste systems in order to evaluate long-term plans, risk and resilience under socioeconomic and climate scenarios.

Motivated in part by a scenario and decision space which looks amenable to an embarrassingly parallel scale-out, and in part by the need to isolate and test components, we are migrating models to build and test in containers under continuous integration. We are also currently prototyping the model-running architecture, which should take advantage of the opportunities to scale out.

Abstracts of demos sorted by last name of the first author.

The demos will take place in the morning of the second day, 28 June, between 9:00 and 12:00.


RosettaHUB, connecting the dots between clouds, containers and research software

Karim Chine, RosettaHUB Ltd.

The RosettaHUB platform connects the dots between clouds, containers, research software, real-time collaboration frameworks and social portals. It delivers a virtual environment of considerable flexibility and power that fosters usability, reproducibility, shareability and auditability at all layers of interactions between scientists and the research tools and infrastructures.

The workshop will give an overview of the new platform and hub for open data science and will highlight the essential role played by Docker in this new ecosystem.

RosettaHUB makes public and private clouds easy to use by everyone. RosettaHUB's federation platform allows higher education institutions and research laboratories to create virtual organizations within the hub. Members receive active AWS accounts supervised in terms of budget and cloud resources usage, protected and monitored/managed centrally by the institution’s administrator. 

RosettaHUB allows users to work with Docker containers seamlessly. Simple web interfaces allow users to create those containers, connect them to data storage, snapshot them, share snapshots with collaborators and migrate them from one cloud to another. The RosettaHUB perspectives make it possible to use the containers to securely serve noVNC, RStudio, Jupyter, Zeppelin, Spark-notebook and Shiny Apps, and to enable those tools for real-time collaboration.

The RosettaHUB…


Software in engineering

By Edward Smith, Institute’s fellow, Imperial College London.

To an engineer, software design concepts are not only familiar; they are central to the education we are forced to endure. These include standardisation, quality testing and the importance of outlining a clear specification. However, when it comes to software development, engineering academics seem to forget these principles, which shaped the industrial revolution and allowed us to engineer the modern world. In this short blog, I want to explore why academic engineers don't apply these best-practice concepts to software.

It is clear we are in the middle of a revolution, one which arguably will change the world more rapidly than the industrial revolution did over a hundred years ago. Aside from the scientific developments, among them steam power, electricity and mastery of materials such as iron and steel, it was the methodologies forged during this period that were pivotal to the revolution. The key concepts for mass production were the division of labour and automation, allowing much greater production by fewer people. In addition, the standardisation of parts allowed each person to specialise in and optimise a given part, with the parts fitted together by agreeing on the required interface between them.

Consider a car. Before the industrial revolution, a…
