RiboViz: Understanding protein synthesis via analysis of ribosome profiling data

Posted by m.jackson on 24 July 2019 - 10:00am

The molecular structure of a yeast ribosome, composed of 79 proteins

Photo from HHMI. Credit: Marat Yusupov, Roland Beckmann, and Anthony Schuller.

By Mike Jackson, Software Architect, and Kostas Kavoussanakis, Group Manager, EPCC, The University of Edinburgh; Edward Wallace, Sir Henry Dale Fellow, School of Biological Sciences, The University of Edinburgh

A multi-disciplinary team of biologists, bioinformaticians and research software engineers, based at EPCC and The Wallace Lab at the University of Edinburgh, The Shah Lab at Rutgers University and The Lareau Lab at the University of California, Berkeley, will enhance and extend a software suite called RiboViz to extract biological insight from "ribosome profiling" data and drive forward understanding of protein synthesis. Consultancy from the Software Sustainability Institute was essential in developing the proposal for this project.

All cells make proteins by using molecular machines called ribosomes, which read a messenger RNA template and "translate" the RNA code into the protein code. Signals, also encoded in the RNA, control what proteins are made by cells, when they are made and in what quantities. These signals are complex and only just beginning to be understood because there are thousands of different RNA sequences in a cell and each is hundreds to thousands of nucleotides ("letters") long. Recent advances in DNA and RNA sequencing technology have produced a technique called ribosome profiling, with which we can now measure all the subsequences of RNA that are translated into protein and the quantity of protein produced. Although this technique is amazing, it is not perfect, and statistical tools are needed to separate the interesting biological signals in the data from unwanted biases of the experimental measurement. These tools need to be implemented in usable and reliable software so that all scientists studying protein synthesis can get the maximum possible information from ribosome profiling data, which is expensive and time-consuming to collect.

The RiboViz software suite, written in Python and R, takes raw data from sequencing machines and passes it through a series of processing steps. RiboViz estimates how much each part of RNA is translated, and how the amount of translation is controlled by the code of that RNA. RiboViz produces tables, figures and graphs that can be published online, in a form useful for both experts and non-experts. Sharing data in this way can help to make science both more reproducible and more accessible. In this spirit, RiboViz itself is open source software, hosted on GitHub, and free to use by anyone in the world.
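To give a flavour of what "estimating how much each part of RNA is translated" can mean in practice, here is a minimal Python sketch that counts mapped sequencing reads per codon of a single coding sequence. It is a toy illustration only and is not taken from the RiboViz code: the function name `codon_occupancy`, its parameters, and the example read positions are all hypothetical, and a real pipeline would involve many more steps (such as trimming, alignment and bias correction).

```python
from collections import Counter


def codon_occupancy(footprint_starts, cds_start, cds_length, offset=0):
    """Count ribosome footprint reads per codon of one coding sequence.

    All names here are hypothetical, chosen for illustration.
    footprint_starts: 0-based transcript positions where each mapped read begins.
    cds_start: 0-based position of the first nucleotide of the start codon.
    cds_length: length of the coding sequence in nucleotides (a multiple of 3).
    offset: assumed shift from the read start to the nucleotide of interest.
    """
    n_codons = cds_length // 3
    counts = Counter()
    for start in footprint_starts:
        # Position of the read relative to the start of the coding sequence.
        pos = start + offset - cds_start
        # Ignore reads that fall outside the coding sequence.
        if 0 <= pos < cds_length:
            counts[pos // 3] += 1
    return [counts[i] for i in range(n_codons)]


# Made-up read start positions on a transcript whose coding sequence
# covers positions 10 to 18 (three codons).
reads = [10, 11, 12, 13, 13, 16, 30]
print(codon_occupancy(reads, cds_start=10, cds_length=9))  # [3, 2, 1]
```

In this toy example the first codon collects three reads, the second two and the third one, while the read at position 30 lies outside the coding sequence and is ignored.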

The first iteration of RiboViz was developed by Premal Shah and Tongji Xing of Rutgers University and Oana Carja and Joshua Plotkin of the University of Pennsylvania. Edward Wallace at the Institute of Cell Biology, School of Biological Sciences, University of Edinburgh and Premal have developed successive versions.

For our current project, Kostas Kavoussanakis and Mike Jackson of EPCC will work with Edward and Felicity Anderson at the University of Edinburgh, Premal, and Liana Lareau of the University of California, Berkeley. We will make the RiboViz code more reliable, easier to use and future-proof, and add features that quantify protein synthesis more accurately. We will develop statistical models that take account of both biological signals and unwanted biases. We will apply these to understand two interesting features of how protein synthesis is regulated. The first is how production of a short ("upstream") protein from an RNA can control production of another protein later ("downstream") on the same RNA. The second is how synonymous parts of the RNA code affect how ribosomes move and how much protein they produce.

Our work will help to develop fundamental knowledge about how cells work, and has several applications. Companies that genetically engineer cells to express proteins, for example to make therapeutic drugs or artificial silk, will have better tools to engineer those cells to produce the right amount of protein at the right time. Scientists studying evolution will have better tools to understand how coding sequences evolve, allowing deeper understanding of the tree of life. Lastly, we will be able to better understand human genetic diseases caused by defects in protein synthesis, which in the long run could lead to better treatments.

Our collaboration is funded by the BBSRC in the UK and the NSF/BIO in the USA as a BBSRC-NSF/BIO Lead Agency collaboration. Input from the Software Sustainability Institute was essential in developing our proposal to BBSRC and NSF/BIO. The Institute provides advice and guidance to researchers on all aspects of the use, development and funding of software within research. In 2018, the Institute completed a software and sustainability review of RiboViz, which included recommendations on how the software and its supporting documentation and resources could be improved, and a development plan that formed the basis for our proposal.

Our collaboration started in May 2019 and runs until April 2022. We look forward to reporting on our progress.