SWAT4LS Conference 2013

9 - 10 December 2013, Edinburgh, UK

By Laura Moss, SSI 2012 Fellow and clinical physicist, NHS Greater Glasgow & Clyde and the University of Aberdeen

Highlights:

An interesting keynote speech by Frank van Harmelen on the potential application of semantic technologies to the clinical medical domain.

A highlight of the event was seeing the large number of people who are actively using and developing semantic applications for the life sciences, and the impact this is having on research in the area.

I met a number of researchers working in areas similar to my own, including Dr Areti Manataki (University of Edinburgh), Dr Alasdair Gray (Heriot-Watt University) and Dr Andre Dekker (MAASTRO Knowledge Engineering).

Event report:

The SWAT4LS workshop focused on the application of Semantic Web technologies and software to the life sciences research domain, covering the topics of: eHealth, biomedical informatics, systems biology, computational biology, drug discovery, bioinformatics and biocomputing. It was attended mostly by software developers working in these fields. There are many knowledge bases (or datasets) in the life sciences field which are being made available on the Web (e.g. in the Linked Open Data cloud or available through SPARQL endpoints) enabling research in the domain.

The first day consisted of a number of interesting tutorials, followed by a workshop day of presentations, posters and demos.

Trish Whetzel (UCSD, USA) presented a tutorial on the Neuroscience Information Framework (NIF). The aim of the NIF registry is to bring together a listing of resources for researchers in the neuroscience domain. The registry currently includes over 8,000 resources, such as datasets and publications, and has been made possible by the use of semantic web technologies. Incoming resources are expressed against the NIF standard ontology (NIFSTD), and web services then make them available through the NIF web portal, where users can search the resources to find the information they need. NIFSTD is a set of modular ontologies that brings together several external ontologies and describes concepts such as organisms, anatomy, cells, molecules and investigations. The technology behind the NIF has recently been made available to help other scientific areas build research communities.
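Ontology-based search works because resources are annotated with ontology terms rather than free-text keywords. As a minimal sketch, assuming a simple is_a hierarchy, a query for a general term can be expanded to all of its subclasses, so resources annotated with more specific terms are still found. The terms, resources and function names below are hypothetical and this is not NIF's actual implementation.

```python
# Hypothetical mini-ontology: child term -> parent term (is_a links).
# These terms are illustrative only and are not taken from NIFSTD.
IS_A = {
    "pyramidal neuron": "neuron",
    "purkinje cell": "neuron",
    "neuron": "cell",
    "astrocyte": "glial cell",
    "glial cell": "cell",
}

# Hypothetical registry entries, each annotated with ontology terms.
RESOURCES = [
    {"name": "Dataset A", "terms": {"purkinje cell"}},
    {"name": "Dataset B", "terms": {"astrocyte"}},
    {"name": "Publication C", "terms": {"neuron"}},
]


def descendants(term, is_a=IS_A):
    """Return the term plus every term that is (transitively) a subclass of it."""
    result = {term}
    changed = True
    while changed:
        changed = False
        for child, parent in is_a.items():
            if parent in result and child not in result:
                result.add(child)
                changed = True
    return result


def search(term, resources=RESOURCES):
    """Return names of resources annotated with the term or any of its subclasses."""
    wanted = descendants(term)
    return [r["name"] for r in resources if r["terms"] & wanted]


if __name__ == "__main__":
    print(search("neuron"))  # Dataset A is found via its "purkinje cell" annotation
```

A plain keyword match on "neuron" would miss Dataset A; the subclass expansion is what lets the ontology do the work.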

Another tutorial, by EMBL-EBI (the European Bioinformatics Institute), described how the data it collects and distributes from life science experiments has now been made available as RDF linked data, along with a list of RDF data services. Examples of available datasets include the UniProt dataset, which consists of 10 billion triples describing the function and disease involvement of 118 billion protein sequences, and the Reactome dataset, which contains around 2,000 pathways and reactions in human biology. For each dataset, a description and a SPARQL endpoint are provided. Researchers can either download the RDF datasets to use locally or develop an application that queries the SPARQL endpoint. EMBL-EBI is currently developing apps to help researchers further exploit this linked data.
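As a rough sketch of the second option (querying a SPARQL endpoint from an application), the Python below builds a SPARQL 1.1 Protocol GET request that asks for results in the standard SPARQL JSON format, and parses such a response, using only the standard library. The endpoint URL, query and function names are placeholders chosen for illustration, not EMBL-EBI's actual service details.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder endpoint: substitute the SPARQL endpoint published for your dataset.
ENDPOINT = "https://example.org/sparql"

QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?item ?label WHERE {
  ?item rdfs:label ?label .
} LIMIT 10
"""


def build_request(endpoint, query):
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def parse_bindings(body):
    """Turn a SPARQL JSON results document into a list of {variable: value} dicts."""
    doc = json.loads(body)
    variables = doc["head"]["vars"]
    rows = []
    for binding in doc["results"]["bindings"]:
        # Unbound variables are simply absent from a binding, so fall back to None.
        rows.append({v: binding.get(v, {}).get("value") for v in variables})
    return rows


# Sending the request needs network access, e.g.:
#   from urllib.request import urlopen
#   with urlopen(build_request(ENDPOINT, QUERY)) as resp:
#       rows = parse_bindings(resp.read().decode("utf-8"))
```

Keeping request construction and result parsing separate makes the parsing step testable offline against a saved response.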

Three keynote talks were given by experts from both academia and industry. The first, from Kerstin Forsberg (AstraZeneca), was about using and reusing existing standards for data and semantics in software applications. For example, she described how they converted the existing NCI Thesaurus (an extensive medical vocabulary) into RDF following the SKOS standard.

The second keynote was given by David A. Kerr (IBM), who talked about the use of IBM’s Watson to improve knowledge about complex diseases. Watson is an intelligent system which can process vast amounts of information and make logical connections. It is currently being applied in the cancer domain, where it supports clinicians by generating alerts, accelerating the discovery of new therapies, reducing misdiagnoses and improving cancer treatment. The system has also been used to train medical students.

Finally, the third keynote was given by Frank van Harmelen on how the semantic technologies currently being applied in the life sciences could also be applied in the medical setting. He noted that many healthcare datasets are not part of the Linked Open Data cloud. Nevertheless, potentially useful applications of these technologies include the automatic generation of quality indicators, the creation of living medical guidelines, the detection of adverse events, and the design of clinical trials; these could also lead to funding opportunities.
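To make the SKOS conversion mentioned in the first keynote more concrete, here is a hedged Python sketch that turns a single thesaurus entry into SKOS triples serialised as N-Triples: a skos:Concept with a skos:prefLabel, skos:altLabel synonyms and skos:broader links to parent concepts. The namespace, the Entry class and the example codes are hypothetical; this is not the mapping AstraZeneca actually used for the NCI Thesaurus.

```python
from dataclasses import dataclass, field

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Hypothetical namespace for illustration; not the NCI Thesaurus's real URIs.
BASE = "http://example.org/thesaurus/"


@dataclass
class Entry:
    """Hypothetical thesaurus record: a code, a preferred term, synonyms, parents."""
    code: str
    preferred: str
    synonyms: list = field(default_factory=list)
    parents: list = field(default_factory=list)  # codes of broader concepts


def literal(text, lang="en"):
    """Serialise a language-tagged N-Triples literal with the required escapes."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"@{lang}'


def to_ntriples(entry):
    """Map one entry to SKOS triples, one N-Triples statement per line."""
    s = f"<{BASE}{entry.code}>"
    lines = [f"{s} <{RDF_TYPE}> <{SKOS}Concept> ."]
    lines.append(f"{s} <{SKOS}prefLabel> {literal(entry.preferred)} .")
    for syn in entry.synonyms:
        lines.append(f"{s} <{SKOS}altLabel> {literal(syn)} .")
    for parent in entry.parents:
        lines.append(f"{s} <{SKOS}broader> <{BASE}{parent}> .")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_ntriples(Entry("X2", "Lung carcinoma",
                            synonyms=["Carcinoma of lung"], parents=["X1"])))
```

Emitting N-Triples by hand keeps the sketch dependency-free; a real conversion would more likely use an RDF library and the thesaurus's published identifiers.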

Other work presented covered the following main themes: development of new ontologies and linked open datasets, integration of existing datasets, and the exploration and querying of datasets. Examples from each are described below:

New ontologies and datasets

Work by Jakub Kozak et al (Charles University, Prague) described how they have made the summaries of product characteristics of drugs available as linked data. These documents are generally only available as text, yet they contain a lot of valuable information. The team has also developed a web app that allows people to browse this data.

The Drug Interaction Ontology (DINTO), developed by Herrero-Zazo et al (University Carlos III, Madrid), was presented during the workshop. It models drug-drug interactions and their mechanisms.

The DisGeNET ontology describes gene-disease association concepts, allowing users to find links between diseases and genes more easily.

Integration of existing datasets

The SALUS architecture enables effective integration and use of electronic health record data. In the application described in the demo, it was used to integrate background patient information with data on adverse drug events to improve the quality of the data collected on these events.

Exploration and querying of datasets

The Open PHACTS project is an EU-funded project that aims to reduce barriers in drug discovery by developing a platform which integrates and semantically annotates pharmacological data from a variety of sources. It has 28 partners (academic institutions, pharma and biotech companies). The Open PHACTS Explorer, presented during the workshop, is a web application that supports drug discovery via the Open PHACTS API without requiring knowledge of SPARQL or RDF. This lets novice users focus on searching the data rather than writing queries. A number of publicly available pharmacological and physicochemical datasets are included in the Open PHACTS Explorer.

Kumar et al described a novel framework which can discover latent semantics in high-dimensional web data; this helps to improve the quality of results returned by search engines. The tool has been tested by researchers performing semantic searches of the PubMed database.