• Aucun résultat trouvé

Linked Data Services for Scientists Exemplified Using Drosophila Melanogaster Datasets

N/A
N/A
Protected

Academic year: 2022

Partager "Linked Data Services for Scientists Exemplified Using Drosophila Melanogaster Datasets"

Copied!
2
0
0

Texte intégral

(1)

Linked Data services for Scientists exemplified using Drosophila melanogaster datasets

M.P. van Iersel, N. Anwar, L. Reynolds General Bioinformatics, Reading, UK

1 Introduction

Semantic technologies allow scientists to rapidly integrate data from experimental datasets and online databases. On the computational side, the massive scale of these datasets poses some challenges. We find that there are additional challenges on the human side. Problems we encounter in practice include translating human questions to SPARQL queries, adjusting data models and ontologies to match the problem space of scientists, visualising data, and providing user-friendly access.

To address these issues, General Bioinformatics applies the following principles.

Open source tools such as PathVisio and Cytoscape are adopted to provide visualisation options. Custom inferences layers are added to semantic data to address research questions. For example, we apply inferences on top of BioPAX pathways to enable the extraction of a bipartite biochemical reaction network with a single SPARQL query. Finally, we provide training materials for scientists for visualisation tools, ontologies, and SPARQL.

Our linked data system is supports life science researchers to answer research questions. For example, the system can gather supporting evidence from a variety of databases to identify interesting target proteins. Our public demo focuses on Drosophila, but the same principles apply to any model organism.

2 Results

We have integrated several Drosophila melanogaster datasets and built a public showcase. This showcase integrates diverse data types such as genes, proteins, FlyAtlas expression data and BioPAX pathways. This data is stored in a triple store running in a cloud instance. We provide several options for visualizing the data. First, a custom Cytoscape plugin can extract triple data and visualize it in network form.

Secondly, a PathVisio plugin can be used to create visualisations on top of pathway diagrams derived from WikiPathways (See Fig. 1).

(2)

By following the demo, you should get a clear idea how linked data can be used to integrate diverse Drosophila datasets, how this can be visualized in pathway and network context, and how this can help to answer research questions.

Fig. 1: a schematic overview of the integrated Drosophila data. A collection of databases (top) have been transformed into RDF and integrated using standard ontologies (middle). Two different visualisation tools are provided to create different

views oft he data.

3 Availability

The integrated data, SPARQL tutorial and videos, RDF downloads, cytoscape session and live SPARQL endpoint are all available at http://fly.cloud.generalbioinformatics.com/

Références

Documents relatifs

In this paper we present Loupe, an online tool 3 for inspecting the structure of datasets by analyzing both the implicit data patterns (i.e., how the different vocabularies are

This demo presents the dataset profile explorer which provides a re- source type-specific view on categories associated with available datasets in the Linked Data cloud, in

Time evolution of proteolysis index has been measured and modelled on five different types of pork muscle under different accurate temperature and water activity

We show that our model using transitions and inspired by Q-learning performs more efficiently than traditional actor-critic models of the basal ganglia based on temporal difference

A-Box only results Table 3 gives the analysis results by only considering A-Box level information. Both syntactic and semantic redundancies are analysed by EDP based ap- proach. In

There is a SPARQL endpoint and downloadable RDF of the integrated data, a Cytoscape session containing the network visualisation, and a collection of pathway visualisations created

In this work, we evaluate the usage of rdfs:label as a property referring to entities’ name and we describe a set of heuristics created to discover properties identifying names

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des