• Aucun résultat trouvé

A Poster about a New Graph-based Taxonomy RESTful Data Service

N/A
N/A
Protected

Academic year: 2022

Partager "A Poster about a New Graph-based Taxonomy RESTful Data Service"

Copied!
2
0
0

Texte intégral

(1)

A graph-based taxonomy RESTful data service

Leyla Garcia1*, Leonardo Gonzales1*, Maria Martin1

1 Protein Function Development Group, EMBL-EBI, Wellcome Genome Campus, CB10 1 SD, Hinxton, UK

* These authors have equally contributed to this work {ljgarcia, lgonzales, martin}@ebi.ac.uk

Abstract. In Life Sciences, taxonomy refers to a standard nomenclature and clas- sification system for living organisms. In early days, organisms were clustered together mainly based on shared morphological characteristics. Nowadays, evo- lutionary patterns linking organisms have become more relevant when classify- ing organisms. Applications and tools serving taxonomical data aim to facilitate navigation. Here we present a graph-based RESTful data service built on top of Neo4J. Our service supports common queries as well as advanced queries includ- ing common ancestor between two or more taxonomic identifiers, a path to an- cestors or descendants within a specified level, and the shortest path between two taxa. Availability: http://www.ebi.ac.uk/proteins/api/doc/index.html

Keywords: Graph-based databases, taxonomy, web data services.

1 Background

Taxonomical data services aim to facilitate navigation through the hierarchy. They commonly support queries based on taxon identifier as well as scientific and common names. Responses include taxon identifier, scientific and common names, taxonomical rank, e.g., genus, order or species, and lineage. For instance, the National Center for Biotechnology Information (NCBI) provides taxonomy data services via e-utils tools, supporting queries based on one or multiple taxonomical identifiers as well as scientific and common names. Both Extended Mark-up Language (XML) and JavaScript Object Notation (JSON) formats are supported as well as File Transfer Protocol (FTP) down- loads and end-user interfaces.

The European Nucleotide Archive (ENA) provides a JSON-only taxonomy service, supporting queries by taxon identifier and name. The name-based query can be user to retrieve a list of suggested taxa based on the first characters. The Universal Protein Resource (UniProt) serves taxonomical data as part of its Resource Description Frame- work (RDF) distribution, accessible via its SPARQL Protocol and RDF Query Lan- guage (SPARQL) endpoint. SPARQL flexibility together with SPARQL property paths make it possible to create queries such as getting a path forwards or backwards from a starting node. Both ENA and UniProt support FTP downloads and end-user interfaces.

(2)

Desirable features when traversing a taxonomical tree such as finding the shortest path from one taxa to another or the common ancestor for multiple taxa are not com- monly supported by traditional taxonomical data services. Here we present a graph- based taxonomy RESTful data service supporting not only common queries but also those related to taxonomical paths.

2 Our Neo4J-based RESTful taxonomy Service

Here we present a graph-based RESTful data service built on top of Neo4J. Our service supports common queries –those related to taxonomic identifiers and scientific or com- mon names, as well as advanced queries. In addition to the basic data, i.e., taxon iden- tifier, scientific names, common names and lineage, we also provide links to the related taxa. Advanced queries include the common ancestor between two or more taxonomic identifiers, a path to ancestors or descendants within a specified level, and the shortest path between two taxa. Such advanced queries might also be accomplished by using SPARQL; however, they can turn out to be complex and long queries. In our service, complexity is hidden to service users. Additionally, stress and load tests have been per- formed on our service to ensure its reliability and robustness. Both XML and JSON responses are supported. The full documentation is available at http://www.ebi.ac.uk/proteins/api/doc/swagger/index.html?moduleId=taxonomyApi.

Some query samples are provided here:

 Full data by taxon identifier, e.g., http://www.ebi.ac.uk/proteins/api/taxon- omy/id/9606

 Partial data by taxon identifier, i.e., basic information, children, parent or siblings, e.g., http://www.ebi.ac.uk/proteins/api/taxonomy/id/9606/node

 Lineage by taxon identifier, e.g., http://www.ebi.ac.uk/proteins/api/taxonomy/line- age/9606

 Full data by scientific, common and mnemonic names, e.g., http://www.ebi.ac.uk/proteins/api/taxonomy/name/homo?search-

Type=STARTSWITH&fieldName=SCIENTIFICNAME, useful for autocomplete

 All taxa related with the queried taxon identifier in a specific direction and depth level, e.g., http://www.ebi.ac.uk/proteins/api/taxon- omy/path?id=8782&depth=3&direction=BOTTOM

 Path between two taxa showing their relationship, e.g., http://www.ebi.ac.uk/pro- teins/api/taxonomy/relationship?from=8782&to=9606

 Lowest common ancestor of two or more taxa, e.g., http://www.ebi.ac.uk/pro- teins/api/taxonomy/ancestor/8782,9606

Our graph-based RESTful taxonomy data service is a modern and robust service sup- porting traditional queries as well as some new features that make it easier to find rela- tionships across taxa. This service is part of the set of protein data web services deliv- ered by the Protein Function Development Group at the European Molecular Biology Laboratory – European Bioinformatics Institute (EMBL-EBI). We are exploring the creation of a library of web components using and showcasing our web services.

Références

Documents relatifs

Ce chef incontesté n'est pas venu au monde que pour faire de la musique, c'est vrai, mais celle-ci s'est emparée de lui et l'a entraîné vers des horizons qui font que ce

We selected these mutants to probe the consequences of (i) deleting the native Mn(II) ligands (AAA, AHA), (ii) including other His residues that provide high-affinity

Polystores. More recently, the research focus shifted towards query processing against heterogeneous data sources. Differ- ent approaches have been proposed on how to store,

Given a graph and a goal query, we start with an empty sample and we continuously select a node that we ask the user to label, until the learned query selects exactly the same set

This observation exposes the link between high-level cognitive activity and the changes in dynamics of brain activity implying that top-down effects on conscious mental events play

Nous émettrons trois hypothèses à vérifier par la suite : la première supposera que le droit à la santé des MNA et à l’égalité de traitement est respecté dans le canton de

It would be possible to describe how “the contemporary dance of the 1980s” developed on the basis of a common discourse, a common language (an envelope) in which artists,

This paradigm can be applied to different database models (e.g., relational, semi-structured, graph) and a class of queries adequate to the database model (e.g., join queries for