SCRY: Enabling Quantitative Reasoning in SPARQL Queries

Academic year: 2022

Bas Stringer1,4, Albert Meroño-Peñuela2,3, Antonis Loizou2, Sanne Abeln1, and Jaap Heringa1

1 Centre for Integrative Bioinformatics, VU University Amsterdam, NL

2 Knowledge Representation and Reasoning Group, VU University Amsterdam, NL

3 Data Archiving and Networked Services, KNAW, NL

4 To whom correspondence should be addressed ([email protected])

The inability to include quantitative reasoning in SPARQL queries slows down the application of Semantic Web technology in the life sciences. SCRY, our SPARQL compatible service layer, improves this by executing services at query time and making their outputs query-accessible, generating RDF data on demand. The power of this approach is demonstrated with two use cases, where we use SCRY to calculate standard deviations and to find homologous proteins.

More and more biological knowledge is being made available through the Semantic Web as Linked Open Data. However, not all knowledge is easily captured or stored in static triple repositories. This is especially true for knowledge derived from quantitative reasoning.

For example, BLAST is commonly used to predict homology, but its input parameters and the interpretation of its output vary greatly depending on the research question. Precomputing the outcomes of all parameter combinations and every conceivable input is practically impossible. Thus, using BLAST results in a SPARQL query, for example to look up information about a protein’s homologs, requires these results to be generated, analysed and incorporated at query time.

No currently available tools facilitate this effectively.

Likewise, the standard deviation of a set of measurements can be used to detect outliers. Whether an observation is classified as an outlier therefore depends on the other observations in the set. Many triplestores contain data which a simple query can group into sets, but SPARQL offers no straightforward way to calculate their standard deviation. This makes it difficult to use outlier selection in SPARQL queries.
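To make the outlier criterion concrete, the following minimal Python sketch flags values that lie more than a chosen number of sample standard deviations from the mean. The function name `find_outliers`, the threshold of 2.0 and the example lengths are illustrative assumptions, not taken from the paper.

```python
from statistics import mean, stdev


def find_outliers(values, threshold=2.0):
    """Return values further than `threshold` sample standard deviations from the mean."""
    if len(values) < 2:
        return []  # a standard deviation needs at least two observations
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation (n - 1 denominator)
    if sigma == 0:
        return []  # all observations identical, so nothing can be an outlier
    return [v for v in values if abs(v - mu) > threshold * sigma]


# Hypothetical sequence lengths; the last one is far from the rest.
lengths = [310, 295, 305, 300, 290, 315, 285, 1200]
print(find_outliers(lengths))
```

Note that the classification of each value depends on the whole set: removing or adding observations changes both the mean and the standard deviation, which is why such results are hard to precompute and store as static triples.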

More generally, many research questions require some form of quantitative reasoning. Such reasoning is currently poorly supported within SPARQL queries and is often infeasible to capture through precalculation; this hinders the application of Semantic Web technology to research questions in the life sciences.

We present SCRY, the SPARQL compatible service layer, which is a lightweight SPARQL endpoint that interprets parts of basic graph patterns as calls to user-defined services. These services are evaluated at query time, generating triples needed to resolve the query on the fly.
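The idea of treating certain triple patterns as service calls can be sketched in a few lines of Python. This is a toy illustration only, not SCRY's implementation: the predicate URI, the `SERVICES` registry and the `generate_triples` helper are all hypothetical names chosen for this example.

```python
from statistics import stdev

# Hypothetical predicate URI standing in for a service identifier.
STDEV_PREDICATE = "http://example.org/service/stdev"

# Registry mapping service predicates to user-defined Python callables.
SERVICES = {STDEV_PREDICATE: stdev}


def generate_triples(patterns, inputs):
    """Run the service named by each matching pattern and emit a result triple.

    patterns: list of (subject, predicate, object) tuples from a graph pattern.
    inputs:   dict mapping a subject to the list of numbers its service consumes.
    """
    generated = []
    for subject, predicate, _obj in patterns:
        service = SERVICES.get(predicate)
        if service is None:
            continue  # ordinary pattern, not a service call
        generated.append((subject, predicate, service(inputs[subject])))
    return generated


patterns = [
    ("ex:set1", STDEV_PREDICATE, "?sd"),   # interpreted as a service call
    ("ex:set1", "rdfs:label", "?label"),   # left untouched
]
inputs = {"ex:set1": [2, 4, 4, 4, 5, 5, 7, 9]}
triples = generate_triples(patterns, inputs)
print(triples)
```

The generated triples would then form the graph against which the rest of the query is resolved, so the service output becomes bindable like any stored value.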

SCRY allows users to incorporate algorithms of arbitrary complexity within standards-compliant SPARQL queries, and to use the generated outputs directly within these same queries. Unlike traditional SPARQL endpoints, the RDF graph against which SCRY resolves its queries is generated at query time, by executing services encoded in the query’s graph patterns. Figure 1 illustrates the dataflow of a typical query using SCRY, where its services are accessed through federated queries from a primary endpoint.

Fig. 1. Dataflow diagram of a typical SPARQL query using SCRY, through federated queries from a primary endpoint. The primary endpoint will send a federated query to SCRY, optionally interrogating a persistent RDF graph to retrieve inputs for services.

SCRY will parse service calls from the federated query’s basic graph patterns, and generate an RDF graph by executing them. It then resolves the federated query like a traditional endpoint, and returns its results to the primary endpoint.
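As a rough illustration of this dataflow, the sketch below assembles a federated query using the `SERVICE` keyword from SPARQL 1.1 Federated Query. The endpoint URL, the predicate URIs and the way the input is handed to the service are assumptions made for this example; the paper does not specify SCRY's actual vocabulary here.

```python
def build_federated_query(scry_endpoint, service_predicate):
    """Build a query whose SERVICE block is delegated to a SCRY-like endpoint."""
    # Doubled braces are literal braces inside the f-string.
    return f"""
SELECT ?protein ?sd WHERE {{
  ?protein <http://example.org/length> ?length .
  SERVICE <{scry_endpoint}> {{
    ?protein <{service_predicate}> ?sd .
  }}
}}
""".strip()


# Hypothetical endpoint address and service predicate.
query = build_federated_query(
    "http://localhost:5000/scry",
    "http://example.org/service/stdev",
)
print(query)
```

The primary endpoint evaluates the outer pattern against its persistent graph and forwards only the `SERVICE` block, which is the part a SCRY-like endpoint would turn into service calls.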

The federation-oriented design allows for easy integration with existing SPARQL endpoints and tools. Moreover, SCRY facilitates the execution of arbitrary services within SPARQL queries, which enables quantitative reasoning as well as other forms of data processing and analysis. Ultimately, this allows Semantic Web technologies to be more effectively applied to research questions in the life sciences.

We demonstrate the power of SCRY’s on-the-fly triple generation with two use cases. First, we select outliers among GO-annotated proteins from the UniProt SPARQL endpoint, using SCRY to calculate the standard deviation of their sequence lengths. Second, we combine SCRY, BLAST and the Human Protein Atlas to find homologous proteins which are expressed in the same tissues as a query protein.
