• Aucun résultat trouvé

SCRY: Enabling Quantitative Reasoning in SPARQL Queries

N/A
N/A
Protected

Academic year: 2022

Partager "SCRY: Enabling Quantitative Reasoning in SPARQL Queries"

Copied!
2
0
0

Texte intégral

(1)

SCRY: Enabling quantitative reasoning in SPARQL queries

Bas Stringer1,4, Albert Meroño-Peñuela2,3, Antonis Loizou2, Sanne Abeln1, and Jaap Heringa1

1 Centre for Integrative Bioinformatics, VU University Amsterdam, NL

2 Knowledge Representation and Reasoning Group, VU University Amsterdam, NL

3 Data Archiving and Networked Services, KNAW, NL

4 To whom correspondence should be adressed (b.stringer@vu.nl)

The inability to include quantitative reasoning in SPARQL queries slows down the application of Semantic Web technology in the life sciences.SCRY, ourSPARQL compatible service layer, improves this by exe- cuting services at query time and making their outputs query-accessible, gener- ating RDF data on demand. The power of this approach is demonstrated with two use cases, where we use SCRY to calculate standard deviations and to find homologous proteins.

More and more biological knowledge is being made available through the Seman- tic Web as Linked Open Data. However, not all knowledge is easily captured or stored in static triple repositories. This is especially true for knowledge derived from quantitative reasoning.

For example, BLAST is commonly used to predict homology, but its input parameters and interpretation of its output vary greatly depending on the re- search question. Precomputing the outcomes of all parameter combinations and every conceivable input is practically impossible. Thus, using BLAST results in a SPARQL query, for example to look up information about a protein’s homologs, requires these results to be generated, analysed and incorporated at query time.

No currently available tools facilitate this effectively.

Likewise, the standard deviation of a set of measurements can be used to detect outliers. Thus, if an observation is classified as an outlier depends on the other observations in the set. Many triplestores contain data which a simple query can group into sets, but SPARQL does not support a straightforward way to calculate their standard deviation. This makes it difficult to use outlier selection in SPARQL queries.

More generally, many research questions require some form of quantitative reasoning. Such reasoning is currently poorly supported within SPARQL queries and is often infeasible to capture through precalculation; this hinders the appli- cation of Semantic Web technology to research questions in the life sciences.

We present SCRY, theSPARQLcompatible service layer, which is a lightweight SPARQL endpoint that interprets parts of basic graph patterns as calls to user defined services. These services are evaluated at query time, generating triples needed to resolve the query on the fly.

(2)

SCRY allows users to incorporate algorithms of arbitrary complexity within standards-compliant SPARQL queries, and to use the generated outputs directly within these same queries. Unlike traditional SPARQL endpoints, the RDF graph against which SCRY resolves its queries is generated at query time, by executing services encoded in the query’s graph patterns. Figure 1 illustrates the dataflow of a typical query using SCRY, where its services are accessed through federated queries from a primary endpoint.

Fig. 1.Dataflow diagram of a typical SPARQL query using SCRY, through federated queries from a primary endpoint. The primary endpoint will send a federated query to SCRY, optionally interrogating a persistent RDF graph to retrieve inputs for services.

SCRY will parse service calls from the federated query’s basic graph patterns, and generate an RDF graph by executing them. It then resolves the federated query like a traditional endpoint, and returns its results to the primary endpoint.

The federation-oriented design allows for easy integration with existing SPARQL endpoints and tools. Moreover, SCRY facilitates the execution of arbitrary ser- vices within SPARQL queries, which enables quantitative reasoning as well as other forms of data processing and analysis. Ultimately, this allows Semantic Web technologies to be more effectively applied to research questions in the life sciences.

We demonstrate the power of SCRY’s on-the-fly triple generation with two use cases. First, we select outliers among GO-annotated proteins from the UniProt SPARQL endpoint, using SCRY to calculate the standard deviation of their se- quence lengths. Second, we combine SCRY, BLAST and the Human Protein Atlas to find homologous proteins which are expressed in the same tissues as a query protein.

2

Références

Documents relatifs

We show that in setups as the one re- ported in Figure 1 and where endpoints receive large number of concurrent request per day, i.e., the endpoint is in the (50%;75%]

This section describes the translation of SPARQL update operations into XQuery ex- pressions using the XQuery Update Facility. We present how similar methods and al- gorithms

When a query prepared by the client might require a long execution time, a standard SPARQL endpoint implementation will try to execute the query with lots of cost to return answers..

For example, IQ-rewritability means rewritability of an OMQ from (ALCI, UCQ) into an OMQ from (ALCI , IQ) with the IQ formu- lated in ALCI, and PUBGP-rewritability means

Since the new query functionality offered by SPARQL-LD (allowing to query any HTTP resource containing RDF data) can lead to queries with large num- ber of service patterns which

Although SPARQL is a stan- dardized query language for searching Linked Open Data and there are many Web APIs, called SPARQL endpoints, that accept SPARQL queries, a user may

We present SCRY, an easily customized, lightweight SPARQL endpoint that facilitates executing user-defined services at query time, making their results accessible immediately

(ii) the distinctness list for aggregation computation over the raw query is formed, consisting of all attributes (both non-aggregated and aggregated ones alike)