• Aucun résultat trouvé

YummyData: Measuring the "Sparkliness" of Biomedical Sparql Endpoints

N/A
N/A
Protected

Academic year: 2022

Partager "YummyData: Measuring the "Sparkliness" of Biomedical Sparql Endpoints"

Copied!
2
0
0

Texte intégral

(1)

The YummyData Initiative: How SPARQL-y is your biomedical endpoint?

Ivar Andrea Splendiani12 and Johan Nystrom-Persson3and Michel Dumontier4 and Yasunori Yamamoty5

1 DERI, Galway, [email protected]

2 intelliLeaf ltd, Cambridge, [email protected]

3 The National Institute of Biomedical Innovation, Osaka, [email protected]

4 Carleton University, Ottawa, Canadamichel [email protected]

5 Database Center for Life Science, [email protected]

Abstract. Although increasing amounts of biomedical data is being provided as structured content on the Semantic Web, there is currently no standardized way to monitor SPARQL endpoints for their availabil- ity, reliability or content flux. Importantly, there are additional issues relating to the provision of version-sensitive data republished by third parties or made available as part of a one off research project. All of these aspects have important consequences for users that rely on feder- ated queries across distributed SPARQL endpoints.

1 Results

We describe the YummyData initiative to provide monitoring of biomedical SPARQL endpoints on a variety of factors including availability, reliability, content summarization, and content evolution. Our prototype website, yummy- data.org, provides simple metrics relating to the availability and response status of a selected set of endpoints, the size of the data set, etc. In addition to these fundamental metrics, our long term goal is to compute a SPARKLE score, a composite metric combining measures such as the number of triples, size and frequency of updates, the number of links to other datasets, and the capabilities of the endpoint server. Although the SPARKLE score is not a measure of the quality of a dataset, it helps indicate whether the dataset changes over time, and whether these changes are likely to be negative or positive in nature. These statistics may possibly correlated with the declared update frequency of the datasets published by the endpoint of a given provider, thus providing an addi- tional input to our score. Finally, yummydata.org also supports custom queries that track endpoint/data metrics, allowing new metrics to be computed and shared amongst participants. Although devising an objective measure of quality is controversial, we believe that the YummyData initiative will help users better understand the content that is currently available while also helping providers understand what kinds of metrics are important to users. We also believe that emergence of more and more third party SPARQL endpoint rating services like YummyData will bring about an environment where we can get more objective

(2)

2

evaluation of endpoints, and therefore qualities of the entire RDF-based biomed- ical data/services are becoming higher. As yummydata.org is an early prototype, we welcome suggestions to clarify existing metrics and to help develop additional metrics to tease out information of interest.

Fig. 1.A screenshot from YummyData.org, showing the evolution of parameters and score for a given endpoint

2 Acknowledgments

We wish to thanks the organizers of the BioHackathon 2012, during which this work originated.

Références

Documents relatifs

Similar to the throughput results, it can be noticed how penalising complex queries results in smaller waiting and processing times for simple queries when using deficit or

The prototype currently offers two entry points: Users can either initiate a keyword search over a given SPARQL endpoint, or they can select any of the al- ready available

Although SPARQL is a stan- dardized query language for searching Linked Open Data and there are many Web APIs, called SPARQL endpoints, that accept SPARQL queries, a user may

We present SCRY, an easily customized, lightweight SPARQL endpoint that facilitates executing user-defined services at query time, making their results accessible immediately

the adoption of Linked Data principles [5], which focus on integration of open data between distributed Web sources, enabled by RDF and SPARQL endpoints.. Web APIs and SPARQL

SPARQture, the tool described in this paper, automates this process and renders the structure of a particular RDF data available behind some SPARQL endpoint using a number of visu-

We show that in setups as the one re- ported in Figure 1 and where endpoints receive large number of concurrent request per day, i.e., the endpoint is in the (50%;75%]

When a query prepared by the client might require a long execution time, a standard SPARQL endpoint implementation will try to execute the query with lots of cost to return answers..