
RDF-Based Integration with SPARQL Building System for Life Science Database Archive


Academic year: 2022


Atsuko Yamaguchi1, Katsuhiko Okubo2, Norio Kobayashi3, Kouji Kozaki4, Sadahiro Kumagai2, Kai Lenz3, Tomoe Nobusada5, Hongyan Wu1, Yasunori Yamamoto1, and Hideki Hatanaka5

1 Database Center for Life Science (DBCLS), ROIS, 178-4-4 Wakashiba, Kashiwa, Chiba, 277-0871 Japan

{atsuko, wu, yy}@dbcls.rois.ac.jp

2 Information & Telecommunication Systems Company, Hitachi Ltd, 6-26-2 Minami Oi, Shinagawa-ku, Tokyo 140-8573 Japan

{katsuhiko.okubo.yh, sadahiro.kumagai.jj}@hitachi.com

3 Advanced Center for Computing and Communication (ACCC), RIKEN, 2-1 Hirosawa, Wako, Saitama, 351-0198 Japan

{norio.kobayashi, kai.lenz}@riken.jp

4 The Institute of Scientific and Industrial Research (ISIR), Osaka University, 8-1 Mihogaoka, Ibaraki, Osaka, 567-0047 Japan

kozaki@ei.sanken.osaka-u.ac.jp

5 National Bioscience Database Center, JST, 5-3, Science Plaza 7F, Yonbancho, Chiyoda-ku, Tokyo 102-8666 Japan

{nobusada, hideki}@biosciencedbc.jp

Abstract. The Life Science Database Archive (LSDB Archive, https://dbarchive.biosciencedbc.jp/) is a service that collects, preserves, and provides databases generated by life-science researchers in Japan. As of September 2015, the LSDB Archive includes 103 databases, all of which can be downloaded with appropriate licenses and metadata.

Although a simple keyword search tool is available for the databases, a more flexible retrieval system is required to obtain relevant data from heterogeneous databases. Therefore, we first converted the databases into RDF datasets and uploaded them to a triple store. Then, we developed a prototype retrieval system using SPARQL Builder. Because SPARQL Builder assists users in writing queries, the prototype enables users without knowledge of RDF to access the datasets.

Keywords: SPARQL, RDF, database archive, life-science databases

1 Introduction

In the life sciences, many kinds of data have been generated as results of experimental research. Although these data are often organized and provided as databases, we found, as of 2006, that many databases in Japan were no longer maintained once their project funding ended [1]. Even when databases are maintained, they may not be fully used because their terms of use are unclear or because they are not downloadable. To address these issues, the Life Science Database Archive (LSDB Archive, https://dbarchive.biosciencedbc.jp), a service that maintains, stores, and provides downloadable databases with appropriate licenses and unified descriptions, was launched in 2009. As of September 2015, the LSDB Archive includes 103 heterogeneous databases generated by life-science researchers during national projects in Japan.

Although a simple keyword search tool is available for the databases, a more flexible retrieval system is required to obtain relevant data from heterogeneous databases. We therefore decided to use semantic web technologies to integrate the databases. As a trial, we converted the databases into RDF datasets and uploaded them to a triple store with a SPARQL endpoint. However, although a SPARQL endpoint is a very flexible retrieval system, constructing a SPARQL query is difficult for users who are unfamiliar with semantic web technologies. To assist users in writing SPARQL queries, we employed SPARQL Builder [2], a semiautomatic SPARQL query generation system. Using this system, we developed a prototype search interface for the LSDB Archive that enables users to extract semantically related data.
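To give a sense of what a user would otherwise have to write by hand, the following minimal sketch prepares a query for a SPARQL endpoint in the style of the SPARQL 1.1 Protocol (the query text is passed as the `query` parameter of a GET request). The endpoint URL, the `ex:` namespace, and the class and property names are hypothetical placeholders, not the actual LSDB Archive vocabulary, and the request is only constructed, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint and vocabulary, for illustration only.
ENDPOINT = "https://example.org/sparql"

QUERY = """\
PREFIX ex: <http://example.org/vocab#>
SELECT ?gene ?protein
WHERE {
  ?gene a ex:Gene ;
        ex:encodes ?protein .
  ?protein a ex:Protein .
}
LIMIT 10
"""


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a GET request carrying a SPARQL query."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard JSON serialization of SELECT results.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)
```

Even this small query requires knowing the prefixes, class names, and property names used in the dataset, which is the knowledge a query-building assistant is meant to supply.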

2 Method and Result

To generate initial RDF datasets from the databases in the LSDB Archive, we used TogoDB [3], which accepts tabular data and generates RDF datasets. When a class is specified for each column in TogoDB, rdf:type is automatically attached to the objects. To type the subjects, we added supplementary data that include rdfs:domain for each property corresponding to a column. Because all the classes used in the RDF datasets come from external ontologies, we extracted the necessary parts of those ontologies. We then generated SPARQL Builder metadata for the RDF datasets together with the extracted ontologies.
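The typing scheme described above can be sketched as follows. This is an illustration under stated assumptions, not TogoDB's actual output format: the `http://example.org/` namespace, the IRI layout, the helper names, and the sample row are all hypothetical, and every cell is treated as a resource rather than a literal.

```python
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_DOMAIN = "http://www.w3.org/2000/01/rdf-schema#domain"
BASE = "http://example.org/"  # hypothetical namespace


def row_to_triples(row_id, row, column_classes):
    """Turn one table row into (subject, predicate, object) IRI triples.

    Each cell becomes an object resource typed with the class that was
    chosen for its column.
    """
    subject = f"{BASE}entry/{quote(str(row_id))}"
    triples = []
    for column, value in row.items():
        prop = f"{BASE}prop/{column}"
        obj = f"{BASE}{column}/{quote(str(value))}"
        triples.append((subject, prop, obj))
        triples.append((obj, RDF_TYPE, column_classes[column]))
    return triples


def domain_triples(columns, subject_class):
    """Declare rdfs:domain for each column property so subjects can be typed."""
    return [(f"{BASE}prop/{c}", RDFS_DOMAIN, subject_class) for c in columns]


def to_ntriples(triples):
    """Serialize IRI-only triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


# Hypothetical two-column table with one row.
column_classes = {"gene": BASE + "class/Gene", "tissue": BASE + "class/Tissue"}
triples = row_to_triples("1", {"gene": "BRCA1", "tissue": "breast"}, column_classes)
triples += domain_triples(column_classes, BASE + "class/ExpressionRecord")
print(to_ntriples(triples))
```

With the rdfs:domain statements in place, a reasoner or a metadata generator can infer the class of each row subject from the properties it uses, which is why the subjects do not need explicit rdf:type triples in this sketch.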

We then developed a prototype search interface for the RDFized LSDB Archive using SPARQL Builder. With this interface, users select two classes and a relationship between them, and can then search the combined heterogeneous RDF datasets in the LSDB Archive for the desired data with a SPARQL query.
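As a rough illustration of the kind of query such a selection could translate into, the sketch below assembles a SELECT query from a start class, an end class, and a chain of properties connecting them. It is not SPARQL Builder's actual algorithm: here the property path is supplied directly rather than discovered from metadata, and the function name and all IRIs are hypothetical.

```python
def build_query(start_class, end_class, path, limit=100):
    """Build a SELECT query linking instances of two classes.

    `path` is a non-empty list of property IRIs leading from an instance
    of `start_class` to an instance of `end_class`.
    """
    if not path:
        raise ValueError("path must contain at least one property IRI")
    lines = ["SELECT ?start ?end", "WHERE {", f"  ?start a <{start_class}> ."]
    current = "?start"
    for i, prop in enumerate(path):
        # Intermediate nodes get fresh variables; the last hop ends at ?end.
        nxt = "?end" if i == len(path) - 1 else f"?n{i + 1}"
        lines.append(f"  {current} <{prop}> {nxt} .")
        current = nxt
    lines.append(f"  ?end a <{end_class}> .")
    lines.append("}")
    lines.append(f"LIMIT {limit}")
    return "\n".join(lines)


# Hypothetical classes and a two-step relationship between them.
query = build_query(
    "http://example.org/class/Gene",
    "http://example.org/class/Tissue",
    ["http://example.org/prop/hasRecord", "http://example.org/prop/tissue"],
)
print(query)
```

The user only chooses the two classes and one of the offered relationships; the variable names, triple patterns, and syntax are generated for them.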

Acknowledgments. This work was supported by the National Bioscience Database Center (NBDC) of the Japan Science and Technology Agency (JST).

References

1. Ministry of Education, Culture, Sports, Science, and Technology (MEXT) Integrated Database Project: http://lifesciencedb.mext.go.jp/en/

2. Yamaguchi A., Kozaki K., Lenz K., Wu H., Kobayashi N.: An Intelligent SPARQL Query Builder for Exploration of Various Life-science Databases. CEUR Workshop Proceedings 1279, The 3rd International Workshop on Intelligent Exploration of Semantic Data (IESD 2014), Riva del Garda, Italy.

3. TogoDB: http://togodb.org/
