• Aucun résultat trouvé

RIKEN Meta Database: A Life-science Metadata Database Based on the Semantic Web

N/A
N/A
Protected

Academic year: 2022

Partager "RIKEN Meta Database: A Life-science Metadata Database Based on the Semantic Web"

Copied!
2
0
0

Texte intégral

(1)

RIKEN Meta Database: a life-science metadata database based on the Semantic Web

Norio Kobayashi1,2, Kai Lenz1, and Hiroshi Masuya2,1

1 Advanced Center for Computing and Communication (ACCC), RIKEN, 2-1 Hirosawa, Wako, Saitama, 351-0198 Japan

{norio.kobayashi, kai.lenz}@riken.jp

2 BioResource Center (BRC), RIKEN, 3-1-1 Koyadai, Tsukuba, Ibaraki, 305-0074 Japan

hmasuya@brc.riken.jp

Abstract. We introduce a database platform called ‘RIKEN Meta Data- base’ that provides information on RIKEN’s various life-science data- bases to help researchers around the world make full use of RIKEN’s research results. Since RIKEN procures various and large life-sciences datasets through the genomes and phenomes of various species, including datasets for sequence and image data, such resultant research data and databases are heterogeneous and tend to be published in data-domain specific interfaces and formats. To achieve data re-usability, including facilitating data/database discovery and data description as life-science knowledge for further intelligent data analysis, the database platform is implemented to manage metadata of both data and database using Se- mantic Web technologies with standardised vocabularies and life-science ontologies. As of September 2015, 153 databases, including external pub- lic databases and 15 ontologies, are integrated on the platform, and these data can be access via a SPARQL endpoint and a browser-based graph- ical user interface that shows metadata in table and graph format.

Keywords: Semantic Web, life-science database platform, database profile

1 Overview

In recent life-science research, theses research fields have become deeper and more specialised, and collaboration among different research fields, such as in- formatics and material sciences, are actively promoted to comprehensively un- derstand life phenomena. RIKEN is a comprehensive science institute with re- search centres and laboratories, including physics, chemistry, informatics and life sciences laboratories, that produces various and large data as individual databases. Even in RIKEN, which represents the science community in micro- cosm, such data and databases are difficult for researchers to find, and intel- ligent and integrated analyses has not been realised. To address this problem, we have been developing a database platform called the ‘RIKEN Meta Data- base’ (http://metadb.riken.jp) to facilitate the management and publication of

(2)

database metadata according to the Semantic Web. The principal databases pub- lished by RIKEN have generated metadata that can be accessed by SPARQL and a browser-based graphical user interface (GUI).

2 System Architecture

The RIKEN Meta Database is designed to be a lightweight database platform.

The core components of the platform are 1) a SPARQL endpoint, 2) a GUI web server that provides a GUI for data input and display and 3) a resource description framework (RDF) data converter. The SPARQL endpoint manages RDF data and is a server for a user’s SPARQL client and the GUI web server. The GUI web server provides reader-friendly web pages for each database, class and entities in table or graph format by accessing the SPARQL endpoint. The web server also provides an interface that accepts RDF datasets for data uploads.

The data converter is a Java application that accepts a Microsoft Excel file, which is converted to RDF format. This converter is used for wet researchers who are willing to publish their data but are not familiar with the RDF format.

Currently, components 1) and 2) have been deployed on Linux virtual machines with a 1 GB flash memory drive having 96 GB and 8 GB memory, respectively.

3 Available data

As described, RIKEN is a comprehensive science institute that can perform data editing in close cooperation with biologists and informaticians. In such an environment, careful selection of public ontologies and data classes has been performed for each database. As of September 2015, with 15 public ontolo- gies and 153 databases, including RIKEN’s principal databases such as ‘FAN- TOM’ (mammalian [1]), ‘FOX Hunting’ (plant [2]) and ’Metadata of BRC cell resources’ (bioresources [3]), which have 704 classes, 1,141 properties, 27 mil- lion entities and 102 million triples, are available on the platform. In addi- tion, to assist users in easily finding data and databases, the platform provides the ‘RIKEN Database Directory’. This directory provides a collection of data- base profiles, such as W3C’s Health Care and Life Sciences description profile data (http://www.w3.org/TR/hcls-dataset/), including statistics data, and the SPARQL Builder Metadata (http://sparqlbuilder.org/), to find relationships be- tween classes and corresponding triple paths.

References

1. The FANTOM Consortium and the RIKEN PMI and CLST (DGT): A promoter- level mammalian expression atlas. Nature 507, 462–470 (2014)

2. Ichikawa T., Nakazawa M., Kawashima M., Iizumi H., Kuroda H., Kondou Y., Tsuhara Y., Suzuki K., Ishikawa A., Seki M., Fujita M., Motohashi R., Nagata N., Takagi T. Shinozaki K., Matsui M.: The FOX hunting system: an alternative gain-of-function gene hunting technique. Plant J. 45, 974–985 (2006)

3. Nakamura Y.: Bio-resource of human and animal-derived cell materials. Exp. Anim.

59(1), 1–7 (2010)

Références

Documents relatifs

Such a framework would be useful in various application scenarios, ranging from filtering and processing social networking event streams, over integrating and aggregating events

Using the declarative Silk - Link Specification Language (Silk-LSL), data publishers can specify which types of RDF links should be discovered between data sources as well as

To allow people to query and mash up data sources intuitively, we propose a data mashup language, called MashQL.. The main novelty of MashQL is that it allows non IT-skilled people

Social tagging systems are nowadays the preferred way to classify information in the Web 2.0 sites. As their popularity increases, many works try to solve some of their

The rich suite of datasets include data about potential algal biomass cultivation sites, sources of CO 2 , the pipelines connecting the cultivation sites to the CO 2 sources and

One of the main goals of LEAPS is to enable the stakeholders of the algal biomass domain to interactively explore, via linked data, potential algal sites and sources of

This paper describes the design and implementation of the recommender applica- tion framework developed as a part of the Ontology Application Management (OAM) Framework [3], which

Indeed, our experience with the framework showed that the approach it proposes, regard- less of a particular implementation or programming language, can lead to better RESTful design