
Pipelines for interoperability of distributed resources in ecology: the AnaEE-France context


HAL Id: hal-03028515

https://hal.archives-ouvertes.fr/hal-03028515

Submitted on 15 Dec 2020


Open licence - etalab

Pipelines for interoperability of distributed resources in ecology: the AnaEE-France context

Christian Pichot, Philippe Clastre, Benjamin Jaillet, Damien Maurice, Rachid Yahiaoui

To cite this version:

Christian Pichot, Philippe Clastre, Benjamin Jaillet, Damien Maurice, Rachid Yahiaoui. Pipelines for interoperability of distributed resources in ecology: the AnaEE-France context. 5ème édition des Journées Nationales du Développement Logiciel (JDEV2020), Jul 2020, Rennes, France. ⟨hal-03028515⟩


Pipelines for interoperability of distributed resources in ecology: the AnaEE-France context

PICHOT C. (1), CLASTRE P. (1), JAILLET B. (1), MAURICE D. (2), YAHIAOUI R. (3)

1. INRAE URFM, 228 route de l’Aérodrome, 84914 Avignon
2. INRAE UMR SILVA, route d’Amance, 54280 Champenoux
3. INRAE US INFOSOL, 2163 avenue de la Pomme de Pin, 45075 Orléans
The AnaEE-France (Analysis and Experimentation on Ecosystems) Research Infrastructure offers experimental facilities for studying ecosystems and biodiversity. The data generated by the AnaEE platforms are most often managed in relational databases.

A distributed Information System (IS) has been developed, based on the semantic interoperability of its components and on common vocabularies (the AnaeeThes thesaurus and an OBOE-based ontology). Discovery and access portals are fed by information (RDF triples) produced by the semantic annotation of the resources, which also generates metadata and datasets.

Two pipelines have been developed to facilitate the semantic annotation and exploitation processes, which otherwise represent a considerable conceptual and practical effort.

Context and objective

[Overview figure: experimental, analytical and modelling platforms feed relational databases, whose semantic annotation produces metadata and data for discovery portals and repositories.]

Semantic data generation from a pipeline for database annotation

[Pipeline figure: starting from the relational database (raw data), a mapping configuration file (.obda) is generated (Step 1) and processed through Steps 2 to 4, producing per-category semantic data graphs and the semantic description of the variables; combined with the OBOE-based AnaEE ontology, the resulting triples are loaded into a graph database whose end point exposes both the raw data and the inferred triples. The graphs instantiate the OBOE pattern: an Observation ofEntity an Entity and hasMeasurement a Measurement, which hasValue a value, usesStandard a unit and ofCharacteristic a characteristic, while hasContext links connect observations to temporal information.]
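To make the OBOE pattern above concrete, the following minimal sketch builds one annotated measurement as RDF triples with the Python rdflib library; the namespaces, classes and properties are illustrative placeholders, not the actual AnaEE ontology terms.

    # Minimal sketch: an OBOE-style observation graph built with rdflib.
    # All URIs below are illustrative placeholders, not the real AnaEE vocabulary.
    from rdflib import Graph, Literal, Namespace, RDF

    EX = Namespace("http://example.org/anaee/")    # hypothetical data namespace
    OBOE = Namespace("http://example.org/oboe/")   # hypothetical OBOE-like vocabulary

    g = Graph()
    g.bind("ex", EX)
    g.bind("oboe", OBOE)

    obs = EX["obs/0001"]
    meas = EX["meas/0001"]

    g.add((obs, RDF.type, OBOE.Observation))
    g.add((obs, OBOE.ofEntity, OBOE.Water))             # observed entity
    g.add((obs, OBOE.hasContext, EX["obs/site-0001"]))  # link to a context observation
    g.add((obs, OBOE.hasMeasurement, meas))
    g.add((meas, RDF.type, OBOE.Measurement))
    g.add((meas, OBOE.ofCharacteristic, OBOE.pH))       # measured characteristic
    g.add((meas, OBOE.usesStandard, OBOE.pHUnit))       # unit / standard
    g.add((meas, OBOE.hasValue, Literal(7.8)))          # raw value from the database

    print(g.serialize(format="turtle"))

Serialised this way, such fragments correspond to the TTL files that the pipeline loads into the graph database.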

Example of the variable descriptions driving the annotation:

AnaEE standard | Category           | Context      | Entity        | Characteristic    | Unit
Phytoplankton  | Biodiversity       | Water        | Phytoplankton | Volume Per Volume | MicroMeterCubed Per Millimeter
WaterPH        | Physical Chemistry | Water Column | Water         | pH                | pHUnit
...            | ...                | ...          | ...           | ...               | ...
+

A CSV input file describes the main characteristics of the variables submitted for semantic annotation (as in the excerpt above), allowing the generic semantic graphs (yEd* format) to be parametrized.
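As a sketch of how such a CSV can drive the parametrization, the snippet below reads it with Python's csv module; the file name, delimiter and column headers are assumptions based on the excerpt shown above, not the actual pipeline files.

    # Minimal sketch: reading the variable description CSV that parametrizes
    # the generic semantic graphs. File name, delimiter and column names are
    # assumptions taken from the excerpt above.
    import csv

    with open("variables.csv", newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f, delimiter=";"):
            print(f"{row['AnaEE standard']}: entity={row['Entity']}, "
                  f"characteristic={row['Characteristic']}, unit={row['Unit']}, "
                  f"context={row['Context']}, category={row['Category']}")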

The annotation pipeline can be used with different ontologies and databases. It combines shell scripts and specific Java developments, and requires the yEd, Ontop, Corese and Blazegraph software. It is deployed either as a classical scripted application or as a dockerised one.

The annotation pipeline consists of four steps:

1. generation of the mapping files to be used by Ontop** (database connection parameters, sets of SQL queries and of the corresponding triples);

2. production of the initial triples (TTL files) by a specific program and the Ontop tool;

3. production of inferred triples by Corese*** from the ontology rules;

4. upload of the TTL files into the Blazegraph**** software and initialisation of the end point (a minimal loading sketch follows this list).
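As an illustration of step 4, the sketch below loads a Turtle file into Blazegraph through its REST interface with the requests library; the endpoint URL, namespace and file name are assumptions made for the example.

    # Minimal sketch of step 4: loading a TTL file into Blazegraph over HTTP.
    # The endpoint URL (assumed default local namespace) and file name are assumptions.
    import requests

    ENDPOINT = "http://localhost:9999/blazegraph/namespace/kb/sparql"

    with open("inferred_triples.ttl", "rb") as f:
        response = requests.post(
            ENDPOINT,
            data=f.read(),
            headers={"Content-Type": "text/turtle"},
        )
    response.raise_for_status()
    print("Upload status:", response.status_code)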

Current developments will allow data annotation from several relational database management systems (PostgreSQL, MySQL, …) and from CSV flat files.

*https://www.yworks.com/products/yed

**Diego Calvanese, Benjamin Cogrel, Sarah Komla-Ebri, Roman Kontchakov, Davide Lanti, Martin Rezk, Mariano Rodriguez-Muro, and Guohui Xiao. Ontop: Answering SPARQL Queries over Relational Databases. In: Semantic Web Journal 8.3 (2017), pp. 471–487.

***http://wimmics.inria.fr/corese

****https://wiki.blazegraph.com/wiki/index.php/Main_Page

Semantic data exploitation by a pipeline for metadata and datasets generation

Outputs of the annotation pipeline are used by an exploitation pipeline for:

1. generation (through a SPARQL query on the raw data) of the synthetic data that feed the discovery portal (a query sketch follows this list);

2. generation of standardised GeoDCAT and ISO 19115/19139 metadata records;

3. generation of data files (NetCDF as the first format) for selected perimeters (e.g. years, experimental sites, variable categories). In that case, the annotation pipeline is launched through a dedicated web service.
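For item 1, a SPARQL query such as the following sketch could be sent to the end point with the SPARQLWrapper library to summarise the available measurements; the endpoint URL and vocabulary terms are illustrative placeholders, not the actual AnaEE ontology.

    # Minimal sketch of item 1: querying the SPARQL end point to build synthetic
    # data for the discovery portal. URL and vocabulary are placeholders.
    from SPARQLWrapper import SPARQLWrapper, JSON

    sparql = SPARQLWrapper("http://example.org/anaee/sparql")  # hypothetical end point
    sparql.setQuery("""
        PREFIX oboe: <http://example.org/oboe/>
        SELECT ?entity ?characteristic (COUNT(?value) AS ?n)
        WHERE {
            ?obs oboe:ofEntity       ?entity ;
                 oboe:hasMeasurement ?m .
            ?m   oboe:ofCharacteristic ?characteristic ;
                 oboe:hasValue         ?value .
        }
        GROUP BY ?entity ?characteristic
    """)
    sparql.setReturnFormat(JSON)

    for row in sparql.query().convert()["results"]["bindings"]:
        print(row["entity"]["value"], row["characteristic"]["value"], row["n"]["value"])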

Metadata and data products are transferred to a Dataverse repository.
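The data files produced for item 3 and deposited in this way use NetCDF as the first format; the sketch below writes a small time series with the netCDF4 library, using hypothetical dimension and variable names.

    # Minimal sketch of item 3: writing extracted pH measurements to NetCDF.
    # Dimension, variable and attribute names are hypothetical examples.
    from netCDF4 import Dataset

    with Dataset("water_ph.nc", "w", format="NETCDF4") as nc:
        nc.createDimension("time", None)                   # unlimited time axis
        time = nc.createVariable("time", "f8", ("time",))
        ph = nc.createVariable("pH", "f4", ("time",))
        time.units = "days since 2020-01-01"
        ph.units = "pHUnit"
        time[:] = [0.0, 1.0, 2.0]
        ph[:] = [7.8, 7.9, 7.7]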

[Exploitation figure: for a given delimitation perimeter, the end point is queried to extract OBOE metadata and data; the OBOE metadata are converted to GeoDCAT records and then, through an XSLT API, to ISO 19115/19139 records, which feed a dedicated component of the discovery portal together with the graph database end point. The extracted observations follow the OBOE pattern (e.g. an Observation ofEntity Water in the WaterColumn context, with a pH Measurement whose value uses the pHUnit standard), together with the associated spatial and site information.]
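For the XSLT-based conversion of item 2 shown in the figure above, the transformation can be applied programmatically, for instance with lxml; the stylesheet and file names below are placeholders, not the actual AnaEE conversion API.

    # Minimal sketch of item 2: applying an XSLT stylesheet to convert a GeoDCAT
    # record into an ISO 19139 record. Stylesheet and file names are placeholders.
    from lxml import etree

    stylesheet = etree.XSLT(etree.parse("geodcat_to_iso19139.xsl"))
    result = stylesheet(etree.parse("record_geodcat.xml"))

    with open("record_iso19139.xml", "wb") as out:
        out.write(etree.tostring(result, xml_declaration=True, encoding="UTF-8"))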
