• Aucun résultat trouvé

From heterogeneous, multi-source data to harmonized datasets: a major challenge for the assessment of soil functions. The LANDMARK project approach

N/A
N/A
Protected

Academic year: 2022

Partager "From heterogeneous, multi-source data to harmonized datasets: a major challenge for the assessment of soil functions. The LANDMARK project approach"

Copied!
2
0
0

Texte intégral

(1)

HAL Id: hal-02742163

https://hal.inrae.fr/hal-02742163

Submitted on 3 Jun 2020

HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

From heterogeneous, multi-source data to harmonized datasets: a major challenge for the assessment of soil

functions. The LANDMARK project approach

Sébastien Drufin, Nicolas Saby, Marion Bardy, Erika Micheli, Benoît Toutain, Rachel Creamer

To cite this version:

Sébastien Drufin, Nicolas Saby, Marion Bardy, Erika Micheli, Benoît Toutain, et al.. From heteroge- neous, multi-source data to harmonized datasets: a major challenge for the assessment of soil functions.

The LANDMARK project approach. 2. Global Soil Security Conference, Dec 2016, Paris, France.

2016. �hal-02742163�

(2)

www.landmark2020.eu email [email protected] twitter @Landmark2020

ANDMARK

L AND M ANAGEMENT : A SSESSMENT , R ESEARCH , K NOWLEDGE BASE

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 635201

 Determination of a proper schema for data and a framework for all the different transformations applied to data. Multi-sources data origin calls for the use of suitable tools of Business

Intelligence Information System. Data are collected according to the needs expressed by research teams in charge of data processing.

 Set up of the computer infrastructure: creation of a data warehouse Transformation of dataset from “raw” data to datamarts, according to the different steps :

1 - Data are collected in various formats (operational systems);

2 - Data are transformed in order to match with database system (data staging);

3 - Data are then related to reference data in order to relate then to common

typologies, etc. (data warehouse) and thus be harmonized, using Extract, Transform and Load (ETL) processes;

4 - At last, harmonized data are gathered in datasets (called datamarts) and delivered as outputs for the algorithms and models that will compile data. The LANDMARK project will develop harmonized datasets for the assessment of each of the soil functions.

LANDMARK1 is a H2020 pan-European multi-actor consortium of leading academic and applied research institutes, chambers of agriculture and policy makers that will develop a coherent framework for soil management aimed at sustainable food production across Europe. The LANDMARK proposal builds on the concept that soils are finite resources that provide a range of ecosystem services known as “soil functions”. Functions relating to agriculture include: primary productivity, water regulation & purification, carbon-sequestration & regulation, habitat for biodiversity and nutrient provision & cycling.

 Develop a comprehensive scientific “SxExM”

framework for

understanding and

quantifying soil functions in relation to soil properties (S), climatic zone (E) and soil

management (M), stratified by land use.

Challenge = gather and harmonize multi-source data in a EU context

 Metadata description is in progress, being written directly into tables in the data warehouse as comment for each fields, with a track of every modification done on each raw data during the harmonization – mapping step via ETL tools.

 On these data, a diagnostic horizon computation will be done. Scripts that will be used to proceed those calculations will be implement into ETL treatment to produce estimations of diagnostic horizon that will be directly integrated as a complementary attribute in the soil.

Only soil data have been integrated. In the next step, environment and land management data will be added into a new table within the data warehouse.

European data (e.g. Corine land cover, CAPRI, etc…) will also be integrated within the data warehouse to inform soil profiles on land management.

Data type Country Profiles Horizons

Profile data

Belgium 6889 41789

Spain 47 196

France 1364 4353

Italy 5612 46481

Romania 10 30

Hungary - Czech republic 2213 9077

Ireland 806 3628

Monitoring program

Ireland - Topsoil 1310 -

Sweden - Topsoil 5182 -

France - Composite 2215 4243 Austria - Composite 139 973

Total location measurement 25787 110770

Background & objectives

Material & Methods

Intermediate Results

What’s next ?

International multi-source datasets regarding:

S oil

E nvironment

M anagement practices

4 3

1 2

Same format same schema

Mapping

Data Bases with common format

Format setting Raw data

bases

Postgresql Excel

Access Postgresql

Postgresql

Postgresql

Postgresql Others…

A V AI LAB LE D AT ASET S

Metadata Metadata

Data extracted and provided by partners Data and metadata delivred to partners for validation

Soil attributes are the first type of data collected from 11 datasets yet. Via ETL tools, from those datasets, 29 different physical, chemical, descriptive attributes have been integrated within the data warehouse. An amount of 988.494 workable data is now available for processing.

Table summarizing the amount of location measurement available per dataset provided.

Mapping representation of every location measurement of data collected

according on whether data are from monitoring program or profile descriptions.

Data integration and harmonization process from heterogeneous raw data providing to homogenous datamarts delivered for soil function modeling.

S.Drufin1, N. Saby1, M. Bardy1, E. Micheli2, B. Toutain1, R. Creamer3

1 INRA, unité InfoSol, US 1106, F – 45075 Orléans Cedex 2, France

[email protected], [email protected], [email protected], [email protected]

2 Szent István Egyetem • 2100 Gödöllő, Páter Károly utca 1 [email protected]

3Department of Soil Quality, Wageningen University, Droevendaalsesteeg 4, NL-6708 PB Wageningen, P.O. Box 47 , The Netherlands [email protected]

Références

Documents relatifs

Data Inte- gration and VISualization (DIVIS) : from large heterogeneous datasets to interpretable visualisations in plant science... Data Integration and VISualization

3 Data are available as Darwin Core Archive (DwC-A), which is a biod- iversity data standard that makes use of the DwC terms, it is composed of a set of files for describing

To assess the quality of imputation and legitimate the use of iterative MLFAMD to im- pute the Traumabase, we first perform simulations by inserting an additional of 20% of MCAR

With three effect resources, it is now possible to generate three different dependence graphs, one for each of our effect resources: a Data Dependence Graph, an Environment

Given a user-information need formulated as a top-k query over conceptual in- tegrated model (OBDA), which describes multiple heterogeneous distributed, structured and unstructured

In terms of route length, OLSR selects always a 3-hop path, then, all metrics have the same number of medium accesses. 4: Link Throughput... For farther nodes, packets should transit

This chapter aimed to demonstrate, by several illustrated examples, that Human Health should be considered as another major challenge of global soil security by emphasizing the

First, although the positive correlation we obtained be- tween prepulse-elicited reactivity (4 –16 dB above back- ground) and PPI magnitude in mice was opposite to the