• Aucun résultat trouvé

EU Framework Program for Research and Innovation (SC5-18a-2014 - H2020)

N/A
N/A
Protected

Academic year: 2022

Partager "EU Framework Program for Research and Innovation (SC5-18a-2014 - H2020)"

Copied!
20
0
0

Texte intégral

(1)

EU Framework Program for Research and Innovation (SC5-18a-2014 - H2020)

Project Nr: 641538

Coordinating an Observation Network of Networks EnCompassing saTellite and IN-situ to fill the Gaps in European Observations

Deliverable D4.5

Observation inventory description and results report

Version 1

Due date of deliverable: 31/01/2017 Actual submission date: 06/02/2017

(2)

Creator CNR

Editors CNR

Description Report on Observation Inventory.

Publisher ConnectinGEO Consortium Contributors ConnectinGEO Partners

Type Text

Format MS-Word

Language EN-GB

Creation date 27/01/2017 Version number 1

Version date 06/02/2017 Last modified by

Rights Copyright © 2015, ConnectinGEO Consortium

CO (confidential, only for members of the consortium) X PU (public)

PP (restricted to other programme participants) RE (restricted to a group specified by the consortium) When restricted, access granted to:

X R (report) P (prototype) D (demonstrator) O (other)

Draft Where applicable:

X WP leader accepted Accepted by the PTB

PMB quality

controlled

Accepted by the PTB as public document

X Coordinator accepted

to be revised by all ConnectinGEO partners for approval of the WP leader

for approval of the PMB

for approval of the Project Coordinator

(3)

Requested deadline

(4)

Acronym

CNR_MS Mattia Santoro (CNR) CNR_SN Stefano Nativi (CNR) CREAF_JM Joan Masò (CREAF)

Copyright © 2015, ConnectinGEO Consortium

The ConnectinGEO Consortium grants third parties the right to use and distribute all or parts of this document, provided that the ConnectinGEO project and the document are properly referenced.

THIS DOCUMENT IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS DOCUMENT, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

(5)

Acronym: ConnectinGEO

Project title: Coordinating an Observation Network of Networks EnCompassing saTellite and IN-situ to fill the Gaps in European Observations

Theme: SC5-18a-2014. Coordinating European Observation Networks to reinforce the knowledge base for climate, natural resources and raw materials

Table of Contents

Executive Summary ... 6

1 Introduction ... 7

1.1 Scope & purpose of the document ... 7

2 Observation Inventory ... 8

3 Process ... 8

3.1 User Needs ... 9

3.2 GEOSS Resources ... 10

3.3 Possible Gaps ... 11

4 Implementation ... 12

5 Results ... 14

5.1 Charts ... 14

5.2 Summary ... 17

6 Conclusions ... 18

(6)

Executive Summary

The ConnectinGEO project dedicated a specific thread of its gap analysis, namely bottom-up thread 2, to the analysis of the observations and measurements that are currently registered in GEO Discovery and Access Broker (DAB).

A set of needs were collected from users. These were then matched against the metadata information harmonized by the DAB to identify possible gaps in terms of discoverability and accessibility of the EVs listed in the collected user needs.

Since not all metadata records provided to GEOSS come with the metadata information necessary to run the targeted analysis, there was the need to add this information to available metadata. To this aim, an Observation Inventory (OI) was created and populated using metadata information harmonized by the DAB enriched by the ConnectinGEO Analyzer.

The results of the analysis highlight possible gaps in present availability of resources in GEOSS, when compared to user needs. The reasons leading to such results include both technological and non-technological limitations.

(7)

Acronym: ConnectinGEO

Project title: Coordinating an Observation Network of Networks EnCompassing saTellite and IN-situ to fill the Gaps in European Observations

Theme: SC5-18a-2014. Coordinating European Observation Networks to reinforce the knowledge base for climate, natural resources and raw materials

7

1 Introduction

The ConnectinGEO project defined and utilized a formalized methodology to create a set of observation requirements, relate these to information on available observations and identify possible gaps.

Gaps in the information provided by current observation systems as well as gaps in the systems themselves were derived from five different threads, following both a bottom-up and a top-down approach. One of the bottom-up threads, namely bottom-up thread 2, was dedicated to the analysis of the observations and measurements that are currently registered in GEO Discovery and Access Broker (DAB) [1]. To this aim, an Observation Inventory (OI) was created and populated using metadata information harmonized by the DAB.

1.1 Scope & purpose of the document

This document provides a summary of Bottom-up thread 2 of the ConnectinGEO gap analysis: analysis of the observations and measurements that are currently in GEOSS Discovery and Access Broker.

The document is organized as it follows. First, the ConnectinGEO Observation Inventory (OI) is briefly introduced. The following section describes the process defined to analyze the ConnectinGEO OI content, including the utilized user requirements and possible gaps to be identified.

Finally, the results of the analysis are presented and discussed.

(8)

2 Observation Inventory

The ConnectinGEO Observation Inventory (OI) is described in details in Deliverable 4.2 [2]. For readers’ convenience, a brief description is reported here.

Figure 1 depicts the high-level architecture of the OI. GEOSS Metadata content is analyzed and enriched by the ConnectinGEO Analyzer. The enriched metadata content is then stored into the OI Database (whose schema is defined in Deliverable 4.1 [3]). This process is needed in order to augment the information available in GEOSS content metadata to comply with the OI DB schema.

Figure 1 – High-level Architecture of the ConnectinGEO Observation Inventory

The OI is accessible by client applications using the GEO DAB supported interfaces1. Besides, an ad-hoc extension to the GEO DAB APIs was introduced in order to use JavaScript for submitting complex queries (i.e. with nested logical operators) to the OI, as described in Deliverable 4.2.

3 Process

Figure 2 depicts a high-level representation of the process that was implemented to perform the analysis. A set of needs were collected from users. These needs were then matched against the metadata information

1 http://www.geodab.org

(9)

Acronym: ConnectinGEO

Project title: Coordinating an Observation Network of Networks EnCompassing saTellite and IN-situ to fill the Gaps in European Observations

Theme: SC5-18a-2014. Coordinating European Observation Networks to reinforce the knowledge base for climate, natural resources and raw materials

9

harmonized by the DAB and enriched by ConnectinGEO Analyzer, i.e. the OI content.

The results of this matching were collected and grouped to identify possible gaps in terms of discoverability and accessibility of available resources.

Figure 2 – High-level Process

3.1 User Needs

For the scope of this document a user need is defined as an Essential Variable (EV) characterized by a set of requirements. A requirement is a (property, operator, value) triple, e.g.

spatial

resolution equal to 1 km

property operator value

In general, different applications might need the same EV, but with different requirements both in terms of properties and values. For example, a climate change model might need an EV with a yearly, or even decadal, temporal resolution; while short-term weather forecast models might need the same EV with an hourly temporal resolution.

In collaboration with WP3, a list of user needs was collected. The list includes EVs from the terrestrial domain of Essential Climate Variables. As far as

(10)

requirements, for each EV the list includes operators and values for the following properties:

• Desired spatial resolution;

• Desired temporal resolution.

Table 1 lists the user needs which were utilized to run the analysis.

Table 1 – User Needs

Essential Variable Desired Spatial Resolution

Desired Temporal Resolution

River Discharge 1km Daily

Water use 1km Monthly

Groundwater 1km Monthly

Lakes 1km Annual

Snow Cover 1km Annual

Glaciers and ice caps

1km Annual

Ice sheets 1km Annual

Permafrost 1km Annual

Albedo 1km Monthly

Land cover 1km Annual

FAPAR 1km Monthly

LAI 1km Monthly

Above ground biomass

1km Annual

Soil carbon 1km Annual

fire disturbance 1km Daily

soil moisture 1km Monthly

3.2 GEOSS Resources

GEOSS is composed of contributed Earth Observation systems, ranging from systems collecting primary data, to systems concerned with the creation and distribution of information products. The GEOSS Common Infrastructure (GCI) implements a digital infrastructure that coordinates access to these systems, interconnecting and harmonizing their data, applications, models, and products.

(11)

Acronym: ConnectinGEO

Project title: Coordinating an Observation Network of Networks EnCompassing saTellite and IN-situ to fill the Gaps in European Observations

Theme: SC5-18a-2014. Coordinating European Observation Networks to reinforce the knowledge base for climate, natural resources and raw materials

11

One of the main components of the GCI is the GEO DAB; this component applies the brokering approach for multidisciplinary interoperability [4] [5] and implements the required harmonization, mediation and distribution functionalities of a brokering framework.

Currently the GEO DAB provides access to more 150 data systems. The total number of discoverable resources is periodically updated; presently the number of discoverable resources is about 41 M records and about 190 M granules.

GEO DAB publishes periodically a report about available brokered data systems and resources (http://www.geodab.org). Figure 3 shows the report from June 2016.

Figure 3 – GEO DAB Resources, June 2016

3.3 Possible Gaps

The analysis was specifically tailored to identify possible gaps in terms of discoverability and accessibility of the EVs provided by user needs list.

Besides, different combinations of the required spatial and temporal constraints were utilized. This allows identifying trends in resources availability.

To this aim, each of the needs in Table 1 was expressed as three constraints, defined as it follows:

Text: this constraint is satisfied when the EV name (first column of Table 1) is found in the target metadata element(s);

" " " "

(12)

Spatial Resolution: this constraint is satisfied when the metadata provides information about the spatial resolution of the described resource and the resolution value is within the required bounds;

Temporal Resolution: this constraint is satisfied when the metadata provides information about the temporal resolution of the described resource and the resolution value is within the required bounds.

For each constraint, both a strict and a relaxed versions were defined, as reported in Table 2.

Table 2 – User Needs as Query Constraints

Version

Constraint Strict Relaxed

Text Match EV Name in Title

or Keyword Match EV Name in Title or Keyword or Abstract Spatial Resolution <= Desired Spatial

Resolution Any

Temporal Resolution <= Desired Temporal

Resolution Any

4 Implementation

Since not all metadata records provided to GEOSS come with the metadata information necessary to run the targeted analysis, there was the need to add this information to available metadata. The task of enriching metadata coming from the GEO DAB is implemented by the ConnectinGEO Analyzer, by means of its Enrichers [2].

The enrichers which were used to run this analysis are described in the following of this section.

The accessibility information is added by the Accessibility Enricher. For each access link which is present in the incoming metadata record, this enricher tries to download the linked content and checks the content type. The enricher marks the resource as not accessible if at least one of the following conditions holds:

• No access link is provided;

• No access link can be reached;

(13)

Acronym: ConnectinGEO

Project title: Coordinating an Observation Network of Networks EnCompassing saTellite and IN-situ to fill the Gaps in European Observations

Theme: SC5-18a-2014. Coordinating European Observation Networks to reinforce the knowledge base for climate, natural resources and raw materials

13

• No access link provides a content type which can be classified as data (e.g. a content type pdf is not classified as data).

With this type of enricher, it was possible to characterize resources as directly accessible or not.

In order to be able to enrich metadata records with spatial and temporal resolution, an ad hoc enricher was developed to extract this information from other metadata fields. In particular, this enricher applies simple syntactic rules to extract the resolution information form the title field of the incoming metadata records.

The GEO DAB provides access not only by harvesting metadata from systems contributing to GEOSS, but also distributing queries live to a number of systems. For these distributed systems it is not possible to apply entirely the Enricher pattern [6] utilized to enrich metadata from GEO DAB before their ingestion into the ConnectinGEO OI. First of all, domain-specific distributed systems, not covering the domain of the utilized EVs, were excluded from the analysis – e.g. the Global Biodiversity Information Facility, GBIF, system was not queried for this analysis as it does not provide any data concerning the utilized EVs, the same applies for the Incorporated Research Institutions for Seismology system that provides near real-time information about earth quakes events. Other distributed systems were excluded because they do not support queries, at least, by keyword, which makes it impossible to run the targeted analysis – e.g. INPE system (National Institute for Space Research – Brazil) can be queried by sensor, instrument, spatial and temporal extents.

After this phase, the distributed systems were analyzed by applying the following estimation strategy:

1) Retrieve the number of discoverable records using the EV name as the search term in the distributed query;

2) From the query result set, a set of records is randomly selected;

3) Each selected record is analyzed to determine if it satisfies the constraints in Table 2 (strict and relaxed) and if it contains any direct access link;

4) The percentage of successes, for each constraint and the access test, over the selected records is finally utilized to calculate the actual number of successes for the particular EV.

When distributed systems return collection-level metadata to the query in step 1, the access test is executed as described above for the granules (children of the collection). On the other hand, the constraints in Table 2 (strict and relaxed) are calculated based on the collection-level metadata and are automatically applied to all its children – it is assumed that granules inherit the involved metadata elements (title, abstract, resolutions).

(14)

5 Results

In this section the results are presented. First a set of charts is provided depicting the obtained raw data. Then a brief summary is presented highlighting the main information which can be extracted from the raw data.

5.1 Charts

All figures from Figure 4 to Figure 7 show the raw data which are obtained by applying the strict version of the constraints. Each of these figures provides discoverable (blue bar) and accessible (green bar) information for each EV in Table 1.

In each of the figures from Figure 4 to Figure 7 a different combination of constraints was utilized:

• Text only: Figure 4;

• Text and Spatial Resolution: Figure 5;

• Text and Temporal Resolution Figure 6;

• Text and Spatial Resolution and Temporal Resolution Figure 7.

Figure 4 – Strict Text Match, Number of Discoverable and Accessible resources

(15)

Acronym: ConnectinGEO

Project title: Coordinating an Observation Network of Networks EnCompassing saTellite and IN-situ to fill the Gaps in European Observations

Theme: SC5-18a-2014. Coordinating European Observation Networks to reinforce the knowledge base for climate, natural resources and raw materials

15

Figure 5 – Strict Text Match AND Strict Spatial Resolution, Number of Discoverable and Accessible resources

Figure 6 – Strict Text Match AND Strict Temporal Resolution, Number of Discoverable and Accessible resources

(16)

Figure 7 – Strict Text Match AND Strict Spatial Resolution AND Strict Temporal Resolution, Number of Discoverable and Accessible resources

Figure 8 shows the trends of discoverable and accessible resources with the variation of utilized constraints, in their strict version. In the left chart, constraints vary (left to right direction) from Text constraint only to Text and Spatial Relation constraints to Text and Spatial Resolution and Temporal Resolution. The right chart shows the trend when varying (left to right direction) from Text constraint only to Text and Temporal Relation constraints to Text and Spatial Resolution and Temporal Resolution.

Figure 8 – Strict Matches Trends

Figure 9 shows the trends of discoverable and accessible resources with the variation of utilized constraints, combining both strict and relaxed versions of the constraints. Upper side charts of Figure 9 show the trends when relaxing only the Text constraint. Lower side charts show trends when all constraints are applied in their relaxed version.

(17)

Acronym: ConnectinGEO

Project title: Coordinating an Observation Network of Networks EnCompassing saTellite and IN-situ to fill the Gaps in European Observations

Theme: SC5-18a-2014. Coordinating European Observation Networks to reinforce the knowledge base for climate, natural resources and raw materials

17

Figure 9 – Relaxed Matches Trends

5.2 Summary

Results show that about 1% of available resources match strictly user needs in Table 1 and are directly accessible. When considering strict Text match only, about 3.3% of resources can be discovered (but not directly accessed).

When relaxing constraints, the largest variation of matches is for the Text match where about 10.4% of resources can be discovered, but not directly accessed. When using the relaxed version of the Text constraint, also the number of discoverable resources satisfying the Spatial Resolution constraints augments: 4.2% of available resources are discoverable when applying the strict Spatial Resolution constraint and 6.8% when applying the relaxed version.

Finally, it is worth to note that trends in the number of accessible resources does not have any substantial variation when varying the application of constraints, both in terms of which constraints are applied or in which of their versions (strict or relaxed).

(18)

6 Conclusions

The results reported in previous section highlight possible gaps in present availability of resources in GEOSS, when compared to user needs. The reasons leading to such results are discussed below.

From the technological point of view, some limitations restrict the number of discoverable and accessible resources. These limitations include:

• Discovery:

– User needs are unclear (e.g. imprecise keywords);

– Discovery metadata:

• Incomplete (e.g. resolution information is not provided at all)

• Imprecise/unstructured (e.g. resolution information is in title and not in the appropriate metadata field)

• Granularity issue;

• Access:

– Access metadata:

• Incomplete or missing – No “direct” access

• Access via “landing page(s)”

• Access requires a (personal) account

Another important aspect emerging from these results deals with the type of resources which are shared in GEOSS. In fact, the number of discoverable resources matching the Text constraint in its relaxed form, i.e. considering the abstract field of metadata records, considerably augments with respect to the strict Text match passing from 3,3% to 10,4% (6,3 M to 19,9 M). This is very likely an indicator of the fact that many resources shared in GEOSS are “raw”

data, instrumental to generate the searched EV. That is, the abstract mentions the EV name describing possible use of the described resource.

Figure 10 depicts an example of the relation between raw data and EVs in the context of satellite/sensor observations.

(19)

Acronym: ConnectinGEO

Project title: Coordinating an Observation Network of Networks EnCompassing saTellite and IN-situ to fill the Gaps in European Observations

Theme: SC5-18a-2014. Coordinating European Observation Networks to reinforce the knowledge base for climate, natural resources and raw materials

19

Figure 10 – Raw data vs Essential Variables

(20)

7 References

[1] S. Nativi, P. Mazzetti, M. Santoro, F. Papeschi, M. Craglia e O. Ochiai,

«Big Data challenges in building the Global Earth Observation System of Systems,» Environmental Modelling & Software, vol. 68, 2015.

[2] M. Santoro, S. Nativi, M. P. Joan e S. Jirka, «D4.2. Observation inventory description and results report,» 2016.

[3] M. P. Joan, D. Nust, M. Van den Broek e S. Nativi, «D4.1 Observation inventory requirements, database schema and queryable fields,» 2015.

[4] L. Vaccari, M. Craglia, C. Fugazza, S. Nativi e M. Santoro, «Integrative Research: The EuroGEOSS Experience,» IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 5, n. 6, pp. 1603- 1611.

[5] S. Nativi, M. Craglia, L. Vaccari e M. Santoro, «Searching the New Grail:

Inter-disciplinary interoperability,» in 14th AGILE International Conference on Geographic Information Science, Utrecht, 2011.

[6] G. Hohpe e B. Woolf, : Designing, Building, and Deploying Messaging Solutions, Addison-Wesley, 2003.

Références

Documents relatifs

ConnectinGEO (Coordinating an Observation Network of Networks EnCompassing saTellite and IN-situ to fill the Gaps in European Observations” is a new H2020 Coordination and Support

One possible topic could be &#34;experiences and examples of interoperable provision of in situ observation data in different domains (e.g. Have a membership mechanism: ENEON

Project title: Coordinating an Observation Network of Networks EnCompassing saTellite and IN‐situ to fill the Gaps in European Observations..

Since 2003, the standardization organisation the Open Geospatial Consortium (OGC) follows the vision to make heterogeneous sensor data (such as remote satellite / airborne

Eomag, ConnectinGEO (Coordinating an Observation Network of Networks EnCompassing satellite and IN-situ to fill the Gaps in European Observations) ....

2.2 In-situ data compatible to satellite mission data challenge Climate, Land Crop growth or ice sheets, traces of gases Compatibility of in-situ and RS data access and

Project title: Coordinating an Observation Network of Networks EnCompassing saTellite and IN‐situ to fill the Gaps in European Observations.

a) Methods I: Development of a general approach on the contributions of Earth observations data and derived information in achieving the SDGs and in monitoring