• Aucun résultat trouvé

Experimental data annotation guided by an ontology for decision support: @Web project

N/A
N/A
Protected

Academic year: 2021

Partager "Experimental data annotation guided by an ontology for decision support: @Web project"

Copied!
44
0
0

Texte intégral

(1)

HAL Id: hal-02792886

https://hal.inrae.fr/hal-02792886

Submitted on 5 Jun 2020

HAL is a multi-disciplinary open access

archive for the deposit and dissemination of sci-entific research documents, whether they are pub-lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

Experimental data annotation guided by an ontology for decision support: @Web project

Patrice Buche, Juliette Dibie

To cite this version:

Patrice Buche, Juliette Dibie. Experimental data annotation guided by an ontology for decision support: @Web project. Méthodes et outils pour l’Open Data, Dec 2014, Montpellier, France. �hal-02792886�

(2)

Experimental data annotation

guided by an ontology

for decision support:

@Web project

Patrice Buche

JRU IATE, Montpellier

Juliette Dibie

JRU MIA, Paris

(3)

L’équipe projet @Web en 2014

Permanents:

Patrice Buche IR HDR, INRA IATE Brigitte Charnomordic, INRA MISTEA Stéphane Dervaux, IE INRA MIA

Sébastien Destercke, CR CNRS HEUDYASIC Juliette Dibie, PR AgroParisTech MIA Liliana Ibanescu, MC AgroParisTech MIA Luc Menut AI, INRA IATE

Michard Rakotosone, IR, INRA Plateforme CEPIA Mathieu Roche, CR CIRAD TETIS

Non permanents:

Lilia Berrahou, Doct. UM2 IATE-LIRMM Mathilde Salthun-Lasalle, IE INRA IATE-MIA Rim Touhami, Doct. INRA IATE-MIA

Permanents:

Cédric Baudrit, CR INRA I2M

Abdellatif Barrakhat, CR INRA IATE Jean-Pierre Belaud, MC ENSIACET LGC Valérie Guillard, MC UM2 IATE

Xavier Rouau, DR INRA IATE

Caroline Sablayrolle, MC ENSIACET LCA

Non permanents:

Filippo Acerbi, Doct UM2 IATE Fabien Bibi, Doct UM2 IATE

Guillaume Busset, Post-doc ENSIACET LCA Charlotte Loustau Cazalet, Doct. INRA IATE Mia Kurek, Post-doc UM2 IATE

Anaïs Licari, Master INRA IATE Ali Motedayen, Doct., UM2 IATE Célia Mousties, Master UM2 IATE Caroline Wolf, Doct UM2 IATE

(4)

Heterogeneous data reuse

for decision support

(Destercke et al. 2011, Buche et al 2013, Destercke et al. 2013, Tamani et al. 2014, Guillard et al. 2015)

(5)

Food packaging multi-criteria

design support

Scientific advances

Food quality

Consumer safety

Economical competitivity

Environmental protection

I want a packaging which preserves my product, made of renewable resources, but without GMO, if possible transparent and with a « material » cost < 3 € / kg …

(6)

Multi-criteria querying

Fresh produce database Packaging database Virtual MAP simulation Stakeholder preferences and needs Multi-criteria flexible querying Ranked list of most relevant packagings • Industrial constraints • Consumer preferences • Waste management policy • etc

(7)

Life cycle analysis (LCA) permits to estimate environmental impacts using a

complete inventory of matter flow, energy, and effluents generated by the

production process.

6 ECOSPHERE Utilization Resources extraction Elimination Disassembly Construction Material building Recycling Transport Energy production Li fe cy cle TECHNOSPHERE

EMISSIONS (WATER, AIR , SOIL) RESOURCES (RAW MATERIAL, ENERGY)

Flux élémentaires

(8)

Pre-milling Ultrafin milling Physico-chemical treatments Washing and filtration Enzymatic hydrolysis Xylose Glucose Ultrafin milling Ultrafin milling Physico-chemical treatments Physico-chemical treatments Physico-chemical treatments

Washing and filtration Extrusion

1 2 3 4 5 6

Pre-treatment eco-design

(9)

8

Projet IC2ACV – 24 septembre 2014

(10)

Heterogeneous data capitalisation

guided by an ontology

(Touhami et al. 2011, Buche et al. 2011, Berrahou et al. 2013, Touhami et al. 2013, Buche et al. 2013a, Destercke et al. 2013, Buche et al. 2013b, Buche et al. 2014 submitted)

(11)
(12)

11

Web

<html>

Rech document

(WOS, …)

Table and text

extraction

Semantic

annotation

RDF DB Ontology PDF RDB

EcoBioCap

DSS

Web open data

access

Reliability assessment

Data capitalisation

guided by an ontology

(13)

12

Conceptual component (OWL) Terminological component (SKOS)

OTR_Concept T_Concept Dimension Unit_Concept Argument Quantity Symbolic_Co ncept Packaging Temperature Ethylene_Vinyl_Al cohol Thickness Ethylene_Vinyl_Al cohol ‘Ethylene vinyl alcohol’@en skos:prefLabel skos:altLabel ‘EVOH’@en skos:prefLabel ‘Ethylène alcool vinylique’@fr Skos concept Punning ‘EVOH’@fr skos:altLabel Up core Ontology

Down core Ontology

O2Permeability_Re lation UM_Concept Relation Cor e On to lo gy Do mai n On to lo gy

Termino-ontological resource

(14)

A relation concept guarantees

the data reusability

O2Permeability_ Relation Packaging Relative_Humidity Temperature Thickness Diff_Partial_Pressure O2Permeability hasPackaging hasThickness hasTemperature hasDiffPartialPressure hasRelativeHumidity hasO2Permeability

(15)

Wet disk milling Milling rotation speed Biomass quantity Biomass Total pretreatment energy Milling solid qty output hasOutput

Milling solid qty output Relation hasInput hasInput hasInput hasInput Treatment duration hasInput hasInput

Input flow Control parameters

A relation concept models a unit operation

Output flow

(16)

15

- Supprimer le lien d’hiérarchie entre le concept générique Quantity et le

concept partial_pressure.

Exemple : Supprimer la quantité partial_pressure :

- Supprimer les liens avec les propriétés ayant partial_pressure comme domaine.

- Supprimer les liens avec les propriétés ayant partial_pressure comme co-domaine

- Supprimer la terminologie associée à partial_pressure. Partial_pressure Quantity PH AW Skos:Co ncept Pascal Pascal Millibar hasUnitConcept hasUnitConcept hasUnitConcept O2Permeability_ Relation hasAccessConcept Packaging Temperature hasAccessConcept hasAccessConcept

Verrou 1: Maintien de la cohérence de

la RTO quand elle évolue

(17)

Proposition: Notion de kit de changement

(18)

17

Web

<html>

Rech document

(WOS, …)

Table and text

extraction

Semantic

annotation

RDF DB Ontology PDF RDB

EcoBioCap

DSS

Web open data

access

Reliability assessment

Data capitalisation

guided by an ontology

(19)

Nature et source reputation

Publication date

Times cited (WOS)

Source type

Experimentation

management Repetition number  2 categories of meta data :

(20)

Reliability expert quotation

(21)

Reliability computation

(22)

Classes Number of publications Reliability Interval Very reliable 14 [ 4,97 ; 5 ] Reliable 3 [ 3,2 ; 4,85 ] In conflict 9 [ 1,01 ; 4,93] Not at all reliable 4 [ 1,02 ; 1,33 ]

• Very reliable: peer-reviewed articles , with repetitions, high reputation

• Reliable: peer-reviewed articles, with repetitions, medium reputation

• In conflict: peer-reviewed articles but no repetition

• Not at all reliable: technical sheets without external review and no repetition

(23)

22

Web

<html>

Rech document

(WOS, …)

Table and text

extraction

Semantic

annotation

RDF DB Ontology PDF RDB

EcoBioCap

DSS

Web open data

access

Reliability assessment

Data capitalisation

guided by an ontology

(24)

23 21% CO2 O2 0% Packaging

Experiments Published experimental data in scientific

documents

Localisation of relevant information (packaging

characteristics)

{(LDPE, Packaging: (Low Density Polyethylene)),

(50 µm, Thickness: (value: 50, unit concept: Micrometer)),

( 1078 amol x m-1 x s-1 x Pa-1, O2Permeability : (value: 1078, unit concept: Attomole per meter per second per pascal))}

Quantitative data

Enriching an Ontological and

Terminological Resource (OTR)

Extracting experimental data

(25)

24

Reduction of the search space in the text and enrichment of the

termino-ontological resource by adding units of measure

Projet IC2ACV - Séminaire 3BCAR – 30 janvier 2014

Motivation

Automatic extraction from the text of quantitative data

• Water Vapor Permeability (WVP) of 1.81 × 10^-^9 g m^-^1 s^-^1 Pa^-^1

• O[2] permeability increasing from 7.12 to 7.68 × 10^−15 g·(Pa s m)^−1

Scientific locks

 Locate relevant information in the text

 Identify and extract units of measure taking into account their specific syntactical rules

(26)

25

Web

<html>

Rech document

(WOS, …)

Table and text

extraction

Semantic

annotation

RDF DB Ontology PDF RDB

EcoBioCap

DSS

Web open data

access

Reliability assessment

Data capitalisation

guided by an ontology

(27)

26

Concept description and associated guidelines

(28)

27

Data table annotation (pack example)

(29)

28

Projet IC2ACV - Séminaire 3BCAR – 30 janvier 2014

(30)

29

Web

<html>

Rech document

(WOS, …)

Table and text

extraction

Semantic

annotation

RDF DB Ontology PDF RDB

EcoBioCap

DSS

Web open data

access

Reliability assessment

Data capitalisation

guided by an ontology

(31)
(32)

SPARQL querying of the RDF base

(33)

@Web graphical user interface

(34)

• A generic and reusable ontological model to capitalize

experimental data

• Ontology consistency management

• Data reliability assessment

• Manual annotation guided by the ontology using

text-mining assistant

• Flexible querying of annotated data combining 3 kinds of

reasoning (specialization, fuzzy pattern matching,

reliability satisfaction)

(35)

• CSV data file import

• RDF DB consistency management in case of ontology

evolution

• Ontology mapping for Linked Open Data

• Assistant development to facilitate the manual annotation

work

– Text mining approach

– Guidelines formalization using rules (OBDA approach)

• Extending core ontology to represent semantic links between

n-ary relation concepts (temporal links, …)

(36)

Mapping to DOLCE upper ontology

35 Treatment Biomass Treatment expe Quantity Chemical reagent Biomass characterization

(37)

Linked Open Data

36

Which varieties of grapes having a “high” rate of tannin extraction

from marc (vinification co-product) and “good” wine color

parameters ?

Biorefinery RDF DB Vinification RDF DB Biorefinery OWL Ontology (Patrice) Vinification OWL Ontology (Aunur) DOLCE Mapping Instanciation

(38)

Modeling Guidelines associated with processes

as rules

(39)

La gestion de l’évolution

Implémentation

Rim Touhami

(40)

La gestion de l’évolution

Implémentation

Rim Touhami

(41)

Querying the RDF base

(42)

@Web graphical user interface

(43)

42 Projet IC2ACV - Séminaire 3BCAR –2014

Learning corpus

Biorefinery Corpus

Model Set of descriptors

Raw corpus

Learning on Packaging corpus

Identification of new units of measure

New similarity measure

SMDB

Reduction of the search space for unit of measure

Iterative enrichment of the ontology

(44)

43 Projet IC2ACV - Séminaire 3BCAR – 2014

Data

Locate relevant information by learning

Enrichment of the termino-ontological resource

« Corpus Biorefinery »

 243 scientific documents

 Unit terms extracted from the ontology : 36 terms

« Corpus Packaging »

 115 scientific documents

 Unit terms extracted from the ontology : 211 terms Reduction of « Corpus Biorefinery » Reduction of « Corpus Packaging » 86 % 90 %

Enrichment « OTR Bioraffinerie »

+ than 100 %

of new units

Enrichment «OTR Packaging»

18 %

of new units

Figure

Table and text  extraction   Semantic  annotation RDF DB Ontology PDF RDB EcoBioCap DSS
Table and text  extraction   Semantic  annotation RDF DB Ontology PDF RDB EcoBioCap DSS
Table and text  extraction   Semantic  annotation RDF DB Ontology PDF RDB EcoBioCap DSS
Table and text  extraction   Semantic  annotation RDF DB Ontology PDF RDB EcoBioCap DSS
+2

Références

Documents relatifs

The algorithm computes the semantic similarities between WSDL document elements and domain ontology concepts.. Each WSDL document element will be annotated by the nearest

[r]

distinguished in the proposed approach: creating a new ontology without background knowledge (an initial search process with any domain ontologies according to user request)

Abstract. Process templates are stored as valuable resources and then are retrieved and reused in other projects. In order to find a desired template, the semantics of various

A significant principle behind our proposed approach is to consider the user’s context through group profile and annotate the multimedia data automatically by using domain

SEBI incorporates a variety of knowledge infrastructure components and services including image feature extraction [5], semantic web data services [6], linked open data

Using this box, users can quickly and easily enrich selected text with semantic annotations (i.e., DBpedia resources).. The component is written in Java Script and can easily

To support linked data query and analysis, we developed Ontobat, a Semantic Web strategy for automatic generation of linked data RDFs using ontology formats, data