An ontology-based repository for combining heterogeneous knowledge resources

GHOULA, Nizar

Abstract

Multiple tasks related to documents, such as indexing, retrieving, annotation, or translation are based on linguistic, terminological and ontological knowledge existing in resources of different types represented using various formalisms. Building bridges between these resources and using them together is a complex task. Solving this problem relies on finding the right resources before extracting the required data. Ontology repositories have been created to help in this task by collecting ontologies and offering effective indexing of these resources.

However, these repositories treat a single category of resources and do not provide operations for generating new resources. To meet these needs in terms of knowledge engineering, our contributions are (1) an ontology for representing heterogeneous resources and knowledge combination operators; (2) an approach based on the principles of semantic web to ensure the representation, storage and alignment of heterogeneous resources and (3) the development of an ontology-based repository for combining alignment resources.

GHOULA, Nizar. An ontology-based repository for combining heterogeneous knowledge resources. Thèse de doctorat : Univ. Genève, 2014, no. GSEM 2

URN : urn:nbn:ch:unige-451482

DOI : 10.13097/archive-ouverte/unige:45148

Available at:

http://archive-ouverte.unige.ch/unige:45148

Disclaimer: layout of this document may differ from the published version.


An ontology-based repository for combining heterogeneous knowledge resources

THESIS

presented to the Faculty of Economics and Management of the University of Geneva

by

Nizar Ghoula

Under the direction of

Prof. Gilles Falquet

to obtain the title of

Docteur ès économie et management mention Systèmes d’Information

Jury members:

Dr. Khaled Khelif, Research engineer, Airbus Defence and Space
Prof. Giovanna Di Marzo Serugendo, Professor, President of the jury
Dr. Claudine Métral, MER, University of Geneva
Dr. Jacques Guyot, Founder, Olanto Foundation

Thesis no. 2

ISBN 978-2-88903-042-2

Geneva, December 12th, 2014


I would like to acknowledge my professor Gilles Falquet, who helped me during these years of research and gave me the ability to believe in myself and in the fact that I can go further at each dead end. I am thankful for his way of managing my work, his flexibility, advice and openness. Thank you Gilles for your support and ideas. I have learned a lot from you and I hope that the end of this thesis will lead to the beginning of new collaborations.

I want to express my gratitude to Professor Giovanna Di Marzo Serugendo for agreeing to review my work and for being the head of my PhD committee.

I would especially like to thank Dr. Khaled Khelif for agreeing to review my work and also for introducing me to the Semantic Web field at the beginning of my research studies.

I am honored to have Dr. Claudine Métral as a reviewer of my humble contribution, and I thank her for being a great and lovely person to work with.

I would like to thank Dr. Jacques Guyot, who has been of great help by giving critical and inspiring points of view on my work.

I am thankful for the support and help of my friend Hélène de Ribaupierre. Thank you Hélène for your availability for reading my papers and thesis. I also would like to thank all my colleagues who have been helpful and supportive and the administrative staff of the CUI for their availability, help and encouragement, in particular, Marie-France Culebras and Lara Broi.

I would like especially to thank my colleagues Sun Zuchuat-Ji, Gloria Leonie, Nadia Jobin and Anne Dupraz. It was a pleasure working with you.

To my dear friends Mélanie Montagnol, Nathalie Verdon, Yasmina Saïdi, Leif Gröessinger and Jonathan Schad, I am thankful for all the support you have offered and the encouragement that helped me through rough moments. To my friends Fares Mallouli and his wife Imen Khanfir, thank you for your support and generous attention each time we met. A special dedication to my first computer science teacher Najoua Ben Romdhane, who encouraged me all along this long path in this field.

For the Fiechter family and especially Robert, Julia, Eva, Diane, Cyril and Max, I am very thankful for having you and very grateful for your help and amazing support during these past years.


To my dear mother who taught me how to read, write and analyze. The brilliant woman who, deprived of pursuing her studies, has dedicated her life to educating us and to transmitting her thirst for knowledge. To my father who had faith in me, who supported me and taught me the value of time and work. The man who devoted his existence to the well-being of his family.

Words are not enough to say how much I am grateful and proud to have you as parents.

To my dear sisters Kalthoum and Manal, my five brothers, my nine nephews (for the moment), my sisters in law and my whole family who supported me in my decisions and helped me through this long process.

To my dearest Julia and Robert Fiechter, the kindest and most generous parents, thank you for being there for me. You are and always will be like parents to me. You taught me a lot of things and I have spent the most amazing times with you.

To Eva, thank you for being there for me...


Many tasks related to documents, such as indexing, retrieval, annotation, or translation, are based on linguistic, terminological and ontological knowledge existing in resources of different types such as terminologies, glossaries, ontologies, multilingual dictionaries or text corpora. These resources are represented using various formalisms and languages such as predicate logic, description logic, semantic networks and conceptual graphs. In an application that requires external resources, a designer often has to perform painstaking search and preprocessing in order to collect and build resources adequate to the application's needs. This requires representing heterogeneous knowledge resources using specific formalisms, extracting the required knowledge and designing effective large-scale storage structures that offer operators for resource management. Resource repositories have been created to help in this task by collecting different resources in different formalisms. They generally offer more effective indexing of these resources than general search engines and generate alignments and annotations to ensure interoperability between resources. However, these repositories treat a single category of resources and do not provide operations for generating new resources.

The aim of this research work is to conceive and design a repository for combining heterogeneous knowledge resources. Such a repository is a collection of heterogeneous resources represented by multiple formalisms and offers tools and operators to derive new resources by combining the existing ones. This derivation may involve operations such as selecting a part of a resource, composing it with another one, translating it to another language or representing it in a different formalism. To meet these needs in terms of knowledge engineering and representation, our first contribution is an ontology for representing heterogeneous resources and knowledge combination operators. The representation of these operators supports multiple implementations. Our second contribution is an approach based on the principles of the semantic web, metadata and ontologies to facilitate the representation, storage and alignment of heterogeneous and multilingual resources. Our third contribution is the development of an ontology-based repository for combining alignment resources. This repository is supported by a set of knowledge engineering operators that compose and aggregate existing alignments generated by different tools. We show in particular that alignment composition can effectively improve the results of ontology matchers.


Knowledge extraction and representation are widely explored problems, one solution to which is based on the use of ontological, terminological and linguistic resources. This knowledge currently exists in the form of resources of different types such as terminologies, terminological databases, glossaries, ontologies (general or domain-specific), multilingual dictionaries or text corpora. These resources are represented using various formalisms and languages (predicate logic, description logic, semantic networks, conceptual graphs, etc.). In an application that requires a number of external resources, a designer often has to carry out laborious search and preprocessing work in order to gather and build resources suited to the needs of the application.

The growing number of such resources has led to the emergence of resource repositories or libraries. However, only a limited number of these repositories offer an integrated representation of several types of resources at once (ontological, linguistic and terminological resources). Moreover, they do not provide a complete set of operators for managing and processing these resources. We have thus identified two issues: (i) applications require more and more resources represented according to different models and formalisms; (ii) since searching for and adapting heterogeneous knowledge resources is indispensable, knowledge repositories must be equipped with generic tools for adapting these resources. Such a repository is therefore a collection of heterogeneous resources represented by different formalisms that offers tools for deriving new resources by combining existing ones.

We propose an approach for modeling and building a resource repository. The main objective of this approach is to design a repository of knowledge resources that stores heterogeneous resources and derives new resources by combining existing ones. This is modeled and driven by a generic ontology that formalizes the representation models of resources and of the operators for managing and combining knowledge resources. We take into account the possibility of combining these operators in order to model complex processes such as integration, annotation and alignment.


Contents

1 Introduction
1.1 Scientific context and research problem
1.2 Research areas
1.3 Proposed research methodology
1.4 Restrictions for the research plan
1.5 Contributions
1.6 Impacts and applications of the contributions
1.7 Thesis plan

2 Knowledge representation and repositories for managing knowledge resources
2.1 Knowledge and knowledge representation
2.1.1 Knowledge
2.1.2 Knowledge representation
2.1.3 Knowledge representation formalisms
2.2 Knowledge resources repositories
2.2.1 Repositories for indexing and retrieving knowledge resources
2.2.2 Repositories for collecting and managing knowledge resources
2.3 Discussion

I Resources representation and combination approach

3 Identification of knowledge resources
3.1 Definitions and typology of knowledge resources
3.1.1 Knowledge resources
3.1.2 Resources represented using formal ontology languages
3.1.3 Terminological, Lexical and semantic resources
3.1.4 Linguistic resources
3.2 Models and representation approaches for heterogeneous knowledge
3.2.1 Metadata representation models
3.2.2 Specific representation models
3.2.3 Generic representation models
3.3 A high level classification of knowledge resources
3.3.1 Autonomous resources
3.3.2 Enrichment resources
3.3.2.1 Index terms
3.3.2.2 Annotations
3.3.2.3 Alignments
3.3.3 Combined Resources
3.4 Discussion

4 TOK: A meta-model for representing heterogeneous knowledge resources
4.1 Resources representation aspects for designing the resources model
4.2 Resources representation model: TOK_Onto
4.2.1 Metadata representation
4.2.2 Resources content representation model
4.2.2.1 Node Entity
4.2.2.2 Link Entity
4.2.2.3 Expression Entity
4.2.2.4 Describing content representation models
4.2.3 The modeling approach of TOK_Onto
4.2.4 Example of using the model to represent WordNet
4.3 Representing resources management
4.3.1 Resources engineering operators representation
4.3.2 Process monitoring representation
4.3.3 Resources evolution-tracking
4.4 Use case scenario
4.5 Discussion

5 A Taxonomy of resources combination operators
5.1 Resources management and combination operators
5.1.1 Representation operators
5.1.1.1 Abstraction
5.1.1.2 Reification
5.1.1.3 Resources translation (from a model to another)
5.1.2 Enrichment operators
5.1.2.1 Alignment
5.1.2.2 Annotation
5.1.3 Derivation and combination operators
5.1.3.1 Selection and derivation
5.1.3.2 Composition
5.1.3.3 Aggregation
5.2 Usage of the model and operators to create repository for combining terminological resources
5.2.1 Storing resources representations
5.2.1.1 Generating a lexical ontology from Wikipedia
5.2.1.2 Enriching English WordNet with lexical forms in other languages
5.2.2 Alignment of representation formalisms
5.3 Conclusion

II Application of the TOK approach on alignment resources

6 Refining TOK Model with a generic model for representing alignment resources
6.1 Introduction
6.2 Definitions and typology of alignments
6.2.1 Definition of alignments
6.2.2 Types of alignments
6.3 Formalisms for representing alignments
6.4 TOKAlign: a generic model for representing alignments
6.5 Importing alignment resources using TOKAlign model
6.5.1 Transforming alignments
6.5.2 Importing and exporting alignments
6.6 Discussion

7 Operators for combining and aggregating heterogeneous alignment resources
7.1 Approaches for alignment resources reuse
7.1.1 Approaches reusing existing alignments
7.1.2 Approaches proposing theories for alignment composition
7.2 An approach for alignment resources combination
7.2.1 Framework of representing alignment correspondences
7.2.2 Interpretation of correspondences using fuzzy set theory
7.2.2.1 Interpretation of alignments as sets of fuzzy relations
7.2.2.2 Interpretation of alignment relations as fuzzy sets
7.2.3 Interpretation for Dempster-Shafer theory
7.2.4 Switching from an interpretation to another
7.3 Alignment combination operators
7.3.1 Alignment composition
7.3.1.1 Composing correspondences
7.3.1.2 Composing Alignments
7.3.2 Alignment aggregation
7.3.2.1 Aggregating conflicting correspondences using Dempster-Shafer theory of combination
7.3.2.2 Aggregating conflicting correspondences using fuzzy sets theory
7.3.2.3 Reducing correspondences that contain multiple relations
7.3.3 Alignment union
7.3.4 Alignment intersection
7.3.5 Alignment difference
7.4 Implementing alignment combination and management operators
7.4.1 Implementing fuzzy aggregators
7.4.2 Executing combination operators
7.4.3 Alignments overview, update and edition
7.4.4 Discussion about the aggregation metrics
7.5 Discussion

8 Evaluation of alignment resources combination operators
8.1 Evaluation methodology
8.1.1 Building a test corpus
8.1.2 Computing precision and recall measures
8.1.3 Evaluation of combination and aggregation operators
8.2 Experimentation and results
8.2.1 Alignment union evaluation results
8.2.2 Alignment intersection evaluation results
8.2.3 Alignment composition evaluation results
8.2.3.1 Composition of validated alignments
8.2.3.2 Composition of alignments from the same tool
8.3 Usage of alignment composition to enrich existing alignments
8.4 An approach for enhancing composition using the content of the resources
8.4.1 Extending composition path finding using the content of a common resource
8.4.2 Composition path finding using an alignment extension operator
8.5 Conclusion and discussion

9 Conclusion and future work
9.1 Advantages of the TOK approach
9.2 Limitations and future work with regards to the contributions
9.3 Use of the methodology for research and industry

A About some use cases of the repository
A.1 Enriching an ontology with a bilingual glossary
A.2 Importing the resources

B The TOK ontology
B.1 Potential usage of the TOK ontology
B.2 Classes
B.3 Object properties
B.4 Data properties

Bibliography


List of Figures

1.1 A repository of heterogeneous knowledge resources
1.2 Methodology for creating a model for representing and a repository for managing knowledge resources
1.3 Application and impacts of the proposed methodology
2.1 Data-Information-Knowledge model according to [Fahey & Prusak 1998]
2.2 The semiotic triangle of [Ogden & Richards 1927]
2.3 An overview of semantic web languages according to [Stephan et al. 2007]
2.4 Architecture of Watson (“a gateway for the Semantic Web”) as described in [d’Aquin et al. 2011]
2.5 State of the LOD cloud on “2014-08-30”
3.1 Steps for designing a meta-model for representing knowledge resources
3.2 Types of ontological resources according to [Giunchiglia & Zaihrayeu 2009] adopted from [Uschold & Gruninger 2004]
3.3 Semantic Web languages stack
3.4 OMV: ontology metadata vocabulary [Raúl et al. 2006]
3.5 NoRMV: Non ontological resources’ metadata vocabulary [Villazón-Terrazas et al. 2010a]
3.6 Ontopath, a model for representing ontologies [Jiménez-Ruiz et al. 2007]
3.7 Terminological entities meta-model from [Vandenbussche & Charlet 2009]
3.8 PROTON ontology model for OWLIM [Kiryakov et al. 2005]
3.9 The semiotic triangle in LMM from [Picca et al. 2008]
3.10 Keys for knowledge resources categorisation
4.1 Lifecycle of a TOK resource within the repository
4.2 Resources representation aspects and interactions
4.3 From a formalism to its representation language and syntax
4.4 Excerpt of the metadata representation model of knowledge resources (TOKMeta)
4.5 Excerpt of the content representation model of knowledge resources (TOKCont)
4.6 Representation of a resource with its metadata and different representations of its content
4.7 Description of a representation model
4.8 Representation approach for knowledge resources using multiple models
4.9 MOF as an approach for resources representation
4.10 Semantic Markup for Web Services ontology modules [Burstein et al. 2004]
4.11 OWLS profiles representation [Burstein et al. 2004]
4.12 Usage of the repository and the ontology Tok_Onto
4.13 Illustration of the Concept_Hierarchy model from Tok_Onto
5.1 Interactions model between the resources and the operators within the repository
5.2 Resources representation and derivation operators
5.3 Classes of resources translation operators
5.4 An approach for reusing “Non-Ontological” resources [Villazón-Terrazas et al. 2010a]
5.5 Default representation model for annotation resources
5.6 Aggregating (Aggr) two views of resources represented with the same model; this operation gives as a result a new resource represented in the same model and two sets of alignments (A31 and A32) with the original resources
5.7 Representation of the WordNet-Like model
5.8 The list of resources within the repository based on their representation model
5.9 Excerpt of the model WP_Like representing Wikipedia articles
5.10 Browsing the concepts and terms extracted from Wikipedia
5.11 Operators involved in the WordNet enrichment process
5.12 Alignment detection by similarity
5.13 Representation and alignments of entities within the lightweight repository
5.14 Representation and alignments of entities within the lightweight repository
5.15 Using TOK model to combine and represent annotated corpora
5.16 Alignment of annotation models
6.1 Formalisms for representing alignment resources
6.2 Generic model for representing alignments
6.3 Architecture of the resources’ import component
6.4 Excerpt of mappings between alignment formalisms and the generic alignment model
6.5 AllegroGraph’s Architecture
6.6 Importing an alignment between two biomedical ontologies
7.1 Illustrating alignment relations as fuzzy relations
7.2 Illustrating alignment relations as fuzzy sets
7.3 Composition of two alignments
7.4 Multiple paths for alignment composition
7.5 Architecture of the alignment repository
7.6 Operations by alignments interface
7.7 Operations by resources interface
8.1 Importing alignments for testing
8.2 Classic precision and recall measures of the alignment resulting from the Union aggregator Using FS and D-S theories
8.3 Advanced precision and recall measures of disjunctive fuzzy aggregations and Dempster-Shafer aggregation
8.4 Classic precision and recall measures of conjunctive fuzzy aggregations and Dempster-Shafer aggregation
8.5 Advanced precision and recall measures of conjunctive fuzzy aggregations and Dempster-Shafer aggregation
8.6 Classic precision and recall measures for composition followed by fuzzy aggregations or Dempster-Shafer aggregation
8.7 Advanced precision and recall measures for composition followed by disjunctive fuzzy aggregations or Dempster-Shafer aggregator
8.8 Advanced precision and recall of the composition of alignments from the same tool using Dempster-Shafer and Fuzzy set aggregators
8.9 Enriching direct alignments using composed alignments of intermediary resources
8.10 Composition path finding using the content of resources
B.1 Resources generation using the proposed approach


List of Tables

3.1 Classification of “non-ontological” resources in the literature
4.1 Examples of resource content representation models and their principal components
7.1 Composition table for logical relations as defined by [Euzenat 2008]


1 Introduction

Contents

1.1 Scientific context and research problem
1.2 Research areas
1.3 Proposed research methodology
1.4 Restrictions for the research plan
1.5 Contributions
1.6 Impacts and applications of the contributions
1.7 Thesis plan

1.1 Scientific context and research problem

In the knowledge engineering field, scientists try to solve problems by reusing existing knowledge resources [Hendler & Golbeck 2008] and adapting them for advanced tasks such as information retrieval, conceptual indexing, knowledge extraction from text, service discovery and matching, semantic search, as well as advanced annotation or translation. Two questions must be answered in order to define an approach to reusing knowledge resources:

• Where can we find knowledge resources?

Repositories and libraries have been created to help collect multiple linguistic, terminological and ontological resources represented within different formalisms. For instance, for ontological resources, advanced repositories offer the possibility to generate alignments and annotations to ensure interoperability between them. For example, Swoogle (http://swoogle.umbc.edu) indexes approximately 10,000 ontologies; the DAML repository (http://www.daml.org/ontologies) provides search based on ontology components (classes, properties, ...) or metadata (URI, funding source, ...); BioPortal (http://bioportal.bioontology.org) has similar searching and browsing features [Noy et al. 2008] and offers the possibility to annotate and align different ontologies. Many other portals such as Watson [Sabou et al. 2007b] or repositories such as OWLIM [Kiryakov et al. 2005] offer access to index, store and manage ontological resources.

• How to combine knowledge resources?

To reuse a resource for a specific task, it needs to be adapted. Its adaptation requires operations such as selecting a part of it, composing it with another one, translating it to another language or representing it in a different formalism. For this purpose, it is necessary to have access to a set of resources management operators that can be composed to generate adapted (or personalized) knowledge resources (the basic bricks for specifying the production of new knowledge resources).
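The idea of adapting a resource by chaining composable operators can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation described in this thesis: the `Resource` class, the operators `select`, `translate` and `represent_as`, and the `pipeline` helper are all hypothetical names, and a resource is reduced to a flat tuple of entries.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Callable


@dataclass(frozen=True)
class Resource:
    """Hypothetical, highly simplified stand-in for a knowledge resource."""
    name: str
    formalism: str            # e.g. "OWL", "SKOS"
    language: str             # e.g. "en", "fr"
    entries: tuple[str, ...]


# An operator takes a resource and returns a new, derived resource.
Operator = Callable[[Resource], Resource]


def select(prefix: str) -> Operator:
    """Selection: keep only the entries that start with `prefix`."""
    def op(r: Resource) -> Resource:
        kept = tuple(e for e in r.entries if e.startswith(prefix))
        return Resource(r.name + "/selected", r.formalism, r.language, kept)
    return op


def translate(lexicon: dict[str, str], target_language: str) -> Operator:
    """Translation: map entries through a bilingual lexicon (assumed given)."""
    def op(r: Resource) -> Resource:
        translated = tuple(lexicon.get(e, e) for e in r.entries)
        return Resource(r.name + "/" + target_language, r.formalism,
                        target_language, translated)
    return op


def represent_as(formalism: str) -> Operator:
    """Re-representation: only relabels the formalism in this toy sketch."""
    def op(r: Resource) -> Resource:
        return Resource(r.name, formalism, r.language, r.entries)
    return op


def pipeline(*ops: Operator) -> Operator:
    """Compose operators left to right into a single derived operator."""
    return lambda r: reduce(lambda acc, op: op(acc), ops, r)


source = Resource("glossary", "SKOS", "en", ("cat", "car", "dog"))
adapt = pipeline(
    select("ca"),
    translate({"cat": "chat", "car": "voiture"}, "fr"),
    represent_as("OWL"),
)
adapted = adapt(source)
```

Because every operator has the same shape (resource in, resource out), a composed pipeline is itself an operator and can be reused as a building block in larger derivations.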

Each of these resource libraries is restricted to collecting a specific category of resources: only ontologies, only terminologies, or only linguistic resources such as those of ACL (http://www.aclweb.org) or META-NET (http://www.meta-net.eu). The frontiers between knowledge resources are not clear enough in terms of applications [Garshol 2004], even though many research studies have proposed formal definitions to categorize and identify types of knowledge resources [Guarino 1997, Gilchrist 2003]. Consequently, a repository that can cope with heterogeneous types of knowledge resources can be useful alongside the existing repositories. Hence, models for representing these resources are required for using them together. In general, more than one resource is needed to perform knowledge engineering tasks, so it is important to have repositories offering access to a richer set of knowledge resources represented within multiple formalisms (see figure B.1).

A required feature for resources repositories (or libraries) is the support of heterogeneous representations of knowledge and the diversity of knowledge resources. What we propose as a solution is a knowledge engineering system that is able to represent and store heterogeneous knowledge resources, align them and offer operators for combining their content. This requires considering knowledge resources regarding different aspects:

• Resources representation aspect: Knowledge resources exist under different formats and languages (predicate logic, description logic, semantic networks, conceptual graphs, natural language, etc.). This diversity in knowledge representation and the semantics supporting each representation approach make it difficult to define or use a unique approach to represent and store these resources.

• Resources retrieval aspect: Finding linguistic, terminological and ontological knowledge resources is not a simple task; it is generally difficult to find the required resources for a specific process. Some knowledge resources repositories have been created to offer a more effective indexing for these resources than common search engines. Representing the metadata of the resources and collecting information about their usage is a key to better indexing and retrieval.

• Resources management aspect: Multiple tools and methodologies for collecting, combining and reusing knowledge have been proposed, and many surveys have collected descriptions and specifications of different knowledge engineering approaches [Mårtensson 2000, Shvaiko et al. 2006, Scharl et al. 2012, Liao 2003]. However, few models for representing and classifying these tools have been proposed [Schreiber 2000, Wielinga et al. 1992], which makes it difficult to represent and share information about knowledge resources engineering.

Figure 1.1: A repository of heterogeneous knowledge resources. [Diagram labels: Resources, Repository, User, Need, Operations, Representation.]

The aim of this research project is to build a repository of knowledge resources. This repository is a collection of heterogeneous resources represented by multiple formalisms or models and allows a user to generate new resources by means of simple or complex operators.


1.2 Research areas

Solving the heterogeneity problem within knowledge repositories requires (1) representing heterogeneous resources and organizing their content using a common vocabulary and representation approach, and (2) defining a set of generic operators for the management and the combination of these resources. To build a resources repository that satisfies these requirements, we identify two main problems to solve:

1. Is there a model that can unify heterogeneous models of knowledge resources?

Since there exist many different (and incompatible) ways to express knowledge in resources, it is hard to devise a single representation model for their content. Moreover, the same resource may be involved in several processes, each of which supports only a specific model. For instance, an existing ontology alignment service may only accept OWL ontologies as input, while another service might require terminologies represented with the SKOS formalism. The same is true for other processes like automated text annotation, multilingual text alignment, word sense disambiguation, etc.

2. What operators can we use for combining resources?

Knowledge management tasks are defined by means of a sequence of abstract operators. For example, the first step in building a search application is creating indexes of knowledge resources, which is itself a process realized over different steps such as tagging, named entity identification, etc. Therefore, there is a need to propose a model for representing knowledge engineering tasks and to develop a set of operators accordingly. The definition of these operators depends on the treatment of the knowledge within resources. Each operator can be implemented in several ways depending on the types of its parameters.
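The last point, one abstract operator with several implementations chosen by the type of its parameter, can be illustrated with a short sketch. It is only an assumption about how such a dispatch might look, not the mechanism used in this thesis: the `Ontology` and `Glossary` classes and the `index_terms` operator are hypothetical, and the dispatch relies on Python's standard `functools.singledispatch`.

```python
from dataclasses import dataclass
from functools import singledispatch


@dataclass
class Ontology:
    """Hypothetical resource type: an ontology reduced to its class names."""
    classes: list[str]


@dataclass
class Glossary:
    """Hypothetical resource type: a glossary mapping terms to definitions."""
    terms: dict[str, str]


@singledispatch
def index_terms(resource) -> list[str]:
    """Abstract 'indexing' operator; fails when no implementation is registered."""
    raise NotImplementedError(
        f"no indexing implementation for {type(resource).__name__}"
    )


@index_terms.register
def _index_ontology(resource: Ontology) -> list[str]:
    # Implementation for ontologies: index the class names.
    return sorted(c.lower() for c in resource.classes)


@index_terms.register
def _index_glossary(resource: Glossary) -> list[str]:
    # Implementation for glossaries: index the defined terms.
    return sorted(t.lower() for t in resource.terms)
```

Calling `index_terms` on an `Ontology` or on a `Glossary` selects the matching implementation automatically, so a process that uses the abstract operator does not need to know which kind of resource it receives.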

1.3 Proposed research methodology

Since there are two sides of the problem, related to resources representations and knowledge engineering operators, we have identified the following steps for our research:

1. Study the diversity of approaches for representing knowledge resources and identify the types of knowledge resources that will be considered. Define a model for creating a common representation of these resources. The representation approach should lead to building a storage facility that collects knowledge from heterogeneous resources. The representation model is intended to be generic in order to allow representing alignments or matchings between entities of heterogeneous knowledge resources.

2. Define an approach to represent and integrate different implementations of knowledge engineering operators that are intended to manage and combine all kinds of knowledge resources that are stored and represented within the repository.

3. Implement some instances of the defined operators in order to manage a specific kind of resource. Alignment resources are our use case, since they are heterogeneous and are represented in different formalisms. These resources are of great importance for demonstrating the use of our methodology.

[Figure 1.2 content: three steps. (1) Formal representation: building a model for knowledge resources representation and management; (2) Importing and storing knowledge resources and defining operators for managing and combining knowledge from heterogeneous resources; (3) Implementing knowledge engineering operators and creating a repository for managing and combining heterogeneous alignment resources as an application. Diagram elements: knowledge resources study, corpora of knowledge resources, models, knowledge base.]

Figure 1.2: Methodology for creating a model for representing and a repository for managing knowledge resources.

1.4 Restrictions for the research plan

The notion of knowledge resources is quite ambiguous. This is why we consider as a knowledge resource every resource that represents some general (high level) knowledge about a specific domain, as opposed to data and facts usually represented in databases, spreadsheets, etc. (databases may be used to store general knowledge (e.g., Wikipedia or patent datasets), but it is not their primary objective). The hypotheses of our research are the following:

1. Types of resources: the resources that we represent, manage and combine are assumed to have a certain level of expressivity and to contain knowledge that is linked using relations.

2. Resources transformation and import: functions for transforming resources from their original formalism to another formalism or to the common representation model are intended to be neither generic nor exhaustive, and are not a requirement for achieving the research goals. Some of these tools will be implemented depending on the needs of the experiments.

1.5 Contributions

Knowledge representation, engineering and management is a wide research field with a growing number of innovative approaches. Working on the identified research problems led to two categories of contributions, each comprising several specific contributions:

C0 A methodology for representing heterogeneous knowledge resources and knowledge engineering abstracts, and for designing a repository for combining heterogeneous knowledge resources:

C01 we defined a categorization of knowledge resources based on generic aspects such as autonomy, type of content, schemas, etc. [Ghoula et al. 2010c];

C02 we proposed a methodology for resources representation and created an upper level model to represent the common and formal aspects of knowledge resources. The representation approach consists in considering three dimensions of knowledge representation (conceptual knowledge, terminological knowledge and lexical knowledge) and different levels of expressivity (meta-level, schema level, resources level) [Ghoula et al. 2010b];

C03 we created common models for representing generic entities and relations for some categories of the identified resources. These models (thesauri entities, ontology entities, etc.) were integrated within the proposed resources model. We implemented some knowledge engineering operators within a use case of merging multiple ontological and terminological resources in order to create an enriched version of WordNet [Ghoula et al. 2010a, Ghoula 2012];


C04 we proposed an approach to represent knowledge engineering operators and a taxonomy of resources management and combination operators. We created different categories of knowledge engineering operators and defined new operators. We also created a library of knowledge engineering operators based on the existing operators from the literature [Ghoula et al. 2011, Ajmi et al. 2012, Ghoula & Falquet 2012].

C1 An approach for designing concrete operators for managing and combining heterogeneous alignment resources:

C11 we categorized alignment resources and designed a generic model for representing and storing alignments [Ghoula et al. 2013];

C12 we defined an operator for composing alignment resources using uncertainty theories [Ghoula et al. 2013, Ghoula et al. 2014];

C13 we defined an operator for aggregating alignment resources generated by different matchers using a fuzzy theory and an evidence combination theory [Ghoula et al. 2014];

C14 we created a repository for alignment resources based on the alignment model and implemented the proposed operators within an API. The description of these operators was supported by the operators model [Ghoula et al. 2014];

C15 we applied the proposed operators in the case of ontology matching and proposed an evaluation methodology for testing their implementation [Ghoula et al. 2014];

C16 we proposed extra tools for enhancing the alignment composition in order to enrich existing alignments.
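To give a concrete intuition of what composing and aggregating alignment resources means, the following Python sketch treats an alignment as a set of weighted correspondences. It is a deliberately simplified illustration and not the operators defined in this thesis: the product and maximum used in `compose` and the arithmetic mean used in `aggregate` are assumptions standing in for the uncertainty, fuzzy and evidence combination theories mentioned above, and all function and type names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (source entity, target entity) -> confidence in [0, 1]
Alignment = Dict[Tuple[str, str], float]


def compose(ab: Alignment, bc: Alignment) -> Alignment:
    """Compose A->B and B->C correspondences into A->C correspondences."""
    targets_of = defaultdict(list)
    for (b, c), conf in bc.items():
        targets_of[b].append((c, conf))
    result: Alignment = {}
    for (a, b), conf_ab in ab.items():
        for c, conf_bc in targets_of.get(b, []):
            combined = conf_ab * conf_bc  # assumption: product of confidences
            # Assumption: when several paths link a and c, keep the strongest one.
            result[(a, c)] = max(result.get((a, c), 0.0), combined)
    return result


def aggregate(alignments: Iterable[Alignment]) -> Alignment:
    """Average the confidences proposed by several matchers.

    A matcher that does not propose a pair counts as confidence 0 (assumption).
    """
    alignments = list(alignments)
    totals = defaultdict(float)
    for alignment in alignments:
        for pair, conf in alignment.items():
            totals[pair] += conf
    return {pair: total / len(alignments) for pair, total in totals.items()}
```

For example, composing `{("a1", "b1"): 0.5, ("a1", "b2"): 1.0}` with `{("b1", "c1"): 0.5, ("b2", "c1"): 0.5}` yields a single correspondence between `a1` and `c1` whose confidence comes from the stronger of the two paths.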

The innovation of our methodology lies in making it possible to build a concrete repository in which users can manage and combine their own resources, or resources already in the repository, using their own operators or operators from the library of the repository.

1.6 Impacts and applications of the contributions

The current state of the art in knowledge resources combination and engineering lacks organization and formalization. Tools are being built and used without being described and shared efficiently. For adapting knowledge resources, users need a system that plays the role of a framework that imports their resources and combines them using multiple built-in or external tools.


The contributions of this work are of great use for research and industry; it proposes the basic elements that support a library of tools for knowledge engineering. The proposed ontology offers the possibility to integrate knowledge resources representation. This ensures different levels of interoperability and a dynamic representation of knowledge resources. The representation of knowledge resources operators is a support for building an algebra for combining and composing these operators. Some research issues have been addressed and solved in terms of resources representation and combination. In terms of usage for research, our contributions offer the ground for a potential open repository where researchers can share their experiences (tools and processes) and their resources (derived, adapted and validated).

[Figure 1.3 content: TOK LAB, with a Resources Library and an Operators Library, connected to Resources and Operators.]

Figure 1.3: Application and impacts of the proposed methodology.

The proposed approach is a candidate for an industrial application. A system can be proposed as a laboratory of knowledge resources combination based on commercial or open-source tools that derive knowledge from existing public or private resources.

1.7 Thesis plan

The thesis proceeds as follows:

• Chapter 2 describes the background knowledge about knowledge representation and resources repositories;

• Chapter 3 provides definitions for the knowledge resources that are considered in this research and describes the aspects of resources representations (contribution C0: C01), while presenting our categorization of knowledge resources and discussing the state of the art of resources representation models;

• Chapter 4 presents our approach to resources representation by describing an upper level model for representing heterogeneous knowledge resources (contribution C0: C02);

• Chapter 5 proposes a taxonomy of knowledge resources combination and management operators and describes some examples of concrete applications of our approach for representing and combining heterogeneous knowledge resources (contributions C0: C03-C04);

• Chapter 6 presents the first part of an application of our approach by defining a generic model for representing alignment resources as an extension of the proposed model. In this chapter we describe the transformation of existing alignment representations (based on different formalisms) into our generic alignment model (contributions C1: C11-C14);

• Chapter 7 details the second part of the application of our methodology by defining and implementing a framework of interpretation and combination of heterogeneous alignment resources (contributions C1: C12-C13);

• Chapter 8 is dedicated to the evaluation of the application of our methodology by testing the usefulness of our alignment combination operators. In this chapter we describe a proposal to enhance the composition operator by exploiting the content of resources and discuss the possibility of enriching existing alignments using composition and aggregation (contributions C1: C14-C15-C16);

• Chapter 9 provides a discussion about the contributions and a descrip- tion of some future work.


Knowledge representation and repositories for managing knowledge resources

Contents

2.1 Knowledge and knowledge representation
2.1.1 Knowledge
2.1.2 Knowledge representation
2.1.3 Knowledge representation formalisms
2.2 Knowledge resources repositories
2.2.1 Repositories for indexing and retrieving knowledge resources
2.2.2 Repositories for collecting and managing knowledge resources
2.3 Discussion

In this chapter we present an overview of the notions related to the problem of representing and managing heterogeneous knowledge resources. We start by describing the notion of knowledge and discuss some consensus about knowledge representation. Then, in the context of knowledge engineering, we explore particularly knowledge repositories and knowledge artifacts (resources) that can be represented and managed within repositories.

2.1 Knowledge and knowledge representation

Without launching a debate about the definition of “Knowledge”, which is a subject supporting multiple visions and philosophical theories [Moser 1998], we intend to define the characteristics of this notion that are related to knowledge organization and representation [Hjørland 2003].


2.1.1 Knowledge

From an information technology (IT) perspective, the definition of knowledge relies on the distinction between data (referred to as syntactic entities), information (defined as interpreted data) and knowledge (defined as learned information) [Aamodt & Nygård 1995]. There are different visions of the relationships between data, information and knowledge. The most common one is that knowledge is created from information that is extracted from data (the model: data to information to knowledge) [Ackoff 2010].

In a review of knowledge management, [Alavi & Leidner 2001] discussed this vision of knowledge management in IT and relied on the arguments of [Tuomi 1999] to stipulate that the data to information to knowledge model should be interpreted as knowledge to information to data. The explanation of this vision is that “knowledge must exist before information can be formulated and before data can be measured from information”.

[Fahey & Prusak 1998] propose a model where knowledge is used to elaborate information, interpret data and learn new knowledge (see figure 2.1).

Figure 2.1: Data-Information-Knowledge model according to [Fahey & Prusak 1998]

According to empirical studies [De Jong & Ferguson-Hessler 1996], knowledge can be divided into multiple types: situational, conceptual (propositional/declarative), procedural and strategic. The majority of studies about knowledge representation focus on two types, which are declarative knowledge (“know what”) and procedural knowledge (“know how”).

In the context of our research, our concern is conceptual (or declarative) knowledge, which is defined by [De Jong & Ferguson-Hessler 1996] as “static knowledge about facts, concepts and principles that apply within a certain domain”. This definition of declarative knowledge does not involve the representation aspect, which is important for its usage. Declarative knowledge is the kind of knowledge that is related to the description of “Things” using a mental representation of their characteristics, associated belief, status and related knowledge.

2.1.2 Knowledge representation

Knowledge representation makes it possible to express, represent, store, reason about and exchange knowledge. In the context of declarative knowledge, knowledge representation relies on a symbolic unit, which is the concept. Multiple interpretations from different perspectives can be attributed to the concept [Margolis & Laurence 2014].

Concepts can be seen (1) as mental representations (entities to represent internal propositional attitudes in the mind), (2) as abilities (“abilities that are peculiar to cognitive agents” [Margolis & Laurence 2014]), or (3) as abstract objects (which play the role of “constituents of propositions” [Margolis & Laurence 2014] that “mediate between thought and language, on the one hand, and referents, on the other” [Margolis & Laurence 2014]). To represent knowledge, we adopt the third interpretation (abstract objects), which is also the vision of Gottlob Frege¹ as detailed in [Zalta 2014].

Figure 2.2: The semiotic triangle of [Ogden & Richards 1927]

We consider a concept as a key element for the declarative knowledge representation. A concept is an abstract object that brings sense to the natural language representation and refers to a referent (this is also described in [Ogden & Richards 1927], see figure 2.2).

¹ http://fr.wikipedia.org/wiki/Gottlob_Frege

Thus, the concept becomes an entity of knowledge representation described (or expressed) generally using terms (as labels). For instance, the representation of such an entity within ontologies can be primitive (simple concepts) or can use a composition of primitive concepts to represent a defined concept. This composition depends on the expressivity of the representation formalism.

Each concept within a knowledge representation formalism has a defined number of attributes and is connected to other concepts through relations. Relations are used to represent links such as subsumption (the kinds of relations between concepts are defined by the knowledge representation formalism).

According to [Stephan et al. 2007], knowledge representation “studies the formalization of knowledge and its processing within machines”. From the perspective of Artificial Intelligence, a knowledge representation approach defines a machine-readable and machine-interpretable representation of a domain of interest. For instance, an ontology is a knowledge representation artifact that defines a vocabulary of domain terms and constrains their meaning by indicating how the concepts denoted by these terms are interrelated within a specific domain structure.

To clarify the differences in the definition and usage of ontologies in computer science and information systems, [Hepp 2008] identified three points of disagreement about the definition of this knowledge representation artifact and its fundamental properties:

• “Truth vs consensus”: this point reveals the disagreement between a view of ontologies as models of “true” reality that are independent from context and a view of ontologies as a representation of consensual shared human judgment;

• “Formal logic vs. other modalities”: this point reveals the disagreement about the knowledge representation formalisms that are considered fundamental to qualify a resource as an ontology. [Hepp 2008] argued about the importance of formal logic as a modality for ontologies;

• “Specification vs. conceptual system”: this point discusses the disagreement about whether an ontology is considered as the conceptual system (by being an abstraction of a domain’s conceptual elements and their relations) or as a specification of a conceptual system (by being the explicit specification of this abstraction using a representation formalism). [Hepp 2008] pointed out that it is more popular to consider an ontology as a specification of the conceptual system and represent it as a machine-readable artifact.

[Hepp 2008] stated that these disagreements are not terminological in nature (which term to use to qualify the concept of ontologies) but originate from different visions. For instance, in computer science, the vision is that conceptual entities within ontologies are mainly defined by formal means. In information systems, the concern is more about understanding the conceptual elements and their relationships than about the means of specification. This statement will be used in the identification of knowledge resources types (discussed in Section 3.1 of the following chapter).

2.1.3 Knowledge representation formalisms

Knowledge representation formalisms are the means for creating machine-readable artifacts representing knowledge of a specific domain. The concrete representation of these formalisms is ensured using representation languages.

Thus, the syntax of representation languages is defined by a formal grammar (e.g., XML, RDF, OWL). In general, the syntax of knowledge representation languages is close to the entity-relation model, which can be easily represented as a graph.

[Figure 2.3 content: diagram of Semantic Web languages (RDF(S), OWL-Lite, OWL-DL, OWL-Full, DLP, First-Order Predicate Logic, WSML-Core, WSML-DL, WSML-Flight, WSML-Rule, WSML-Full, F-Logic (LP), Datalog, SWRL, DL-Safe Rules), arranged from less expressive to more expressive, separated into decidable and undecidable languages and into classical semantics and LP semantics. Arrows mark languages that are semantically embedded, approximately semantically embedded, or syntactically embedded in more expressive ones.]

Figure 2.3: An overview of semantic web languages according to [Stephan et al. 2007]
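The remark that the syntax of knowledge representation languages is close to an entity-relation model, and therefore to a graph, can be made concrete with a small Python sketch. It stores statements as (subject, relation, object) triples, indexes them as an adjacency list and follows a subsumption relation transitively. The concept names and the relation label `subClassOf` are illustrative assumptions, and the traversal is plain graph reachability rather than the inference defined by any particular formalism.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)

# Hypothetical example statements.
TRIPLES: List[Triple] = [
    ("Dog", "subClassOf", "Mammal"),
    ("Mammal", "subClassOf", "Animal"),
    ("Dog", "label", "dog"),
]


def build_graph(triples: Iterable[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """Index triples as an adjacency list: subject -> [(relation, object), ...]."""
    graph: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
    return graph


def ancestors(graph: Dict[str, List[Tuple[str, str]]], concept: str,
              relation: str = "subClassOf") -> Set[str]:
    """Collect every node reachable from `concept` by following `relation`."""
    found: Set[str] = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        for rel, target in graph.get(current, []):
            if rel == relation and target not in found:
                found.add(target)
                stack.append(target)
    return found


print(sorted(ancestors(build_graph(TRIPLES), "Dog")))  # ['Animal', 'Mammal']
```

The `label` triple is ignored by the traversal because only edges carrying the requested relation are followed.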

As we described in the previous section, we use the notion of concept as a constituent of expressions to represent declarative knowledge. The meaning (sense) of an expression is subjective and can be ambiguous if the sense of the symbols used to represent it is ambiguous. Thus, the sense of symbols and their combination must be defined using a formal language. This is why a formal semantics is required to make explicit the meaning of symbols and their semantic relations (subsumption, deductions, etc.). The semantics of representation formalisms is expressed using a declarative mathematical language such as predicate logic or description logic.

[Stephan et al. 2007] proposed a categorized description and a survey of logic-based knowledge representation formalisms and languages (see figure 2.3). These types of formalisms reproduce parts of the human reasoning process based on the notion of logical consequence. [Shadbolt et al. 2006] argue that the success of the Semantic Web is based on the success of creating standards for expressing shared meaning. Thus, knowledge representation is a requirement for knowledge sharing and engineering. Knowledge Engineering is a field of Artificial Intelligence focused on modeling, extracting, representing, storing and reusing knowledge. Knowledge acquisition and reuse is based on reasoning about existing knowledge.

2.2 Knowledge resources repositories

One characteristic that represents at the same time a strength and a weakness of the Semantic Web is heterogeneity. The diversity of knowledge representation formalisms and the diversity of knowledge models enrich knowledge engineering by reflecting a side of the real world (different domains, diverse points of view, different cognitive models, etc.). This is counted as a positive aspect when knowledge engineers can define and create their own models for knowledge representations and design applications to extract and generate knowledge according to specific models.

However, when it comes to the principle of knowledge sharing and communication between software agents, specific conditions must be fulfilled. Communication and sharing of knowledge between agents require two levels of interoperability:

• structural and syntactic interoperability: knowledge within the semantic web and knowledge engineering contexts is machine-readable and models are provided to exchange structured data;

• semantic interoperability: the semantic aspect of shared knowledge requires adjustment, so that agents can reason about shared knowledge without being confronted with inconsistency and ambiguity. Semantic interoperability requires a common reference model and a well-defined semantics.

Formalisms and standards for representing and sharing knowledge allow representing and storing knowledge within resources of different types (ontologies, dictionaries, thesauri, etc.). Consequently, reasoning with shared knowledge requires systems that organize and index these knowledge resources for easier access. Thus, multiple systems for indexing and storing knowledge resources have been created.

2.2.1 Repositories for indexing and retrieving knowledge resources

The increasing number of ontological resources on the web became problematic. On the one hand, resources representing the same concepts are created independently, which often leads to resources that are too customized to be reused and are generally application dependent. On the other hand, search engines and information retrieval models needed to be adapted to this kind of resources on the Web (see figure 2.4).

Figure 2.4: Architecture of Watson (“a gateway for the Semantic Web”) as described in [d’Aquin et al. 2011]

Consequently, new search applications that automatically discover and index Semantic Web documents and answer queries about this kind of resources have been developed in the past decade. Swoogle [Finin et al. 2005], Watson [d’Aquin et al. 2011], and OntoSelect [Buitelaar et al. 2004] are some of the most popular Semantic Web resources repositories of this kind.

These repositories are designed based on the following features:

• Categorizing the semantic richness of semantic data to use it for ranking purposes;

• Representing relations between resources such as “import” or references to provide semantic clustering and navigation of resources;

• Providing access mechanisms and interfaces for software agents and human users;


These repositories solve one side of the problem, which is the awareness and sharing of Semantic Web documents and ontological resources. However, their models for resources representation do not consider the content of resources and do not solve semantic heterogeneity issues. Consequently, another type of repository, such as ontology libraries, has been created to offer a centralized hosted approach for collecting knowledge resources.

2.2.2 Repositories for collecting and managing knowledge resources

The second type of resources repositories includes the systems that do not automatically discover ontological or knowledge resources on the web. These systems rely on registered users who upload and maintain their resources. This allows collecting and storing resources from different sources and offering services for exploring and sharing knowledge. In the category of ontology repositories of this kind (ontology libraries), multiple systems were developed, such as BioPortal [Noy et al. 2008], DAML Ontology Library², TONES Ontology Repository³ and Semantic Web infrastructure [Baclawski & Schneider 2009]. Other types of repositories offer access to language resources, such as TerminoTrad⁴.

[d’Aquin & Noy 2012] provided a survey of ontology libraries. The authors defined a set of features to evaluate their usefulness by reviewing eleven ontology libraries. The criteria that were identified in this survey are:

• Purpose and coverage: each ontology library serves a set of purposes that are related to ontology development and sharing. Some ontology libraries index and collect ontologies from a specific domain;

• Library content: this feature involves the criteria of the type of procedure for collecting ontologies (manual, hybrid or automatic), the type of gatekeeping, which is related to the validation of the submitted resources (manual or automatic), the metadata of ontologies and other key elements about the characteristics and types of content (mappings and relations between ontologies);

• Main functions for users: ontology libraries are evaluated according to the main services they offer to the user. The basic functions that ontology libraries provide include searching, browsing, selecting and evaluating ontologies. Some systems offer programmatic access to ontologies through web services and APIs;

• Other features: this category represents all the extra features that are not considered basic features for ontology libraries.

² http://www.daml.org/ontologies/

³ http://rpc295.cs.man.ac.uk:8080/repository/

⁴ http://terminotrad.com

This survey represents a first study of a set of ontology libraries. Its main contribution is the set of feature categories defined to evaluate and compare these libraries. This kind of repository is evolving and becoming better and better suited to effective sharing and reuse of ontological resources.

Another survey tackles the concept of ontology repositories from another perspective. [Heymans et al. 2008] study a set of ontology repositories that store ontological resources, based on their storage schemes. There are two categories of ontology repositories: native stores and database-based stores.

Native stores use the file system as a storage mechanism. This method has the advantage of supporting a large quantity of data. This type of storage is more popular thanks to its effectiveness in terms of loading and to its openness to possibilities of optimization. AllegroGraph⁵, Jena TDB⁶, Sesame⁷ and OWLIM [Kiryakov et al. 2005] are among the most popular native stores.

Database-based stores use database management systems such as MySQL, PostgreSQL or Oracle. This model performs less well than native storage for load and update actions but offers other advantages:

• benefit from the use of database systems such as query optimization mechanisms, transactions, persistence, access control, etc.;

• access knowledge within ontologies and other datasets within different databases. RDF queries can be translated into SQL queries which can be integrated within other SQL queries that retrieve data from other sources.
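The second advantage, translating RDF queries into SQL, can be sketched as follows. This is a minimal illustration that assumes a single three-column `triples` table in an in-memory SQLite database and a hypothetical helper `pattern_to_sql` that handles one triple pattern only; actual database-based stores use richer schemas and translate full query languages.

```python
import sqlite3
from typing import List, Optional, Tuple

Pattern = Tuple[Optional[str], Optional[str], Optional[str]]


def pattern_to_sql(pattern: Pattern) -> Tuple[str, List[str]]:
    """Translate one triple pattern into a parameterized SQL query.

    None marks a variable position; any other value is a constant to match.
    """
    columns = ("subject", "predicate", "object")
    conditions, params = [], []
    for column, value in zip(columns, pattern):
        if value is not None:
            conditions.append(f"{column} = ?")
            params.append(value)
    where = " WHERE " + " AND ".join(conditions) if conditions else ""
    return "SELECT subject, predicate, object FROM triples" + where, params


# Hypothetical data stored in a relational table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("ex:Dog", "rdfs:subClassOf", "ex:Mammal"),
        ("ex:Mammal", "rdfs:subClassOf", "ex:Animal"),
        ("ex:Dog", "rdfs:label", "dog"),
    ],
)

# Pattern "?s rdfs:subClassOf ex:Animal" expressed with None as the variable.
sql, params = pattern_to_sql((None, "rdfs:subClassOf", "ex:Animal"))
rows = conn.execute(sql, params).fetchall()
```

Because the result is ordinary SQL, it could be embedded as a subquery or joined with tables holding non-RDF data, which is the integration benefit described above.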

Multiple benchmarks have been proposed in order to evaluate RDF triple store technologies⁸. Some ontology repositories are considered more relevant than others depending on their reasoning and inference capabilities. Evaluating RDF stores is somewhat controversial, since their performance depends on multiple parameters such as the hardware, the cache mechanisms, the order of triples within queries, etc.

2.3 Discussion

Since the Semantic Web is qualified as the Web of data, the paradigm of Linking Open Data (LOD) [Bizer et al. 2009] was proposed as a solution for large scale integration of data on the Web (see figure 2.5). This is a more practical solution than Semantic Web resources indexing and crawling. This approach is a solution for sharing and exchanging instances of knowledge of different origins. For instance, DBpedia [Auer et al. 2007] is an example of a large linked dataset that represents and publishes the content of Wikipedia and links it to other resources.

⁵ http://franz.com/agraph/allegrograph/

⁶ http://jena.apache.org/documentation/tdb/

⁷ http://www.openrdf.org

⁸ http://www.w3.org/wiki/RdfStoreBenchmarking

The scope of our research is not about developing another ontology repository similar to those that currently exist in the state of the art. Our main objective is to go beyond collecting ontological resources, to consider their content and to offer services beyond ontology management. The scope of our research lies between knowledge resources libraries and linked data. We intend to represent knowledge within heterogeneous resources (not only ontologies) and access the content of these resources to offer operators that generate new elements of knowledge by reusing this content.

There are multiple proposals of technologies for storage mechanisms that can be used to store knowledge resources. Our intention is not to propose a new design or model to create a native or database-based store. We assume that the existing solutions are useful to effectively store knowledge and that the performance of triple stores or ontology repositories is already good enough to support the resources repository that we intend to design.

A repository containing heterogeneous types of knowledge resources is needed. Hence, multiple models and formalisms for representing these resources are required. For this purpose, it is necessary to develop a set of knowledge resources operators that can import, export and process these resources while keeping a trace of their origin (the provenance of the resources, for example externally imported or generated from the combination of other resources).
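Keeping a trace of the origin of resources can be sketched in Python as follows. The record fields (`origin`, `derived_from`, `operator`) and the `merge` operator are hypothetical names chosen for illustration; they are not the provenance model of the repository described in this thesis.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class StoredResource:
    """A resource together with a minimal provenance record (hypothetical fields)."""
    name: str
    content: FrozenSet[str]
    origin: str = "imported"             # "imported" or "generated"
    derived_from: Tuple[str, ...] = ()   # names of the input resources
    operator: str = ""                   # operator that generated the resource


def merge(a: StoredResource, b: StoredResource, name: str) -> StoredResource:
    """Combine two resources and record where the result comes from."""
    return StoredResource(
        name=name,
        content=a.content | b.content,
        origin="generated",
        derived_from=(a.name, b.name),
        operator="merge",
    )


thesaurus = StoredResource("thesaurus", frozenset({"dog", "cat"}))
glossary = StoredResource("glossary", frozenset({"cat", "horse"}))
combined = merge(thesaurus, glossary, "combined")
```

Here `combined.origin` is `"generated"` and `combined.derived_from` names both inputs, while the two input resources keep the default `"imported"` origin.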

In the next chapter, we focus on defining the kinds of knowledge resources that we will consider. We also investigate the state of the art about the existing models for representing heterogeneous knowledge and we discuss the aspects that we consider for representing and combining heterogeneous knowledge resources.


Figure 2.5: State of the LOD cloud on “2014-08-30” (source: http://lod-cloud.net)


Resources representation and combination approach


Identification of knowledge resources

Contents

3.1 Definitions and typology of knowledge resources
3.1.1 Knowledge resources
3.1.2 Resources represented using formal ontology languages
3.1.3 Terminological, Lexical and semantic resources
3.1.4 Linguistic resources
3.2 Models and representation approaches for heterogeneous knowledge
3.2.1 Metadata representation models
3.2.2 Specific representation models
3.2.3 Generic representation models
3.3 A high level classification of knowledge resources
3.3.1 Autonomous resources
3.3.2 Enrichment resources
3.3.3 Combined Resources
3.4 Discussion

In this chapter we state the hypotheses about the kind of knowledge we consider and about the content we aim to represent within resources. We also present the state of the art on resource classification and on generic models for representing heterogeneous knowledge. The resources that we consider are expressed in different formalisms and capture declarative knowledge using formal or semi-formal representations. These types of resources represent the domains of human activity and describe them using different sorts of entities that might be related to each other. This property makes them candidates for matching, integration and further knowledge management operations. In order to design a generic model to represent heterogeneous resources, we defined the following steps as key elements of the model's design (see Figure 3.1).


1. Identifying knowledge resources. (Section 3.1)
2. Classify the selected resources. (Section 3.2)
3. Explore existing resources models. (Section 3.3)
4. Propose a common resources model. (Chapter 4)
5. Usage of the proposed model. (Chapter 5)

Figure 3.1: Steps for designing a meta-model for representing knowledge resources

3.1 Definitions and typology of knowledge resources

Some research works explore the organizational aspect of knowledge resources [Holsapple & Joshi 2001], others define procedures for reusing knowledge resources [Markus 2001], and many other research methodologies define knowledge resources based on specific use cases or for representing procedural knowledge about applications.

3.1.1 Knowledge resources

For our research methodology, we consider resources that represent some general (high level) knowledge about a domain, as opposed to specific facts.

Formal representation is always interconnected with lexical representations; for instance, formal ontologies use vocabularies for identifying concepts, relations, individuals, or other entities.

In fact, using natural language is one way to connect a formal representation to the reality it represents. In less formal resources such as glossaries and encyclopedias, natural language is the only way to describe concepts and other entities. Even in formalized resources, natural language appears in the description of logical formulae, classes, relations, etc.

Definition 1 (Knowledge Resource) We define a knowledge resource as a named resource representing some knowledge of a domain and having a creation origin, content and a usage purpose. The content is represented using a knowledge representation formalism that has a specific semantics.
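As an illustration only, Definition 1 can be read as a simple record type. The sketch below is not the model proposed in this thesis; the class name, field names and the example values are hypothetical, and the two origin values merely echo the distinction between externally imported resources and resources generated by combining other resources.

```python
from dataclasses import dataclass
from enum import Enum


class Origin(Enum):
    """Hypothetical creation origins of a resource."""
    IMPORTED = "externally imported"
    COMBINED = "generated by combining other resources"


@dataclass(frozen=True)
class KnowledgeResource:
    """Illustrative reading of Definition 1 (all field names are assumptions)."""
    name: str        # the resource is named
    domain: str      # the domain whose knowledge it represents
    origin: Origin   # creation origin
    formalism: str   # knowledge representation formalism of the content
    purpose: str     # usage purpose
    content: tuple = ()  # entities of the resource, kept abstract here


def describe(resource: KnowledgeResource) -> str:
    """Return a one-line, human-readable summary of a resource."""
    return (f"{resource.name} [{resource.formalism}] on {resource.domain}, "
            f"{resource.origin.value}, used for {resource.purpose}")


# Hypothetical example: a small glossary imported from an external source.
glossary = KnowledgeResource(
    name="example-glossary",
    domain="medicine",
    origin=Origin.IMPORTED,
    formalism="plain-text definitions",
    purpose="document annotation",
    content=("term-a", "term-b"),
)
print(describe(glossary))
```

Making the record immutable (`frozen=True`) is a design choice of this sketch: a resource produced by a combination would be a new record with its own origin rather than a modification of an existing one.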

Many research studies have pointed out the importance of knowledge resources and defined their properties. The main characteristics of a knowledge
