Thesis plan - An ontology-based repository for combining heterogeneous knowledge resources

The thesis proceeds as follows:

• Chapter 2 presents the background on knowledge representation and on repositories of knowledge resources;

• Chapter 3 provides definitions for the knowledge resources considered in this research and describes the aspects of resource representation (contribution C0: C01), while presenting our categorization of knowledge resources and discussing the state of the art of resource representation models.

• Chapter 4 presents our approach to resource representation by describing an upper-level model for representing heterogeneous knowledge resources (contribution C0: C02).

• Chapter 5 proposes a taxonomy of operators for combining and managing knowledge resources and describes some examples of concrete applications of our approach to representing and combining heterogeneous knowledge resources (contributions C0: C03-C04).

• Chapter 6 presents the first part of an application of our approach by defining a generic model for representing alignment resources as an extension of the proposed model. In this chapter we describe the transformation of existing alignment representations (based on different formalisms) into our generic alignment model (contributions C1: C11-C14).

• Chapter 7 details the second part of the application of our methodology by defining and implementing a framework for the interpretation and combination of heterogeneous alignment resources (contributions C1: C12-C13).

• Chapter 8 is dedicated to the evaluation of the application of our methodology by testing the usefulness of our alignment combination operators. In this chapter we describe a proposal to enhance the composition operator by exploiting the content of resources, and we discuss the possibility of enriching existing alignments using composition and aggregation (contributions C1: C14-C15-C16).

• Chapter 9 provides a discussion of the contributions and a description of some future work.

Knowledge representation and repositories for managing knowledge resources

Contents

2.1 Knowledge and knowledge representation
2.1.1 Knowledge
2.1.2 Knowledge representation
2.1.3 Knowledge representation formalisms
2.2 Knowledge resources repositories
2.2.1 Repositories for indexing and retrieving knowledge resources
2.2.2 Repositories for collecting and managing knowledge resources
2.3 Discussion

In this chapter we present an overview of the notions related to the problem of representing and managing heterogeneous knowledge resources. We start by describing the notion of knowledge and discuss some points of consensus about knowledge representation. Then, in the context of knowledge engineering, we explore in particular knowledge repositories and the knowledge artifacts (resources) that can be represented and managed within repositories.

2.1 Knowledge and knowledge representation

Without launching a debate about the definition of “Knowledge”, a subject open to multiple views and philosophical theories [Moser 1998], we intend to define the characteristics of this notion that are related to knowledge organization and representation [Hjørland 2003].

2.1.1 Knowledge

From an information technology (IT) perspective, the definition of knowledge relies on the distinction between data (referred to as syntactic entities), information (defined as interpreted data) and knowledge (defined as learned information) [Aamodt & Nygård 1995]. There are different views about the relationships between data, information and knowledge; the most common one is that knowledge is created from information that is extracted from data (the data-to-information-to-knowledge model) [Ackoff 2010].

In a review of knowledge management, [Alavi & Leidner 2001] discussed this view in the context of knowledge management in IT and relied on the arguments of [Tuomi 1999] to state that the data-to-information-to-knowledge model should be interpreted as knowledge to information to data. The explanation of this view is that “knowledge must exist before information can be formulated and before data can be measured from information”.

[Fahey & Prusak 1998] proposed a model in which knowledge is used to elaborate information, interpret data and learn new knowledge (see figure 2.1).

Figure 2.1: Data-Information-Knowledge model according to [Fahey & Prusak 1998]

According to empirical studies [De Jong & Ferguson-Hessler 1996], knowledge can be divided into multiple types: situational, conceptual (propositional/declarative), procedural and strategic. The majority of studies about knowledge representation focus on two types, which are declarative knowledge (“know what”) and procedural knowledge (“know how”).

In the context of our research, our concern is conceptual (or declarative) knowledge, which is defined by [De Jong & Ferguson-Hessler 1996] as “static knowledge about facts, concepts and principles that apply within a certain domain”. This definition of declarative knowledge does not involve the representation aspect, which is important for its usage. Declarative knowledge is the kind of knowledge that is related to the description of “Things” using a mental representation of their characteristics, associated beliefs, status and related knowledge.

2.1.2 Knowledge representation

Knowledge representation allows knowledge to be expressed, represented, stored, reasoned about and exchanged. In the context of declarative knowledge, knowledge representation relies on a symbolic unit, which is the concept. Multiple interpretations from different perspectives can be attributed to the concept [Margolis & Laurence 2014].

Concepts can be seen (1) as mental representations (entities that represent internal propositional attitudes in the mind), (2) as abilities (“abilities that are peculiar to cognitive agents” [Margolis & Laurence 2014]), or (3) as abstract objects (which play the role of “constituents of propositions” that “mediate between thought and language, on the one hand, and referents, on the other” [Margolis & Laurence 2014]).

To represent knowledge, we adopt the third interpretation (abstract objects), which is also the view of Gottlob Frege¹ as detailed in [Zalta 2014].

Figure 2.2: The semiotic triangle of [Ogden & Richards 1927]

We consider the concept as a key element of declarative knowledge representation. A concept is an abstract object that gives sense to natural language representations and refers to a referent (as also described by [Ogden & Richards 1927]; see figure 2.2).

1http://fr.wikipedia.org/wiki/Gottlob_Frege

Thus, the concept becomes an entity of knowledge representation, generally described (or expressed) using terms (as labels). For instance, such an entity can be represented within ontologies either as a primitive (simple) concept or as a composition of primitive concepts that represents a defined concept. This composition depends on the expressivity of the representation formalism.

Each concept within a knowledge representation formalism has a defined number of attributes and is connected to other concepts through relations. Relations are used to represent links such as subsumption (the kinds of relations between concepts are defined by the knowledge representation formalism).
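
To make this structure concrete, the following Python sketch models a concept as a record holding its labels (terms), attributes and typed relations, and distinguishes primitive concepts from defined concepts composed of other concepts. The class Concept, the helper primitive_components and the example concepts (Person, Woman, Parent, Mother) are hypothetical illustrations, not the model proposed in this thesis; composition is reduced here to a plain conjunction of named concepts, whereas the available composition constructs actually depend on the expressivity of the formalism.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """Hypothetical concept record: labels, attributes and typed relations."""
    name: str
    labels: list[str] = field(default_factory=list)               # terms expressing the concept
    attributes: dict[str, str] = field(default_factory=dict)      # attribute name -> value type
    relations: dict[str, set[str]] = field(default_factory=dict)  # relation kind -> target concept names
    definition: frozenset[str] = frozenset()                      # empty for a primitive concept

    @property
    def is_primitive(self) -> bool:
        # A concept without a definition is treated as primitive.
        return not self.definition

    def relate(self, kind: str, target: "Concept") -> None:
        # Record a typed link (e.g. subsumption) to another concept.
        self.relations.setdefault(kind, set()).add(target.name)


def primitive_components(concept: Concept, index: dict[str, Concept]) -> set[str]:
    """Expand a defined concept into its primitive parts (assumes acyclic definitions)."""
    if concept.is_primitive:
        return {concept.name}
    parts: set[str] = set()
    for name in concept.definition:
        parts |= primitive_components(index[name], index)
    return parts


person = Concept("Person", labels=["person"], attributes={"birthDate": "date"})
woman = Concept("Woman", labels=["woman"])
parent = Concept("Parent", labels=["parent"])
# Defined concept: Mother is the conjunction of Woman and Parent.
mother = Concept("Mother", labels=["mother", "mum"],
                 definition=frozenset({"Woman", "Parent"}))
woman.relate("subClassOf", person)
parent.relate("subClassOf", person)

index = {c.name: c for c in (person, woman, parent, mother)}
print(sorted(primitive_components(mother, index)))  # ['Parent', 'Woman']
```

Keeping relations as a mapping from relation kind to target names leaves the set of admissible relation kinds open, which mirrors the fact that this set is fixed by the formalism rather than by the concept itself.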

According to [Stephan et al. 2007], knowledge representation “studies the formalization of knowledge and its processing within machines”. From an Artificial Intelligence perspective, a knowledge representation approach defines a machine-readable and machine-interpretable representation of a domain of interest. For instance, an ontology is a knowledge representation artifact that defines a vocabulary of domain terms and constrains their meaning by indicating how the concepts denoted by these terms are interrelated within a specific domain structure.

To clarify the differences in the definition and usage of ontologies in computer science and information systems, [Hepp 2008] identified three points of disagreement about the definition of this knowledge representation artifact and its fundamental properties:

• “Truth vs. consensus”: this point reveals the disagreement between a view of ontologies as models of a “true” reality that are independent of context and a view of ontologies as representations of consensual, shared human judgment;

• “Formal logic vs. other modalities”: this point reveals the disagreement about which knowledge representation formalisms are considered fundamental for qualifying a resource as an ontology. [Hepp 2008] argued for the importance of formal logic as a modality for ontologies;

• “Specification vs. conceptual system”: this point concerns the disagreement about whether an ontology is the conceptual system itself (being an abstraction of a domain’s conceptual elements and their relations) or a specification of a conceptual system (being the explicit specification of this abstraction using a representation formalism). [Hepp 2008] pointed out that it is more common to consider an ontology as a specification of the conceptual system and to represent it as a machine-readable artifact.

[Hepp 2008] stated that these disagreements are not terminological in nature (i.e., about which term to use for the concept of ontology) but originate from different views. For instance, in computer science the view is that the conceptual entities within ontologies are mainly defined by formal means, whereas in information systems the concern is more about understanding the conceptual elements and their relationships than about the means of specification. This statement will be used to identify the types of knowledge resources (discussed in section 3.1 of the following chapter).

2.1.3 Knowledge representation formalisms

Knowledge representation formalisms are the means for creating machine-readable artifacts that represent the knowledge of a specific domain. These formalisms are concretely realized through representation languages.

The syntax of representation languages (e.g., XML, RDF, OWL) is defined by a formal grammar. In general, the syntax of knowledge representation languages is close to the entity-relationship model, which can easily be represented as a graph.
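
As a minimal illustration of this graph view, the sketch below stores entity-relation statements as (subject, predicate, object) triples, in the style of RDF, and turns them into a labelled directed graph. The triples and the function to_graph are hypothetical examples; a real RDF toolkit would also handle IRIs, literals and namespaces, which are omitted here.

```python
from collections import defaultdict

# Hypothetical statements in an entity-relation style: (subject, predicate, object).
triples = [
    ("Mother", "subClassOf", "Parent"),
    ("Parent", "subClassOf", "Person"),
    ("Person", "hasAttribute", "birthDate"),
]


def to_graph(statements):
    """Build an adjacency map: node -> list of (edge label, target node)."""
    graph = defaultdict(list)
    for subject, predicate, obj in statements:
        # Each statement becomes one labelled edge from subject to object.
        graph[subject].append((predicate, obj))
    return dict(graph)


graph = to_graph(triples)
for node, edges in graph.items():
    for label, target in edges:
        print(f"{node} --{label}--> {target}")
```

Nodes that only appear as objects (here birthDate) have no outgoing edges and therefore no entry in the adjacency map.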

Figure 2.3: An overview of semantic web languages according to [Stephan et al. 2007]

As described in the previous section, we use the notion of concept as a constituent of expressions to represent declarative knowledge. The meaning (sense) of an expression is subjective and can be ambiguous if the sense of the symbols used to represent it is ambiguous. Thus, the sense of symbols and of their combinations must be defined using a formal language. This is why a formal semantics is required to make explicit the meaning of symbols and their semantic relations (subsumption, deduction, etc.). The semantics of representation formalisms is expressed using a declarative mathematical language such as predicate logic or description logic.
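
For description logics, this formal semantics is model-theoretic: an interpretation maps each concept name to a subset of a domain, a conjunction of concepts to the intersection of their extensions, and a subsumption axiom C ⊑ D is satisfied when the extension of C is included in that of D. The sketch below checks such axioms in one finite interpretation; the domain, the extensions and the helper names interpret and satisfies_subsumption are assumptions made for illustration. It only tests satisfaction in a single interpretation: logical entailment would require the axiom to hold in every model of the knowledge base.

```python
# A hypothetical finite interpretation: each concept name maps to a subset of the domain.
domain = {"alice", "bob", "carol"}
ext = {
    "Woman": {"alice", "carol"},
    "Parent": {"alice", "bob"},
    "Person": {"alice", "bob", "carol"},
}
assert all(extension <= domain for extension in ext.values())


def interpret(expr, ext):
    """Return the extension of a concept expression.

    An expression is a concept name (str) or ("and", e1, e2) for conjunction,
    which is interpreted as set intersection.
    """
    if isinstance(expr, str):
        return ext[expr]
    op, left, right = expr
    if op == "and":
        return interpret(left, ext) & interpret(right, ext)
    raise ValueError(f"unsupported constructor: {op}")


def satisfies_subsumption(sub, sup, ext):
    """The interpretation satisfies sub ⊑ sup iff ext(sub) is a subset of ext(sup)."""
    return interpret(sub, ext) <= interpret(sup, ext)


print(satisfies_subsumption(("and", "Woman", "Parent"), "Person", ext))
print(satisfies_subsumption("Parent", "Woman", ext))
```

Here Woman ⊓ Parent ⊑ Person holds because the intersection {alice} is included in the extension of Person, while Parent ⊑ Woman fails because bob is a Parent but not a Woman.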

[Stephan et al. 2007] proposed a categorized description and survey of logic-based knowledge representation formalisms and languages (see figure 2.3). These formalisms reproduce parts of the human reasoning process based on the notion of logical consequence. [Shadbolt et al. 2006] argue that the success of the Semantic Web rests on the success of creating standards for expressing shared meaning. Thus, knowledge representation is a requirement for knowledge sharing and engineering. Knowledge Engineering is a field of Artificial Intelligence focused on modeling, extracting, representing, storing and reusing knowledge. Knowledge acquisition and reuse are based on reasoning about existing knowledge.
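
Reasoning about existing knowledge based on logical consequence can be illustrated by forward chaining. The sketch below saturates a set of triples with two of the RDFS entailment rules from the W3C RDF Semantics recommendation (rdfs11, transitivity of rdfs:subClassOf, and rdfs9, inheritance of rdf:type along subclasses) until no new triple can be derived. The ex: resources and the function forward_chain are hypothetical, and this naive fixpoint loop favours readability over efficiency.

```python
SUB = "rdfs:subClassOf"
TYPE = "rdf:type"


def forward_chain(triples):
    """Saturate a set of triples with two RDFS entailment rules until a fixpoint.

    rdfs11: (A subClassOf B), (B subClassOf C) -> (A subClassOf C)
    rdfs9:  (A subClassOf B), (x type A)       -> (x type B)
    """
    closure = set(triples)
    while True:
        new = set()
        subclass_pairs = [(s, o) for s, p, o in closure if p == SUB]
        for a, b in subclass_pairs:
            for s, p, o in closure:
                if p == SUB and s == b:      # rdfs11
                    new.add((a, SUB, o))
                if p == TYPE and o == a:     # rdfs9
                    new.add((s, TYPE, b))
        new -= closure
        if not new:                          # fixpoint reached: nothing left to derive
            return closure
        closure |= new


kb = {
    ("ex:Mother", SUB, "ex:Parent"),
    ("ex:Parent", SUB, "ex:Person"),
    ("ex:alice", TYPE, "ex:Mother"),
}
inferred = forward_chain(kb) - kb
for triple in sorted(inferred):
    print(triple)
```

Because both rules only add triples built from existing terms, the set of derivable triples is finite and the loop is guaranteed to terminate.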

In the document An ontology-based repository for combining heterogeneous knowledge resources (pages 31-39)