

In the document Data Mining in Biomedicine Using Ontologies (Pages 181-184)

Text Summarization Using Ontologies

8.2 Representing Background Knowledge—Ontology

Background knowledge is knowledge that complements the primary target data (the text, text collection, or database) that is the subject of the summarization with information essential to understanding it. Background knowledge can take different forms, varying from simple lists of words to formal representations.

To provide, in the question-answering style, a full natural-language-generation-based summary, a means for reasoning within the domain, as well as a means for processing language expressions, is needed. Therefore, background knowledge should include axiomatic formalization of essential domain knowledge, as well as knowledge to guide the natural-language synthesis process. In this context, however, our goal is conceptual summaries provided as sets of words or concepts, so the background knowledge to support this can range from unstructured lists of words to ontologies.

A simple list of words can be applied as a filter, mapping from a text to the subset of the word list that appears in the text. Such a controlled list of keywords or vocabulary of topics can, by obvious means, be improved to also capture morphology by stemming or inflection patterns. For summary purposes, however, we will have to rely on such coarse-grained principles as frequency statistics to reduce the number of items in a list or to obtain an easy-to-grasp summary. What is needed to obtain significant improvement is a structure that relates individual words and thereby supports fusion into commonly related items in the contraction toward sufficiently brief summaries. In addition, the presence of relations introduces the element of definition by related items and thus justifies the notion of a structure of concepts rather than a list of words. Taxonomies, partonomies, semantic networks, and ontologies are thus structures that can potentially contribute to knowledge-based summarization. Our main focus here is on ontologies ordered around the taxonomic relationship. Rather than the common description-logic-based approach, we choose here a simpler, concept-algebraic approach to ontologies.
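The word-list filter described above can be sketched in a few lines. This is a minimal illustration, not code from the chapter: the vocabulary is a made-up example, and the suffix-stripping stemmer is a crude stand-in for a real morphology component.

```python
import re

# Hypothetical controlled vocabulary; a real one would come from a
# domain resource such as a medical thesaurus.
VOCABULARY = {"cathedral", "town", "entity", "location", "disease"}

def naive_stem(word):
    """Very crude suffix stripping -- a stand-in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def filter_summary(text, vocabulary=VOCABULARY):
    """Map a text to the subset of the word list that appears in it."""
    tokens = re.findall(r"[a-z]+", text.lower())
    stems = {naive_stem(t) for t in tokens}
    return {w for w in vocabulary if naive_stem(w) in stems}

print(filter_summary("Old cathedrals in European towns attract visitors."))
```

As the surrounding text notes, such a flat filter only selects words; it cannot fuse related items, which is what the ontology-based structures below add.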

One important rationale for this is that our goal here is not ontological reasoning in general, but rather the extraction of sets of mapped concepts and the manipulation of such sets (e.g., contraction). Another reason is that the concept-algebraic approach has an inherent and very significant notion of generativity, whereby the ontology also includes compound concepts that can be formed by means of other concepts.

8.2.1 An Algebraic Approach to Ontologies

Let us consider a base taxonomy that situates a set of atomic term concepts A in a multiple-inheritance hierarchy. Based on this, we define a generative ontology by generalizing the hierarchy to a lattice and by introducing a (lattice-algebraic) concept language (description language) that defines an extended set of well-formed concepts, including both atomic and compound term concepts.

The concept language used here, ONTOLOG [9], has, as basic elements, concepts and binary relations between concepts. The algebra introduces two closed operations, sum and product, on concept expressions ϕ and ψ, where (ϕ + ψ) denotes the concept being either ϕ or ψ, and (ϕ × ψ) denotes the concept being both ϕ and ψ (also called join and meet, respectively).

Relationships r are introduced algebraically by means of a binary operator (:), known as the Peirce product (r : ϕ), which combines a relation r with an expression ϕ. The Peirce product is used as a factor in conceptual products, as in x × (r : y), which can be rewritten to form the feature structure x[r : y], where [r : y] is an attribution of the concept x. Thus, we can form compound concepts by attribution.
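The algebraic building blocks above can be encoded as a small expression tree. This is a hypothetical encoding of my own, not the chapter's implementation; the class names and the rewrite function are illustrative.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Atom:          # atomic concept, e.g. cathedral
    name: str
    def __str__(self): return self.name

@dataclass(frozen=True)
class Sum:           # (x + y), "x or y" (join)
    left: object
    right: object

@dataclass(frozen=True)
class Peirce:        # (r : y), Peirce product of relation r and concept y
    rel: str
    arg: object
    def __str__(self): return f"{self.rel}:{self.arg}"

@dataclass(frozen=True)
class Product:       # (x × y), "x and y" (meet)
    left: object
    right: object

@dataclass(frozen=True)
class Attributed:    # feature structure x[r1:y1, ...]
    head: Atom
    attrs: tuple
    def __str__(self):
        return f"{self.head}[{', '.join(map(str, self.attrs))}]"

def to_feature_structure(p):
    """Rewrite x × (r:y) into the attribution x[r:y]."""
    if isinstance(p, Product) and isinstance(p.left, Atom) \
            and isinstance(p.right, Peirce):
        return Attributed(p.left, (p.right,))
    return p

expr = Product(Atom("cathedral"), Peirce("LOC", Atom("town")))
print(to_feature_structure(expr))   # cathedral[LOC:town]
```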

Given a set of atomic concepts A and semantic relations R, the set of well-formed terms L is

L = A ∪ { x[r₁:y₁, …, rₙ:yₙ] | x ∈ A, rᵢ ∈ R, yᵢ ∈ L }   (8.1)

Compound concepts can thus have multiple as well as nested attributions. For instance, with R = {WRT, CHR, CBY, TMP, LOC, ...}¹ and A = {entity, physical entity, abstract entity, location, town, cathedral, old}, we get well-formed compound terms such as town[CHR:old] (an old town) and cathedral[LOC:town[CHR:old]] (a cathedral located in an old town).
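The recursive definition in (8.1) translates directly into a membership check. The following is a minimal sketch under my own assumptions: terms are written as strings such as "cathedral[LOC:town[CHR:old]]", and the parsing is deliberately simplistic.

```python
# Atomic concepts and relations, taken from the example in the text.
A = {"entity", "physical entity", "abstract entity", "location",
     "town", "cathedral", "old"}
R = {"WRT", "CHR", "CBY", "TMP", "LOC"}

def split_attrs(s):
    """Split 'r1:y1, r2:y2' at top-level commas, ignoring nested brackets."""
    parts, depth, cur = [], 0, ""
    for ch in s:
        if ch == "[": depth += 1
        elif ch == "]": depth -= 1
        if ch == "," and depth == 0:
            parts.append(cur); cur = ""
        else:
            cur += ch
    parts.append(cur)
    return [p.strip() for p in parts]

def well_formed(term):
    """term ∈ L iff term ∈ A, or term = x[r1:y1,...,rn:yn]
    with x ∈ A, ri ∈ R, and each yi ∈ L (the recursion in (8.1))."""
    if term in A:
        return True
    if "[" not in term or not term.endswith("]"):
        return False
    head, body = term.split("[", 1)
    if head not in A:
        return False
    for attr in split_attrs(body[:-1]):
        if ":" not in attr:
            return False
        r, y = attr.split(":", 1)
        if r.strip() not in R or not well_formed(y.strip()):
            return False
    return True

print(well_formed("cathedral[LOC:town[CHR:old]]"))  # True
print(well_formed("cathedral[NEAR:town]"))          # False (NEAR not in R)
```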

Obviously, modeling ontologies from scratch is the best way to ensure that the result will be correct and consistent. However, for many applications the effort this takes is simply not available, and manual modeling has to be restricted to narrow and specific subdomains, while the major part has to be derived from relevant sources.

Sources that may contribute to the modeling of ontologies come in various forms. A taxonomy is an obvious choice, and it may be supplemented with, for instance, word and term lists as well as dictionaries for the definition of vocabularies and for the handling of morphology. Among the obviously useful resources are the semantic network WordNet [11], the Unified Medical Language System (UMLS) [4], and several other resources in the biomedical science area.

Going from a resource to an ontology is not necessarily straightforward, but if the goal is a generative ontology and the given resource is a taxonomy, one option is to proceed as follows. Given a taxonomy T over the set of atomic concepts A, and a language L over A for a given set of relations R, derived as indicated in (8.1), let T̂ be the transitive closure of T. T̂ can be generalized to an inclusion relation ≤ over all well-formed terms of the language L by the following:

x ≤ y whenever (x, y) ∈ T̂;  x[r : z] ≤ x;  and x[r : z] ≤ y[r : w] whenever x ≤ y and z ≤ w.

1. For with respect to, characterized by, caused by, temporal, location, respectively.
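The two-step construction, computing the transitive closure T̂ and then extending ≤ to attributed terms, can be sketched as follows. The taxonomy edges, the dictionary encoding of compound terms, and the exact subsumption rules are my own illustrative assumptions, not the chapter's definitions.

```python
# Taxonomy edges (child, parent) over atomic concepts -- a made-up fragment.
T = {("town", "location"), ("location", "physical_entity"),
     ("cathedral", "physical_entity"), ("physical_entity", "entity")}

def transitive_closure(edges):
    """Naive fixpoint computation of the transitive closure."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d)); changed = True
    return closure

T_hat = transitive_closure(T)

def leq(x, y):
    """x ≤ y, where a term is either an atom (str) or a compound
    {'head': atom, 'attrs': {relation: term}}."""
    if isinstance(x, str) and isinstance(y, str):
        return x == y or (x, y) in T_hat          # x ≤ y via T̂
    if isinstance(x, dict) and isinstance(y, str):
        return leq(x["head"], y)                  # x[r:z] ≤ x ≤ y
    if isinstance(x, dict) and isinstance(y, dict):
        return leq(x["head"], y["head"]) and all( # heads and attributes subsume
            r in x["attrs"] and leq(x["attrs"][r], z)
            for r, z in y["attrs"].items())
    return False

old_town_cathedral = {"head": "cathedral",
                      "attrs": {"LOC": {"head": "town",
                                        "attrs": {"CHR": "old"}}}}
print(leq(old_town_cathedral, "entity"))                        # True
print(leq(old_town_cathedral,
          {"head": "cathedral", "attrs": {"LOC": "location"}})) # True
```

The nested-loop closure is quadratic per pass and only suitable for small taxonomies; a real system would use Warshall's algorithm or precomputed ancestor sets.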
