Forgetting and uniform interpolation in extensions of the description logic EL


Boris Konev, Dirk Walther, and Frank Wolter

University of Liverpool, UK

{konev, dwalther, wolter}@liverpool.ac.uk

1 Introduction

Terminologies used in life sciences, health-care, and other knowledge intensive areas are often very large and comprehensive. Examples are the medical terminology SNOMED CT (Systematised Nomenclature of Medicine Clinical Terms) [17] containing about 380 000 concept definitions and the National Cancer Institute ontology (NCI) [16] containing more than 60 000 axioms. For terminologies T of this size, it is often of interest to forget a subvocabulary Σ of the vocabulary of T; i.e., to transform T into a new terminology T_Σ (called a Σ-interpolant of T) that contains no symbols from Σ and that is indistinguishable from T regarding its consequences that do not use Σ. In AI, this problem has been studied under a variety of names such as forgetting and variable elimination [14, 4, 10]. In mathematical logic, this problem has been investigated as the uniform interpolation problem [18]. Computing Σ-interpolants of terminologies has a number of potential applications, e.g.,

Re-use of ontologies: when using a terminology such as SNOMED CT in an application, often only a very small fraction of its vocabulary is of interest. In this case, one could use a Σ-interpolant instead of the whole terminology, where Σ is the vocabulary not of interest for the application.

Predicate hiding: a terminology developer might not want to publish a terminology completely because a certain part of its vocabulary is not intended for public use. Again, publishing Σ-interpolants, where Σ is the vocabulary to be hidden, appears to be a solution to this problem.

Exhibiting hidden relations between terms: large terminologies are difficult to maintain as small changes to their axioms can have drastic and damaging effects. To analyze possibly unwanted consequences over a certain part Γ of the vocabulary, an ontology developer can automatically generate a complete axiomatization of the relations between terms over Γ by computing a Σ-interpolant, where Σ is the complement of Γ.

Ontology versioning: to check whether two versions of a terminology have the same consequences over their common vocabulary (or a subset thereof), one can first compute their interpolants by forgetting the vocabulary not shared by the two versions and then check whether the two interpolants are logically equivalent (i.e., have the same models).

In the description of Σ-interpolants given above, we have specified neither a language in which they are axiomatized nor the language wrt. which Σ-interpolants should be indistinguishable from the original terminology. The choice of the latter language depends on the application: for example, if one is interested in inclusions between concepts, then a Σ-interpolant should imply the same concept inclusions using no symbols from Σ as the original terminology. On the other hand, if the terminology is used to query instance data using conjunctive queries, then a Σ-interpolant together with any instance data using no symbols from Σ should imply the same certain answers to conjunctive queries using no symbols from Σ as the original terminology.

This research was supported by the EPSRC grant EP/E065279/1.

Regarding the language L in which Σ-interpolants should be axiomatized, one has to find a compromise between the following three conflicting goals:

(R) Standard reasoning problems (e.g., logical equivalence) in L should not be more complex than reasoning in the language underlying the terminology.

(I) Σ-interpolants in L should be uniquely determined up to logical equivalence: if T1′ and T2′ are Σ-interpolants in L of terminologies T1 and T2 that have the same consequences not using Σ, then T1′ and T2′ should be logically equivalent.

(E) The language L should be powerful enough to admit finite and succinct (ideally, polynomial size) axiomatizations of Σ-interpolants, and it should be possible to compute Σ-interpolants efficiently (ideally, in polynomial time).

For terminologies in standard description logics (DLs) such as EL and any language between ALC and SHIQO, there do not exist languages L achieving all these goals simultaneously.¹ To illustrate this point, let L be second-order logic. Then L trivially satisfies (E) but fails to satisfy (R) and (I), for ontologies in any standard DL.

In this paper, we consider forgetting in the lightweight description logic EL underlying the designated OWL2-EL profile of the upcoming OWL Version 2 extended with role inclusions and domain and range restrictions [1]. This choice is motivated by the fact that forgetting appears to be of particular interest for large-scale and comprehensive terminologies and that many such terminologies are given in this language. We introduce three DLs for axiomatizing Σ-interpolants satisfying criteria (R) and (I) and preserving, respectively, inclusions between concepts, concept instances, and answers to conjunctive queries. These DLs do not satisfy (E), as Σ-interpolants sometimes do not exist or are of exponential size. We demonstrate that, nevertheless, Σ-interpolants typically exist and can be computed in practice for large-scale terminologies such as SNOMED CT and appropriate versions of NCI. This work has been presented in [7]; detailed proofs and additional experiments are available in a technical report [8].

2 Preliminaries

Let N_C, N_R, and N_I be countably infinite and mutually disjoint sets of concept names, role names, and individual names. EL-concepts C are built according to the rule

C := A | ⊤ | C ⊓ D | ∃r.C,

where A ∈ N_C, r ∈ N_R, and C, D range over EL-concepts. The set of ELH^r-inclusions consists of concept inclusions C ⊑ D and concept equations C ≡ D, domain restrictions dom(r) ⊑ C, range restrictions ran(r) ⊑ C, and role inclusions r ⊑ s, where C, D are EL-concepts and r, s ∈ N_R. An ELH^r-TBox T is a finite set of ELH^r-inclusions. An ELH^r-TBox is called an ELH^r-terminology if all its concept inclusions and equations are of the form A ⊑ C and A ≡ C and no concept name occurs more than once on the left hand side. In what follows, we use A ⋈ C to denote expressions of the form A ⊑ C and A ≡ C.

¹ This follows from the fact that deciding whether TBoxes in these DLs imply the same concept inclusions over a signature is at least one exponential harder than deciding logical equivalence [12, 11].

Concept C             Translation C♯
⊤                     x = x
A                     A(x)
C ⊓ D                 C♯(x) ∧ D♯(x)
∃r.C                  ∃y (r(x, y) ∧ C♯(y))
dom(r)                ∃y r(x, y)
ran(r)                ∃y r(y, x)
∃u.C                  (x = x) ∧ ∃y C♯(y)
∃r1 ⊓ ··· ⊓ rn.C      ∃y (r1(x, y) ∧ ··· ∧ rn(x, y) ∧ C♯(y))

Fig. 1. Standard translation ·♯
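The standard translation of EL-concepts into first-order formulas can be sketched in a few lines. The following is a minimal illustration, not part of the original development: the class names `Top`, `Name`, `And`, `Exists` and the function `translate` are hypothetical, and the output uses the ASCII spellings `exists` and `&` for ∃ and ∧.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Top:
    """The top concept."""


@dataclass(frozen=True)
class Name:
    """A concept name A."""
    name: str


@dataclass(frozen=True)
class And:
    """A conjunction of two concepts."""
    left: "Concept"
    right: "Concept"


@dataclass(frozen=True)
class Exists:
    """An existential restriction over a role name."""
    role: str
    filler: "Concept"


Concept = Union[Top, Name, And, Exists]


def translate(concept: Concept, var: str = "x", depth: int = 0) -> str:
    """Return the first-order translation of an EL-concept with free variable var."""
    if isinstance(concept, Top):
        return f"{var}={var}"
    if isinstance(concept, Name):
        return f"{concept.name}({var})"
    if isinstance(concept, And):
        left = translate(concept.left, var, depth)
        right = translate(concept.right, var, depth)
        return f"({left} & {right})"
    if isinstance(concept, Exists):
        fresh = f"y{depth}"  # one fresh variable name per nesting level
        body = translate(concept.filler, fresh, depth + 1)
        return f"exists {fresh}.({concept.role}({var}, {fresh}) & {body})"
    raise TypeError(f"not an EL-concept: {concept!r}")


example = Exists("r", And(Name("A"), Top()))
print(translate(example))
```

Each case of `translate` mirrors one row of the translation table for ⊤, A, C ⊓ D, and ∃r.C; the additional constructors (dom, ran, the universal role, role conjunctions) are deliberately left out of this sketch.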

Assertions of the form A(a) and r(a, b), where a, b ∈ N_I, A ∈ N_C, and r ∈ N_R, are called ABox-assertions. An ABox is a finite set of ABox-assertions. By obj(A) we denote the set of individual names in the ABox A. A knowledge base (KB) is a pair (T, A) consisting of a TBox T and an ABox A. Assertions of the form C(a) and r(a, b), where a, b ∈ N_I, C an EL-concept, and r ∈ N_R, are called instance assertions. To define the semantics of DLs considered in this paper we make use of the fact that DL-expressions can be regarded as formulas in FO, where FO denotes the set of first-order predicate logic formulas with equality using unary predicates in N_C, binary predicates in N_R, and constants from N_I; see Figure 1 (in which the DL-constructors not considered so far are defined later). In what follows, we will not distinguish between DL-expressions and their translation into FO and regard TBoxes, ABoxes and KBs as finite subsets of FO.

Thus, we use T ⊨ φ to denote that φ follows from T in first-order logic even if φ is an ELH^r-inclusion and T a subset of FO, and similar conventions apply to DLs introduced later in this paper. FO (and, therefore, ELH^r) is interpreted in models I = (Δ^I, ·^I), where the domain Δ^I is a non-empty set, and ·^I is a function mapping each concept name A to a subset A^I of Δ^I, each role name r to a binary relation r^I ⊆ Δ^I × Δ^I, and each individual name a to an element a^I ∈ Δ^I.

The most important ways of querying ELH^r-TBoxes and KBs are subsumption (check whether T ⊨ α for an ELH^r-inclusion α), instance checking (check whether (T, A) ⊨ α for an instance assertion α), and conjunctive query answering. To define the latter, call a first-order formula q(x) a conjunctive query if it is of the form ∃y ψ(x, y), where ψ is a conjunction of expressions A(t) and r(t1, t2) with t, t1, t2 drawn from N_I and sequences of variables x and y. If x has length k, then a sequence a of elements of obj(A) of length k is called a certain answer to q(x) of a KB (T, A) if (T, A) ⊨ q(a).

3 Forgetting

A signature Σ is a subset of N_C ∪ N_R.² Given a signature Σ, we set Σ̄ = (N_C ∪ N_R) \ Σ. Given a concept, role, concept inclusion, TBox, ABox, FO-sentence, or set of FO-sentences E, we denote by sig(E) the signature of E, that is, the set of concept and role names occurring in it. We use the term ELH^r_Σ-inclusion (Σ-ABox, Σ-query, L_Σ-sentence, etc.) to denote ELH^r-inclusions (ABoxes, queries, L-sentences, etc.) whose signature is contained in Σ.

² We investigate forgetting for TBoxes for DLs without nominals; thus we do not include individual names into the signature.
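As a small illustration of sig(E), the following sketch computes the signature of an EL-concept and checks that an expression avoids a forgotten signature. The nested-tuple encoding of concepts and the helper names `sig`, `sig_inclusion`, and `avoids` are assumptions made for this example only.

```python
def sig(concept):
    """Concept and role names occurring in a concept given as a nested tuple.

    Assumed encoding: ("top",), ("name", A), ("and", C, D), ("exists", r, C).
    """
    kind = concept[0]
    if kind == "top":
        return frozenset()
    if kind == "name":
        return frozenset({concept[1]})
    if kind == "and":
        return sig(concept[1]) | sig(concept[2])
    if kind == "exists":
        return frozenset({concept[1]}) | sig(concept[2])
    raise ValueError(f"unknown constructor: {kind!r}")


def sig_inclusion(lhs, rhs):
    """Signature of a concept inclusion lhs [= rhs."""
    return sig(lhs) | sig(rhs)


def avoids(concept, forgotten):
    """True if the concept uses no symbol of the forgotten signature,
    i.e. its signature lies in the complement of that signature."""
    return sig(concept).isdisjoint(forgotten)


forgotten = {"Research_Inst", "Education_Inst"}
lhs = ("exists", "PhD_from", ("and", ("name", "School"), ("name", "A")))
rhs = ("exists", "PhD_from", ("and", ("name", "University"), ("name", "A")))
print(sorted(sig_inclusion(lhs, rhs)))
print(avoids(lhs, forgotten), avoids(("name", "Research_Inst"), forgotten))
```

Because the complement of a finite signature is infinite, the sketch tests disjointness from the forgotten symbols instead of materializing the complement.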

To define forgetting, we first formalize the notion of inseparability between TBoxes wrt. a signature. Intuitively, two TBoxes T1 and T2 are inseparable wrt. a signature Σ if they have the same Σ-consequences, where the set of Σ-consequences considered can either reflect subsumption queries, instance queries, or conjunctive queries, depending on the application. We give the definitions for sets of FO-sentences because we later require these notions for a variety of DLs.

Definition 1. Let T1 and T2 be sets of FO-sentences and Σ a signature.

– T1 and T2 are concept Σ-inseparable, in symbols T1 ≡^C_Σ T2, if for all ELH^r_Σ-inclusions α: T1 ⊨ α ⇔ T2 ⊨ α.

– T1 and T2 are instance Σ-inseparable, in symbols T1 ≡^i_Σ T2, if for all Σ-ABoxes A and Σ-instance assertions α using individual names from obj(A): (T1, A) ⊨ α ⇔ (T2, A) ⊨ α.

– T1 and T2 are query Σ-inseparable, in symbols T1 ≡^q_Σ T2, if for all Σ-ABoxes A, conjunctive Σ-queries q(x), and vectors a of the same length as x of individual names in obj(A): (T1, A) ⊨ q(a) ⇔ (T2, A) ⊨ q(a).

The definition of forgetting (Σ-interpolants) is now straightforward.

Definition 2 (Σ-interpolant). Let T be an ELH^r-TBox, Σ a finite signature, and L a set of FO-sentences. If T_Σ is a finite set of L_Σ̄-sentences such that T ⊨ φ for all φ ∈ T_Σ, then T_Σ is

– a concept Σ-interpolant of T in L if T ≡^C_Σ̄ T_Σ;

– an instance Σ-interpolant of T in L if T ≡^i_Σ̄ T_Σ;

– a query Σ-interpolant of T in L if T ≡^q_Σ̄ T_Σ.

One can show that every query Σ-interpolant is an instance Σ-interpolant and every instance Σ-interpolant is a concept Σ-interpolant. The converse implications do not hold, even for ELH^r-terminologies:

Example 1. Let T = {ran(r) ⊑ A1, ran(s) ⊑ A2, B ≡ A1 ⊓ A2} and Σ = {A1, A2}. One can show that the empty TBox is a concept Σ-interpolant of T. However, the empty TBox is not an instance Σ-interpolant of T. To show this, consider the Σ̄-ABox A = {r(a0, b), s(a1, b)}. Then (T, A) ⊨ B(b) but (∅, A) ⊭ B(b). Observe that no ELH^r-TBox (and even no SHQ-TBox) is an instance Σ-interpolant of T because it is impossible to capture the ABox A in a DL in which one cannot refer to the range of distinct roles in one concept. On the other hand, the TBox T′ = {ran(r) ⊓ ran(s) ⊑ B} given in an extension of ELH^r is an instance Σ-interpolant of T.
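The entailment (T, A) ⊨ B(b) of Example 1 can be replayed with a tiny forward-chaining sketch. This is not a general reasoner: it only handles range restrictions of the form ran(r) ⊑ A and conjunctive definitions, which is all this example needs, and the function names and data encoding are assumptions made for the illustration.

```python
def saturate(role_assertions, range_rules, conjunction_defs):
    """Derive concept-name facts about named individuals.

    role_assertions:  set of (role, subject, object) triples
    range_rules:      dict role -> concept name, read as ran(role) [= name
    conjunction_defs: dict name -> tuple of names, read as name == conjunction
    """
    facts = set()
    while True:
        new = set()
        for role, _subject, obj in role_assertions:
            if role in range_rules:
                new.add((range_rules[role], obj))
        for defined, conjuncts in conjunction_defs.items():
            for individual in {ind for _name, ind in facts}:
                if all((c, individual) in facts for c in conjuncts):
                    new.add((defined, individual))      # right-to-left
                if (defined, individual) in facts:
                    new.update((c, individual) for c in conjuncts)  # left-to-right
        if new <= facts:
            return facts
        facts |= new


def apply_range_conjunction(role_assertions, roles, target):
    """Apply ran(r1) & ... & ran(rk) [= target to the role assertions."""
    hit = None
    for role in roles:
        objs = {obj for r, _s, obj in role_assertions if r == role}
        hit = objs if hit is None else hit & objs
    return {(target, ind) for ind in (hit or set())}


abox = {("r", "a0", "b"), ("s", "a1", "b")}
with_T = saturate(abox, {"r": "A1", "s": "A2"}, {"B": ("A1", "A2")})
with_empty = saturate(abox, {}, {})
with_T_prime = apply_range_conjunction(abox, ("r", "s"), "B")
print(("B", "b") in with_T, ("B", "b") in with_empty, ("B", "b") in with_T_prime)
```

The original terminology and the TBox T′ both derive B(b), whereas the empty TBox derives nothing, matching the argument of the example.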

Example 2. Let T = {A ⊑ ∃s.⊤, s ⊑ r1, s ⊑ r2} and Σ = {s}. Then T′ = {A ⊑ ∃r1.⊤ ⊓ ∃r2.⊤} is an instance Σ-interpolant of T, but T′ is not a query Σ-interpolant of T. To show the latter, let A = {A(a)} and let q = ∃x(r1(a, x) ∧ r2(a, x)). Then (T, A) ⊨ q but (T′, A) ⊭ q. Again, no ELH^r-TBox (and even no TBox in SHIQ) is a query Σ-interpolant of T. On the other hand, the TBox T″ = {A ⊑ ∃r1 ⊓ r2.⊤} given in an extension of ELH^r with conjunctions of role names is a query Σ-interpolant of T.
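The difference between the three TBoxes of Example 2 shows up when the query q is matched against small models. The sketch below assumes, as is standard for EL-style TBoxes, that a conjunctive query is entailed exactly when it can be matched in a canonical model; the three models are written out by hand for the ABox {A(a)} and are an assumption of this illustration, as are the names `entails_query`, `model_T`, `model_T1`, and `model_T2`.

```python
from itertools import product


def entails_query(atoms, variables, model):
    """Brute-force search for a match of a conjunctive query in a finite model.

    atoms:     list of (role, term, term); terms are variables or constants
    variables: names of the existentially quantified variables
    model:     set of (role, element, element) triples
    """
    domain = sorted({d for _r, d, _e in model} | {e for _r, _d, e in model})
    for values in product(domain, repeat=len(variables)):
        env = dict(zip(variables, values))
        if all((role, env.get(s, s), env.get(t, t)) in model
               for role, s, t in atoms):
            return True
    return False


# Hand-built canonical models for the ABox {A(a)} (role edges only).
model_T = {("s", "a", "d"), ("r1", "a", "d"), ("r2", "a", "d")}   # T
model_T1 = {("r1", "a", "d1"), ("r2", "a", "d2")}                 # T'
model_T2 = {("r1", "a", "d"), ("r2", "a", "d")}                   # T''

query = [("r1", "a", "x"), ("r2", "a", "x")]
for name, model in [("T", model_T), ("T'", model_T1), ("T''", model_T2)]:
    print(name, entails_query(query, ["x"], model))
```

Under T′ the r1-successor and the r2-successor of a are distinct anonymous elements, so no single value for x satisfies both atoms; T and T″ provide one common successor.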

Besides exhibiting examples where concept, instance, and query Σ-interpolants are distinct, Examples 1 and 2 also show that even in extremely simple cases ELH^r and a variety of more expressive DLs are not sufficiently powerful to express instance and query Σ-interpolants of ELH^r-terminologies. Rather surprisingly, there also exist simple examples in which ELH^r-TBoxes are not sufficiently expressive to axiomatize concept Σ-interpolants of ELH^r-terminologies.

Example 3. Let Σ = {Research_Inst, Education_Inst} and let T be

University ≡ Research_Inst ⊓ Education_Inst
School ⊑ Education_Inst
ran(PhD_from) ⊑ Research_Inst

Then there does not exist an ELH^r-TBox that is a concept Σ-interpolant of T. Intuitively, the reason is that there is no ELH^r_Σ̄-TBox which follows from T and has the following infinite set of Σ̄-consequences (which are consequences of T):

∃PhD_from.(School ⊓ A) ⊑ ∃PhD_from.(University ⊓ A),

where A ∈ Σ̄. On the other hand, the TBox T′ = {ran(PhD_from) ⊓ School ⊑ University} given in an extension of ELH^r is a concept Σ-interpolant of T.

We now introduce three extensions of ELH^r which we propose to axiomatize concept, instance, and query Σ-interpolants.

Definition 3 (EL^{ran,0}, EL^{ran}, EL^{ran,⊓,u}). C^{ran,0}-concepts are constructed using the following syntax rule

C := D | ran(r) | ran(r) ⊓ D,

where D ranges over EL-concepts and r ∈ N_R. The set of EL^{ran,0}-inclusions consists of concept inclusions C ⊑ D and role inclusions r ⊑ s, where C is a C^{ran,0}-concept, D an EL-concept, and r, s ∈ N_R.

C^{ran}-concepts are constructed using the following syntax rule

C := A | ran(r) | C ⊓ D | ∃r.C,

where A ∈ N_C, C, D range over C^{ran}-concepts and r ∈ N_R. The set of EL^{ran}-inclusions consists of all concept inclusions C ⊑ D and role inclusions r ⊑ s, where C is a C^{ran}-concept, D an EL-concept, and r, s ∈ N_R.

Let u (the universal role) be a fresh logical symbol. C^{⊓,u}-concepts are constructed using the following syntax rule

C := A | C ⊓ D | ∃R.C | ∃u.C,

where A ∈ N_C, C, D range over C^{⊓,u}-concepts and R = r1 ⊓ ··· ⊓ rn with r1, ..., rn ∈ N_R. The set of EL^{ran,⊓,u}-inclusions consists of concept inclusions C ⊑ D and role inclusions r ⊑ s, where C is a C^{ran}-concept, D a C^{⊓,u}-concept, and r, s ∈ N_R.

An X-TBox is a finite set of X-inclusions, where X ranges over EL^{ran}, EL^{ran,0}, and EL^{ran,⊓,u}.


We have the following inclusions:

ELH^r ⊲ EL^{ran,0} ⊲ EL^{ran} ⊲ EL^{ran,⊓,u}

where L1 ⊲ L2 means that every L1-TBox is logically equivalent to some L2-TBox.

The semantics of the additional constructors is straightforward and given in Figure 1. We regard the universal role u as a logical symbol (i.e., u ∉ N_R). This interpretation reflects the fact that the signature of the first-order translation of ∃u.C coincides with the signature of C. Observe that the TBox given as a concept Σ-interpolant in Example 3 is an EL^{ran,0}-TBox; the instance Σ-interpolant given in Example 1 is an EL^{ran}-TBox, and the query Σ-interpolant in Example 2 is an EL^{ran,⊓,u}-TBox. The need for the universal role is illustrated by considering T = {A ⊑ ∃r.B} and Σ = {r}. The empty TBox is a concept and an instance Σ-interpolant of T. A query Σ-interpolant is given by T′ = {A ⊑ ∃u.B} and reflects the fact that (T, A) ⊨ ∃x B(x) for A = {A(a)}.

We show that the languages introduced in Definition 3 satisfy criteria (R) and (I) from the introduction. (R) is a consequence of the following result.

Theorem 1. The following problems are PTIME-complete for EL^{ran,⊓,u}-TBoxes T and ABoxes A: decide whether

– T ⊨ C ⊑ D, for C ⊑ D an EL^{ran,⊓,u}-inclusion;

– (T, A) ⊨ C(a), where C is an EL-concept.

Deciding whether (T, A) ⊨ q(a), where q is a conjunctive query, is NP-complete, and deciding this problem for fixed q(a) (knowledge base complexity) is PTIME-complete.

It follows, in particular, that logical equivalence of EL^{ran,⊓,u}-TBoxes is decidable in PTIME. These complexity results are exactly the same as for ELH-TBoxes [15]. For (I) and the computation of Σ-interpolants below, we first investigate the relationship between the distinct inseparability notions introduced in Definition 1 and inseparability wrt. the new languages. (For EL this relationship is characterized in [13].) Let X range over the superscripts ‘ran,0’, ‘ran’, and ‘ran,⊓,u’. Say that two finite sets of FO-sentences T1 and T2 are X-inseparable wrt. Σ, in symbols T1 ≡^X_Σ T2, if T1 ⊨ α ⇔ T2 ⊨ α, for all EL^X_Σ-inclusions α.

Theorem 2. Let T1 and T2 be EL^{ran,⊓,u}-TBoxes and Σ an infinite signature. Then the following holds:

– T1 ≡^C_Σ T2 iff T1 ≡^{ran,0}_Σ T2;

– T1 ≡^i_Σ T2 iff T1 ≡^{ran}_Σ T2;

– T1 ≡^q_Σ T2 iff T1 ≡^{ran,⊓,u}_Σ T2.

The condition that Σ is infinite is required only for the implication from right to left in Point 1. As we forget finite signatures, their complement is always infinite.

From Theorem 2 we immediately obtain that (I) is met for the three notions of Σ-interpolants. For example, assume that T1 and T2 are ELH^r-TBoxes such that T1 ≡^q_Σ̄ T2 and let T1′ and T2′ be query Σ-interpolants in EL^{ran,⊓,u} of T1 and T2, respectively. Since T1 ≡^q_Σ̄ T1′ and T2 ≡^q_Σ̄ T2′ by Definition 2, we have T1′ ≡^q_Σ̄ T2′ and hence, by Theorem 2, T1′ ≡^{ran,⊓,u}_Σ̄ T2′. But then T1′ and T2′ are logically equivalent: we have T1′ ⊨ α for all α ∈ T2′ because all such α are EL^{ran,⊓,u}_Σ̄-inclusions and T2′ ⊨ α. The converse direction holds for the same reason.


4 Computing Σ-interpolants

We give a recursive algorithm computing instance Σ-interpolants for ELH^r-terminologies satisfying certain acyclicity conditions (similar algorithms computing concept and query Σ-interpolants are presented in the technical report [8]). In this section we assume wlog. that terminologies are normalized ELH^r-terminologies; i.e., ELH^r-terminologies T consisting of role inclusions and axioms of the form (here, and in what follows, we write r ⊑_T s if T ⊨ r ⊑ s)

– A ⋈ ∃r.B, where B ∈ N_C ∪ {⊤};

– A ⋈ B1 ⊓ ··· ⊓ Bn, where B1, ..., Bn ∈ N_C;

– dom(s) ⊑ A and ran(s) ⊑ A, where A ∈ N_C

such that dom(s) ⊑ A ∈ T and r ⊑_T s imply dom(r) ⊑ A ∈ T; ran(s) ⊑ A ∈ T and r ⊑_T s imply ran(r) ⊑ A ∈ T; and r ⊑_T s and s ⊑_T r imply r = s.

We give the acyclicity conditions required for the algorithms to terminate. The Σ-cover C_T^Σ(r) of a role r wrt. a terminology T consists of all s ∈ Σ̄ such that r ⊑_T s and there does not exist r′ ∈ Σ̄ with r′ ≠ s and r ⊑_T r′ ⊑_T s.
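The Σ-cover of a role is the set of its minimal super-roles that are not forgotten. A minimal sketch follows; it assumes that r ⊑_T s coincides with the reflexive-transitive closure of the role inclusions stated in T, and the function names and the example hierarchy are hypothetical.

```python
def super_roles(role, inclusions):
    """All s with role [=_T s, computed as the reflexive-transitive
    closure of the stated role inclusions (pairs (sub, sup))."""
    seen = {role}
    stack = [role]
    while stack:
        current = stack.pop()
        for sub, sup in inclusions:
            if sub == current and sup not in seen:
                seen.add(sup)
                stack.append(sup)
    return seen


def sigma_cover(role, inclusions, forgotten):
    """Kept super-roles s of role with no other kept role strictly in between."""
    kept = {s for s in super_roles(role, inclusions) if s not in forgotten}
    return {
        s for s in kept
        if not any(r2 != s and s in super_roles(r2, inclusions) for r2 in kept)
    }


# Hypothetical role hierarchy: s [= r1, s [= r2, r1 [= t; the role s is forgotten.
inclusions = [("s", "r1"), ("s", "r2"), ("r1", "t")]
print(sorted(sigma_cover("s", inclusions, {"s"})))   # minimal kept super-roles of s
print(sorted(sigma_cover("r1", inclusions, {"s"})))  # a kept role covers itself
```

The role t is excluded from the cover of s because the kept role r1 lies between s and t.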

Definition 4 (Σ-loop). Let T be a normalized ELH^r-terminology and Σ a signature. Define a relation ≺_Σ ⊆ (N_C ∩ Σ) × (N_C ∩ Σ) as follows: A ≺_Σ B if A, B ∈ Σ and

(a) A ⋈ C ∈ T for some C such that B occurs in C, or

(b) A ⋈ ∃r.A′ ∈ T for some A′ ∈ N_C ∪ {⊤} and r ∈ Σ such that dom(r) ⊑ B ∈ T, or

(c) A ⋈ ∃r.A′ ∈ T for some A′ ∈ N_C ∪ {⊤} and r such that there exists s ∈ C_T^Σ(r) with ran(r) ⊑ B ∈ T, ran(s) ⊑ B ∉ T.

We say that T contains a Σ-loop if ≺_Σ contains a cycle.

The following example illustrates this definition and shows that the existence of Σ-loops typically entails the non-existence of Σ-interpolants, even in FO.

Example 4. Consider the terminology

T = {Elephant ⊑ Mammal, Mammal ⊑ ∃has_mother.Mammal}

and let Σ = {Mammal}. Even in FO, there exists no concept/instance/query Σ-interpolant of T. To see this, observe that an infinite axiomatization of such a Σ-interpolant is given by the inclusions

{Elephant ⊑ (∃has_mother.)^n ⊤ | n ≥ 1},

where (∃has_mother.)^n denotes n nested occurrences of ∃has_mother. This theory cannot be finitely axiomatized in FO without additional predicates. Observe that T contains a Σ-loop since Mammal ≺_Σ Mammal.
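Detecting a Σ-loop amounts to cycle detection in the relation ≺_Σ. The sketch below covers only condition (a) of Definition 4 (B occurs on the right-hand side of the axiom for A), which suffices for Example 4; conditions (b) and (c) are omitted. The input encoding and the name `has_sigma_loop` are assumptions of this illustration.

```python
def has_sigma_loop(rhs_names, forgotten):
    """Check for a cycle in the relation induced by condition (a) only.

    rhs_names: dict mapping each left-hand concept name A to the set of
               concept names occurring in the right-hand side of its axiom
    forgotten: the signature Sigma (only its concept names matter here)
    """
    edges = {
        a: {b for b in names if b in forgotten}
        for a, names in rhs_names.items()
        if a in forgotten
    }
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {}

    def visit(node):
        colour[node] = GREY
        for nxt in edges.get(node, ()):
            state = colour.get(nxt, WHITE)
            if state == GREY:          # back edge: a cycle was found
                return True
            if state == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour.get(n, WHITE) == WHITE and visit(n) for n in edges)


# Example 4: Elephant [= Mammal, Mammal [= exists has_mother.Mammal
terminology = {"Elephant": {"Mammal"}, "Mammal": {"Mammal"}}
print(has_sigma_loop(terminology, {"Mammal"}))    # forgetting Mammal
print(has_sigma_loop(terminology, {"Elephant"}))  # forgetting Elephant
```

Forgetting Mammal yields the self-loop Mammal ≺_Σ Mammal, whereas forgetting only Elephant leaves the relation empty.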

Examples illustrating the points (b) and (c) of Definition 4 of Σ-loops can be found in the technical report [8].

Call a concept name A primitive (pseudo-primitive) in a terminology T if A does not occur on the left hand side of any axiom in T (does not occur in the form A ≡ C in T).


For A ∈ sig(T), let Pre_Σ(A) consist of all D = ran(r), D = ∃r.⊤, and D ∈ N_C such that T ⊨ D ⊑ A and sig(D) ⊆ sig(T) ∩ Σ̄; construct P_Σ(A) as follows:

1. for A pseudo-primitive in T, P_Σ(A) = Pre_Σ(A);

2. if A ≡ B1 ⊓ ··· ⊓ Bn ∈ T, then
   P_Σ(A) = {C_B1 ⊓ ··· ⊓ C_Bn | (Bi ∈ Σ̄ and C_Bi = Bi) or (Bi ∈ Σ and C_Bi ∈ P_Σ(Bi))};

3. if A ≡ ∃r.A′ ∈ T, then P_Σ(A) is the union of Pre_Σ(A) and
   – if A′ ∈ Σ̄: {∃s.A′ | s ⊑_T r, s ∈ Σ̄};
   – if A′ ∉ Σ̄: {∃s.D | s ⊑_T r, s ∈ Σ̄, D ∈ P_Σ(A′)}.

Fig. 2. Computing P_Σ(A)

For A ∈ sig(T), let Post_Σ(A) = {B ∈ Σ̄ ∩ sig(T) | T ⊨ A ⊑ B} and construct Q_Σ(A) as follows:

1. for A primitive in T, Q_Σ(A) = Post_Σ(A);

2. if A ⋈ B1 ⊓ ··· ⊓ Bn ∈ T, then
   Q_Σ(A) = Post_Σ(A) ∪ ⋃_{1≤i≤n, Bi∈Σ} Q_Σ(Bi);

3. if A ⋈ ∃r.A′ ∈ T, then Q_Σ(A) is the union of Post_Σ(A) and
   {∃s.E_s | s ∈ C_T^Σ(r)} ∪ ⋃_{r ⊑_T s} {Q_Σ(B) | dom(s) ⊑ B ∈ T, s ∈ Σ, B ∈ Σ},
   where
   E_s = ⊓_{B∈Σ, ran(r)⊑B∈T, ran(s)⊑B∉T, D∈Q_Σ(B)} D  ⊓  ⊓_{D∈Q_Σ(A′), A′∈Σ} D  ⊓  ⊓_{B∈Σ̄, T⊨ran(r)⊓A′⊑B} B.

Fig. 3. Computing Q_Σ(A)

The intuition behind the following algorithm for Σ-interpolants is as follows: first, one can show using Theorem 2 and a sequent-style proof system for ELH^r that under the conditions of Theorem 3 there exists an instance Σ-interpolant consisting of (in addition to role inclusions and domain and range restrictions) concept inclusions of the form C ⊑ A and A ⊑ C. In Figure 2, we compute the set P_Σ(A) of Cs such that C ⊑ A is in the interpolant by making a case distinction: in Point 1 A is pseudo-primitive; in Point 2 it is defined by a conjunction; in Point 3 it is defined as ∃r.A′. Points 2 and 3 are recursive as they require the sets P_Σ(B) when B is used in the definition of A. Σ-loops describe exactly the situation in which the recursion does not terminate. In Figure 3 we compute, in a similar way, the set Q_Σ(A) of Cs such that A ⊑ C is in the interpolant.

A sample run of the algorithm is provided in [8].

Theorem 3. Let Σ be a finite signature and T a normalized ELH^r-terminology without Σ-loops. Then the algorithms computing P_Σ(A) and Q_Σ(A) in Figures 2 and 3 terminate for all A ∈ sig(T).

Let T_Σ consist of the following inclusions, where A, r, and s range over sig(T) ∩ Σ̄:

– r ⊑ s, for r ⊑_T s;

– D ⊑ A, for all D ∈ P_Σ(A);

– A ⊑ D, for all D ∈ Q_Σ(A);

– ran(r) ⊑ D, for all D ∈ Q_Σ(B) such that ran(r) ⊑ B ∈ T and B ∈ Σ;

– dom(r) ⊑ D, for all D ∈ Q_Σ(B) such that dom(r) ⊑ B ∈ T and B ∈ Σ.

Then T_Σ is an instance Σ-interpolant of T.

P_Σ(A) and Q_Σ(A) are both of exponential size, in the worst case. For P_Σ(A), this is clear from Point 2 of the construction: let T consist of A ≡ B1 ⊓ ··· ⊓ Bn and A_i^j ⊑ Bi (1 ≤ i, j ≤ n) and let Σ = {Bi | 1 ≤ i ≤ n}. Then P_Σ(A) is of size n^n, and one can show that there does not exist a shorter Σ-interpolant in EL^{ran,⊓,u}. For Q_Σ(A) this follows from the fact that one might have to construct a complete unfolding of the terminology.
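The n^n blow-up caused by Point 2 can be reproduced with a short sketch. It implements only the conjunction case, and the candidate sets standing in for P_Σ(Bi) are passed in explicitly; here they contain just the names A_i^j, an assumption that leaves aside any redundant candidates. The names `p_sigma_conjunction` and `blowup_instance` are hypothetical.

```python
from itertools import product


def p_sigma_conjunction(conjuncts, forgotten, candidates):
    """Point 2 only: keep a conjunct that is not forgotten, otherwise
    replace it by one of its candidates; return all resulting
    conjunctions as tuples of concept names."""
    options = [
        sorted(candidates[b]) if b in forgotten else [b]
        for b in conjuncts
    ]
    return set(product(*options))


def blowup_instance(n):
    """A == B1 & ... & Bn with told subsumptions Ai_j [= Bi, forgetting all Bi."""
    conjuncts = [f"B{i}" for i in range(1, n + 1)]
    candidates = {
        f"B{i}": {f"A{i}_{j}" for j in range(1, n + 1)}
        for i in range(1, n + 1)
    }
    return conjuncts, set(conjuncts), candidates


for n in range(1, 5):
    size = len(p_sigma_conjunction(*blowup_instance(n)))
    print(n, size, size == n ** n)
```

Every choice of one candidate per forgotten conjunct yields a distinct conjunction, which is where the product of the candidate-set sizes comes from.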

If we admit disjunctions in C in axioms C ⊑ D of Σ-interpolants, then we can replace, in Point 2, P_Σ(A) for A ≡ B1 ⊓ ··· ⊓ Bn ∈ T by the singleton set consisting of

⊓_{1≤i≤n, Bi∈Σ̄} Bi  ⊓  ⊓_{1≤i≤n, Bi∈Σ} ⊔_{C_Bi∈P_Σ(Bi)} C_Bi.

We will see below that in practice this construction leads to much smaller Σ-interpolants. However, this improvement does not come for free. Consider the language EL^{ran,⊔}, where the only difference to EL^{ran} is that C^{ran}-concepts now admit ‘⊔’ as a binary concept constructor. Every EL^{ran,⊔}-TBox is logically equivalent to an (exponentially larger) EL^{ran}-TBox, and so EL^{ran,⊔} inherits many desirable properties from EL^{ran}. However, one can show that, in contrast to EL^{ran}, logical equivalence between EL^{ran,⊔}-TBoxes is coNP-hard.
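The saving obtained by admitting disjunction can be made concrete by comparing the single compact concept with its disjunction-free expansion. The sketch below counts occurrences of concept names in both forms for a conjunctive definition; the ASCII spellings `&` and `|` stand for ⊓ and ⊔, and the function names and example data are assumptions made for the illustration.

```python
from math import prod


def compact_conjunction(conjuncts, forgotten, candidates):
    """Keep conjuncts that are not forgotten and replace each forgotten
    conjunct by the disjunction of its candidates."""
    parts = []
    for b in conjuncts:
        if b in forgotten:
            parts.append("(" + " | ".join(sorted(candidates[b])) + ")")
        else:
            parts.append(b)
    return " & ".join(parts)


def name_occurrences(conjuncts, forgotten, candidates):
    """Return (compact, expanded) counts of concept-name occurrences."""
    sizes = [len(candidates[b]) if b in forgotten else 1 for b in conjuncts]
    compact = sum(sizes)                     # one concept with disjunctions
    expanded = prod(sizes) * len(conjuncts)  # all disjunction-free conjunctions
    return compact, expanded


conjuncts = ["B1", "B2"]
forgotten = {"B1", "B2"}
candidates = {"B1": {"A1_1", "A1_2"}, "B2": {"A2_1", "A2_2"}}
print(compact_conjunction(conjuncts, forgotten, candidates))
print(name_occurrences(conjuncts, forgotten, candidates))
```

For n forgotten conjuncts with n candidates each, the compact form mentions n·n names while the expansion mentions n·n^n, which illustrates why the disjunctive representation is smaller even though reasoning with it is harder.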

5 Experiments

We have implemented a prototype called NUI that computes instance Σ-interpolants as presented in Theorem 3. We have applied NUI to a version of SNOMED CT dated 09 February 2005 (without two left-identities) and the ELH^r-fragment of the release 08.08d of NCI. The first terminology has approx. 380K axioms, almost the same number of concept names, and 56 role names. The ELH^r-fragment of NCI has approx. 63K axioms, approx. 65K concept names, and 123 role names. We note that the algorithms given above compute (for ease of exposition) a large number of redundant axioms and NUI implements a variety of straightforward optimizations.

|Σ̄ ∩ sig(·)|   SNOMED CT      |Σ̄ ∩ sig(·)|   NCI
2 000          93.0%           5 000         97.0%
3 000          84.5%          10 000         81.1%
4 000          67.0%          15 000         72.0%
5 000          59.5%          20 000         59.2%

Table 1. Success rate of NUI


Fig. 4. Size distribution of MEX-modules and instance Σ-interpolants of SNOMED CT

First observe that neither SNOMED CT nor NCI contains any Σ-loops, for any signature Σ. Thus, Σ-interpolants always exist and can, in principle, be computed using our algorithm.

In our experiments, we focus on the case of forgetting a large signature Σ (and keeping a “small” signature Σ̄ ∩ sig(·)), as this corresponds to many application scenarios. The experiments have been performed on a standard PC with 2.13 GHz and 3 GB of RAM.

Success rate: Table 1 shows the rate at which NUI succeeds in computing instance Σ-interpolants of SNOMED CT and NCI wrt. various signatures. All failed cases are due to memory overflow after several hours. For each table entry, 100 samples have been used. The signatures contain concept and role names randomly selected from the full signature of SNOMED CT (we never forget the role ‘roleGroup’ as this would make forgetting trivial) and NCI, respectively. Σ̄ ∩ sig(·) always contains 20 role names. For NCI and signatures of size ≤ 4 000, NUI had a 100% success rate.

Size: We compare the size of instance Σ-interpolants of SNOMED CT and NCI computed by NUI with the size of extracted Σ̄ ∩ sig(·)-modules; i.e., minimal subsets of the respective terminologies which preserve, e.g., inclusions between Σ̄ ∩ sig(·)-concepts. We use MEX-modules [5] of SNOMED CT and, respectively, ⊤-local modules [2] of NCI. The size of Σ-interpolants, terminologies, and modules is measured as number of symbols rather than number of axioms as Σ-interpolants can contain large axioms.

For SNOMED CT, we generated 100 random signatures with 3 000 concept names and 20 role names in Σ̄ ∩ sig(·). For 17 of those signatures, NUI failed to compute an instance Σ-interpolant. For the remaining 83 signatures, Figure 4 shows the number of interpolants and Σ̄ ∩ sig(·)-modules (vertical axis) of a given size (horizontal axis). The size of the Σ̄ ∩ sig(·)-modules lies between 125K and 150K symbols, which is about 3% of SNOMED CT. 48.19% of instance Σ-interpolants are smaller than the corresponding modules. However, 10 instance Σ-interpolants contain more than one million symbols and the largest instance Σ-interpolant is more than 11 times larger than SNOMED CT.

For NCI, we computed instance Σ-interpolants and Σ̄ ∩ sig(·)-modules wrt. random signatures with 7 000 concept names and 20 role names in Σ̄ ∩ sig(·). NUI succeeded in computing interpolants for 97 of 100 signatures, each within 25 min. Figure 5 shows the size distribution of the successfully computed interpolants and the corresponding modules. The size of the Σ̄ ∩ sig(·)-modules lies between 140K and 160K symbols, which is about 22.5% of NCI. 74.47% of the instance Σ-interpolants are smaller than the corresponding modules. On the other hand, 18 instance Σ-interpolants consist of more than 400K symbols and the largest instance Σ-interpolant is more than 12 times larger than NCI.

Fig. 5. Size distribution of CEL-modules and instance Σ-interpolants of NCI

Forgetting with disjunction: All failures in Table 1 are due to the fact that P_Σ(A) is too large. Indeed, if we admit disjunction and consider EL^{ran,⊔}, then NUI succeeds in computing all Σ-interpolants from Table 1, each within 15 min. Moreover, for NCI, no signature for which NUI fails has been detected. For SNOMED CT, however, NUI still typically fails for |Σ̄ ∩ sig(·)| ≥ 30 000.

6 Discussion

The notion of forgetting in DL ontologies has recently been investigated in a number of research papers. [9, 19] consider forgetting in DL-Lite and [3] investigate to what extent forgetting in DLs can be reduced to forgetting in logic programs. [6], on which this paper is based, proposes forgetting for acyclic EL-terminologies and concept inclusions.

The main novel contributions of this paper are (i) the first algorithms with experimental results indicating the practical feasibility of forgetting in DL-terminologies and (ii) the first systematic analysis of the distinct languages required to axiomatize Σ-interpolants for distinct query languages. Many open problems remain; e.g., we conjecture that Σ-interpolants of ELH^r-terminologies (and possibly even TBoxes) exist in the languages introduced whenever they exist in FO. Such a result would provide further justification for those languages. Secondly, it would be of interest to prove decidability (and complexity) of the decision problem whether there exists a Σ-interpolant for a given ELH^r-terminology (TBox). Note that our acyclicity conditions are sufficient but not necessary for the existence of Σ-interpolants.


References

1. F. Baader, S. Brandt, and C. Lutz. Pushing the EL-envelope further. In Proceedings of OWLED DC, 2008.

2. B. Cuenca Grau, I. Horrocks, Y. Kazakov, and U. Sattler. Just the right amount: Extracting modules from ontologies. In Proceedings of WWW’07, pages 717–727, 2007.

3. T. Eiter, G. Ianni, R. Schindlauer, H. Tompits, and K. Wang. Forgetting in managing rules and ontologies. In Web Intelligence, pages 411–419, 2006.

4. T. Eiter and K. Wang. Semantic forgetting in answer set programming. Artificial Intelligence, 172(14):1644–1672, 2008.

5. B. Konev, C. Lutz, D. Walther, and F. Wolter. Semantic modularity and module extraction in description logic. In Proceedings of ECAI’08, pages 55–59, 2008.

6. B. Konev, D. Walther, and F. Wolter. The logical difference problem for description logic terminologies. In Proceedings of IJCAR’08, pages 259–274, 2008.

7. B. Konev, D. Walther, and F. Wolter. Forgetting and uniform interpolation in large-scale description logic terminologies. In Proceedings of IJCAI’09, 2009. To appear.

8. B. Konev, D. Walther, and F. Wolter. Forgetting and uniform interpolation in large-scale description logic terminologies. Technical Report ULCS-09-006, University of Liverpool, 2009.

9. R. Kontchakov, F. Wolter, and M. Zakharyaschev. Can you tell the difference between DL-Lite ontologies? In Proceedings of KR’08, pages 285–295, 2008.

10. J. Lang, P. Liberatore, and P. Marquis. Propositional independence: formula-variable independence and forgetting. Journal of Artificial Intelligence Research, 18:391–443, 2003.

11. C. Lutz, D. Walther, and F. Wolter. Conservative extensions in expressive description logics. In Proceedings of IJCAI’07, pages 453–458, 2007.

12. C. Lutz and F. Wolter. Conservative extensions in the lightweight description logic EL. In Proceedings of CADE’07, pages 84–99, 2007.

13. C. Lutz and F. Wolter. Deciding inseparability and conservative extensions in the description logic EL. Journal of Symbolic Computation, 2009. To appear.

14. R. Reiter and F. Lin. Forget it! In Proceedings of AAAI Fall Symposium on Relevance, pages 154–159, 1994.

15. R. Rosati. On conjunctive query answering in EL. In Proceedings of DL’07, 2007.

16. N. Sioutos, S. de Coronado, M. Haber, F. Hartel, W. Shaiu, and L. Wright. NCI thesaurus: a semantic model integrating cancer-related clinical and molecular information. Journal of Biomedical Informatics, 40(1):30–43, 2006.

17. K. Spackman. Managing clinical terminology hierarchies using algorithmic calculation of subsumption: Experience with SNOMED-RT. 2000.

18. A. Visser. Uniform interpolation and layered bisimulation. In Gödel ’96 (Brno, 1996), volume 6 of Lecture Notes Logic. Springer Verlag, 1996.

19. Z. Wang, K. Wang, R. Topor, and J. Z. Pan. Forgetting in DL-Lite. In Proceedings of ESWC’08, 2008.