Forgetting and uniform interpolation in extensions of the description logic EL
Boris Konev, Dirk Walther, and Frank Wolter? University of Liverpool, UK
{konev, dwalther, wolter}@liverpool.ac.uk
1 Introduction
Terminologies used in life sciences, health-care, and other knowledge intensive ar- eas are often very large and comprehensive. Examples are the medical terminology SNOMED CT (Systematised Nomenclature of Medicine Clinical Terms) [17] con- taining about380 000 concept definitions and the National Cancer Institute ontology (NCI) [16] containing more than60 000axioms. For terminologiesT of this size, it is often of interest toforgeta subvocabularyΣof the vocabulary ofT; i.e., to transformT into a new terminologyTΣ(called aΣ-interpolant ofT) that contains no symbols from Σand that is indistinguishable fromT regarding its consequences that do not useΣ. In AI, this problem has been studied under a variety of names such asforgettingandvari- able elimination[14, 4, 10]. In mathematical logic, this problem has been investigated as theuniform interpolation problem[18]. ComputingΣ-interpolants of terminologies has a number of potential applications, e.g.,
Re-use of ontologies:when using terminologies such as SNOMED CT in an application, often only a very small fraction of its vocabulary is of interest. In this case, one could use aΣ-interpolant instead of the whole terminology, whereΣis the vocabulary not of interest for the application.
Predicate hiding: a terminology developer might not want to publish a terminology completely because a certain part of its vocabulary is not intended for public use. Again, publishing Σ-interpolants, where Σ is the vocabulary to be hidden, appears to be a solution to this problem.
Exhibiting hidden relations between terms:large terminologies are difficult to maintain as small changes to its axioms can have drastic and damaging effects. To analyze pos- sibly unwanted consequences over a certain partΓ of the vocabulary, an ontology de- veloper can automatically generate a complete axiomatization of the relations between terms overΓ by computing aΣ-interpolant, whereΣis the complement ofΓ. Ontology versioning: to check whether two versions of a terminology have the same consequences over their common vocabulary (or a subset thereof), one can first compute their interpolants by forgetting the vocabulary not shared by the two versions and then check whether the two interpolants are logically equivalent (i.e., have the same models).
In the description ofΣ-interpolants given above, we have neither specified a lan- guage in which they are axiomatized nor did we specify the language wrt. whichΣ- interpolants should be indistinguishable from the original termininology. The choice of
?This research was supported by the EPSRC grant EP/E065279/1.
the latter language depends on the application: for example, if one is interested ininclu- sions between concepts, then aΣ-interpolant should imply the same concept inclusions using no symbols fromΣas the original terminology. On the other hand, if the termi- nology is used to query instance data usingconjunctive queries, then aΣ-interpolant together with any instance data using no symbols fromΣshould imply the same certain answers to conjunctive queries using no symbols fromΣas the original terminology.
Regarding the languageLin whichΣ-interpolants should be axiomatized, one has to find a compromise between the following three conflicting goals:
(R) Standard reasoning problems (e.g., logical equivalence) inLshould not be more complex than reasoning in the language underlying the terminology.
(I)Σ-interpolants inLshould be uniquely determined up to logical equivalence: ifT10 andT20 areΣ-interpolants inLof terminologiesT1andT2that have the same conse- quences not usingΣ, thenT10andT20should be logically equivalent.
(E) The language Lshould be powerful enough to admitfiniteandsuccinct(ideally, polynomial size) axiomatizations ofΣ-interpolants, and it should be possible to com- puteΣ-interpolants efficiently (ideally, in polynomial time).
For terminologies in standard description logics (DLs) such asELand any language betweenALC andSHIQO, there do not exist languagesLachieving all these goals simultaneously.1 To illustrate this point, letL be second-order logic. ThenLtrivially satisfies (E) but fails to satisfy (R) and (I), for ontologies in any standard DL.
In this paper, we consider forgetting in the lightweight description logicELunder- lying the designated OWL2-EL profile of the upcoming OWL Version 2 extended with role inclusions and domain and range restrictions [1]. This choice it motivated by the fact that forgetting appears to be of particular interest for large-scale and comprehen- sive terminologies and that many such terminologies are given in this language. We introduce three DLs for axiomatizingΣ-interpolants satisfying criteria (R) and (I) and preserving, respectively, inclusions between concepts, concept instances, and answers to conjunctive queries. These DLs do not satisfy (E), asΣ-interpolants sometimes do not exist or are of exponential size. We demonstrate that, nevertheless,Σ-interpolants typically exist and can be computed in practice for large-scale terminologies such as SNOMED CT and appropriate versions of NCI. This work has been presented in [7];
detailed proofs and additional experiments are available in a technical report [8].
2 Preliminaries
LetNC,NR, andNIbe countably infinite and mutually disjoint sets of concept names, role names, and individual names.EL-conceptsCare built according to the rule
C:= A | > | CuD | ∃r.C,
whereA∈NC,r∈NR, andC, Drange overEL-concepts. The set ofELHr-inclusions consists ofconcept inclusionsCvDandconcept equationsC≡D,domain restric- tionsdom(r) v C,range restrictionsran(r) v Candrole inclusionsr v s, where
1This follows from the fact that deciding whether TBoxes in these DLs imply the same concept inclusions over a signature is at least one exponential harder than deciding logical equiva- lence [12, 11].
ConceptC TranslationC] ConceptC TranslationC]
> x=x dom(r) ∃y(r(x, y))
A A(x) ran(r) ∃y(r(y, x))
CuD C](x)∧D](x) ∃u.C (x=x)∧ ∃y C](y)
∃r.C ∃y(r(x, y)∧C](y)) ∃r1u · · · urn.C ∃y(r1(x, y)∧ · · · ∧rn(x, y)∧C](y))
Fig. 1.Standard translation·]
C, D are EL-concepts andr, s ∈ NR. AnELHr-TBox T is a finite set ofELHr- inclusions. AnELHr-TBox is calledELHr-terminologyif all its concept inclusions and equations are of the formA vC andA ≡Cand no concept name occurs more than once on the left hand side. In what follows, we useA ./ C to denote expressions of the formAvCandA≡C.
Assertions of the formA(a)andr(a, b), wherea, b ∈ NI,A ∈ NC, andr ∈ NR, are called ABox-assertions. An ABox is a finite set of ABox-assertions. By obj(A) we denote the set of individual names inA. A knowledge base (KB) is a pair(T,A) consisting of a TBoxT and an ABoxA. Assertions of the formC(a)andr(a, b), where a, b ∈ NI,CaEL-concept, andr ∈ NR, are called instance assertions. To define the semantics of DLs considered in this paper we make use of the fact that DL-expressions can be regarded as formulas inFO, whereFOdenotes the set of first-order predicate logic formulas with equality using unary predicates inNC, binary predicates inNR, and constants fromNI; see Figure 1 (in which the DL-constructors not considered so far are defined later). In what follows, we will not distinguish between DL-expressions and their translation intoFOand regard TBoxes, ABoxes and KBs as finite subsets ofFO.
Thus, we useT |=ϕto denote thatϕfollows fromT in first-order logic even ifϕis an ELHr-inclusion andT a subset ofFOand similar conventions apply to DLs introduced later in this paper.FO(and, therefore,ELHr) is interpreted in modelsI = (∆I,·I), where thedomain∆I is a non-empty set, and·I is a function mapping each concept nameAto a subsetAI of∆I, each role namerto a binary relationrI ⊆∆I×∆I, and each individual nameato an elementaI ∈∆I.
The most important ways of queryingELHr-TBoxes and KBs are subsumption (check whetherT |= αfor anELHr-inclusionα), instance checking (check whether (T,A) |= αfor an instance assertionα), and conjunctive query answering. To de- fine the latter, call a first-order formula q(x)aconjunctive query if it is of the form
∃yψ(x,y), whereψis a conjunction of expressionsA(t)andr(t1, t2)witht, t1, t2
drawn fromNIand sequences of variablesxandy. Ifxhas lengthk, then a sequence aof elements ofobj(A)of lengthkis called acertain answertoq(x)of a KB(T,A) if(T,A)|=q(a).
3 Forgetting
A signature Σ is a subset ofNC ∪NR2. Given a signature Σ, we set Σ = (NC ∪ NR)\Σ. Given a concept, role, concept inclusion, TBox, ABox,FO-sentence, set of
2We investigate forgetting for TBoxes for DLs without nominals; thus we do not include indi- vidual names into the signature.
FO-sentencesE, we denote bysig(E)the signature ofE, that is, the set of concept and role names occurring in it. We use the termELHrΣ-inclusion (Σ-ABox,Σ-query,LΣ- sentence, etc.) to denoteELHr-inclusions (ABoxes, queries,L-sentences, etc.) whose signature is contained inΣ.
To define forgetting, we first formalize the notion of inseparability between TBoxes wrt. a signature. Intuitively, two TBoxesT1andT2are inseparable wrt. a signatureΣif they have the sameΣ-consequences, where the set ofΣ-consequences considered can either reflect subsumption queries, instance queries, or conjunctive queries, depending on the application. We give the definitions for sets ofFO-sentences because we later require these notions for a variety of DLs.
Definition 1. LetT1andT2be sets ofFO-sentences andΣa signature.
–T1andT2are conceptΣ-inseparable, in symbolsT1≡CΣT2, if for allELHrΣ-inclusions α:T1|=α⇔ T2|=α.
–T1 andT2 are instanceΣ-inseparable, in symbolsT1 ≡iΣ T2, if for allΣ-ABoxes AandΣ-instance assertionsαusing individual names fromobj(A):(T1,A)|=α⇔ (T2,A)|=α.
–T1 andT2 are queryΣ-inseparable, in symbolsT1 ≡qΣ T2, if for all Σ-ABoxesA, conjunctiveΣ-queriesq(x), and vectorsaof the same length asxof individual names inobj(A):(T1,A)|=q(a)⇔(T2,A)|=q(a).
The definition of forgetting (Σ-interpolants) is now straightforward.
Definition 2 (Σ-interpolant).LetT be anELHr-TBox,Σa finite signature, andL a set ofFO-sentences. IfTΣ is a finite set of LΣ-sentences such thatT |= ϕfor all ϕ∈ TΣ, thenTΣis
– a conceptΣ-interpolant ofT inLifT ≡C
Σ TΣ; – an instanceΣ-interpolant ofT inLifT ≡iΣ TΣ; – a queryΣ-interpolant ofT inLifT ≡qΣTΣ.
One can show that every query Σ-interpolant is an instanceΣ-interpolant and every instance Σ-interpolant is a concept Σ-interpolant. The converse implications do not hold, even forELHr-terminologies:
Example 1. LetT ={ran(r)vA1,ran(s)vA2, B≡A1uA2}andΣ={A1, A2}.
One can show that the empty TBox is a concept Σ-interpolant ofT. However, the empty TBox is not an instanceΣ-interpolant ofT. To show this, consider theΣ-ABox A = {r(a0, b), s(a1, b)}. Then(T,A) |= B(b)but(∅,A) 6|= B(b). Observe thatno ELHr-TBox (and even noSHQ-TBox) is an instanceΣ-interpolant ofT because it is impossible to capture the ABoxAin a DL in which one cannot refer to the range of distinct roles in one concept. On the other hand, the TBoxT0={ran(r)uran(s)vB}
given in an extension ofELHris an instanceΣ-interpolant ofT.
Example 2. LetT ={A v ∃s.>, sv r1, svr2}andΣ ={s}. ThenT0 ={Av
∃r1.> u ∃r2.>}is an instanceΣ-interpolant ofT, butT0is not a queryΣ-interpolant ofT. To show the latter, letA ={A(a)}and letq = ∃x(r1(a, x)∧r2(a, x)). Then (T,A)|=qbut(T0,A)6|=q. Again, noELHr-TBox (and even no TBox inSHIQ) is
a queryΣ-interpolant ofT. On the other hand, the TBoxT00={Av ∃r1ur2.>}given in an extension ofELHrwith conjunctions of roles names is a queryΣ-interpolant of T.
Besides of exhibiting examples where concept, instance, and queryΣ-interpolants are distinct, Example 1 and 2 also show that even in extremely simple casesELHrand a va- riety of more expressive DLs are not sufficiently powerful to express instance and query Σ-interpolants ofELHr-terminologies. Rather surprisingly, there also exist simple ex- amples in whichELHr-TBoxes are not sufficiently expressive to axiomatize concept Σ-interpolants ofELHr-terminologies.
Example 3. LetΣ={Research Inst,Education Inst}andT be University≡Research InstuEducation Inst
SchoolvEducation Inst ran(PhD from)vResearch Inst
Then there does not exist anELHr-TBox that is a conceptΣ-interpolant ofT. Intu- itively, the reason is that there is noELHrΣ-TBox which follows fromT and has the following infinite set ofΣ-consequences (which are consequences ofT):
∃PhD from.(SchooluA)v ∃PhD from.(UniversityuA),
where A ∈ Σ. On the other hand, the TBox T0 = {ran(PhD from)uSchool v University}given in an extension ofELHris a conceptΣ-interpolant ofT.
We now introduce three extensions ofELHrwhich we propose to axiomatize concept, instance, and queryΣ-interpolants.
Definition 3 (ELran,0,ELran,ELran,u,u).Cran,0-concepts are constructed using the fol- lowing syntax rule
C:= D | ran(r) | ran(r)uD,
whereDranges overEL-concepts andr ∈NR. The set ofELran,0-inclusions consists of concept inclusionsCvDand role inclusionsrvs, whereCis aCran,0-concept,D anEL-concept, andr, s∈NR.
Cran-concepts are constructed using the following syntax rule C:= A | ran(r) | CuD | ∃r.C,
whereA∈NC,C, Drange overCran-concepts andr∈NR. The set ofELran-inclusions consists of all concept inclusionsC v D and role inclusions r v s, where C is a Cran-concept,DanEL-concept, andr, s∈NR.
Letu(the universal role) be a fresh logical symbol.Cu,u-concepts are constructed using the following syntax rule
C:= A | CuD | ∃R.C | ∃u.C,
whereA∈NC,C, Drange overCu,u-concepts andR=r1u · · · urnwithr1, . . . , rn∈ NR. The set ofELran,u,u-inclusions consists of concept inclusionsC v D and role inclusionsrvs, whereCis aCran-concept,DaCu,u-concept, andr, s∈NR.
AnX-TBox is a finite set ofX-inclusions, whereX ranges overELran,ELran,0, andELran,u,u.
We have the following inclusions:
ELHr C ELran,0 C ELran C ELran,u,u
whereL1 CL2means that everyL1-TBox is logically equivalent to someL2-TBox.
The semantics of the additional constructors is straightforward and given in Figure 1.
We regard the universal roleuas a logical symbol (i.e.,u6∈NR). This interpretation re- flects the fact that the signature of the first-order translation of∃u.Ccoincides with the signature ofC. Observe that the TBox given as a conceptΣ-interpolant in Example 3 is anELran,0-TBox; the instanceΣ-interpolant given in Example 1 is anELran-TBox, and the queryΣ-interpolant in Example 2 is anELran,u,u-TBox. The need for the uni- versal role is illustrated by consideringT ={A v ∃r.B}andΣ ={r}. The empty TBox is a concept and an instanceΣ-interpolant ofT. A queryΣ-interpolant is given byT0={Av ∃u.B}and reflects the fact that(T,A)|=∃xB(x)forA={A(a)}.
We show that the languages introduced in Definition 3 satisfy criteria (R) and (I) from the introduction. (R) is a consequence of the following result.
Theorem 1. The following problems are PTIME-complete forELran,u,u-TBoxesT and ABoxesA: decide whether
– T |=CvD, forCvDanELran,u,u-inclusion;
– (T,A)|=C(a), whereCis anEL-concept.
Deciding whether(T,A)|=q(a), whereqis a conjunctive query, is NP-complete, and deciding this problem for fixedq(a)(knowledge base complexity) is PTIME-complete.
It follows, in particular, that logical equivalence ofELran,u,u-TBoxes is decidable in PTime. These complexity results are exactly the same as forELH-TBoxes [15]. For (I) and the computation ofΣ-interpolants below, we first investigate the relationship be- tween the distinct inseparability notions introduced in Definition 1 and inseparability wrt. the new languages. (ForELthis relationship is characterized in [13].) LetXrange over the superscripts ‘ran,0’, ‘ran’, and ‘ran,u, u’. Say that two finite sets of FO- sentencesT1 andT2areX-inseparable wrt.Σ, in symbolsT1 ≡XΣ T2, ifT1 |=α ⇔ T2|=α, for allELXΣ-inclusionsα.
Theorem 2. LetT1andT2beELran,u,u-TBoxes andΣan infinite signature. Then the following holds:
– T1≡CΣ T2iffT1≡ran,0Σ T2; – T1≡iΣ T2iffT1≡ranΣ T2; – T1≡qΣ T2iffT1≡ran,u,uΣ T2.
The condition thatΣis infinite is required only for the implication from right to left in Point 1. As we forgetfinitesignatures, their complement is always infinite.
From Theorem 2 we immediately obtain that (I) is met for the three notions ofΣ- interpolants. For example, assume thatT1andT2areELHr-TBoxes such thatT1 ≡q
Σ
T2and letT10andT20 be queryΣ-interpolants inELran,u,uofT1andT2, respectively.
By Theorem 2,T10 ≡ran,u,u
Σ T20. But thenT10andT20 are logically equivalent: we have T10 |=αfor allα∈ T20 because all suchαareELran,u,u
Σ -inclusions andT20 |=α. The converse direction holds for the same reason.
4 Computing Σ-interpolants
We give a recursive algorithm computing instanceΣ-interpolants forELHr-termino- logies satisfying certain acyclicity conditions (similar algorithms computing concept and query Σ-interpolants are presented in the technical report [8]). In this section we assume wlog. that terminologies arenormalizedELHrterminologies; i.e.,ELHr- terminologiesT consisting of role inclusions and axioms of the form (here, and in what follows, we writervT sifT |=rvs)
– A ./∃r.B, whereB∈NC∪ {>};
– A ./ B1u · · ·Bn, whereB1, . . . , Bn∈NC; – dom(s)vAandran(s)vA, whereA∈NC
such thatdom(s)vA∈ T andrvT simplydom(r)vA∈ T;ran(s)vA∈ T and rvT simplyran(r)vA∈ T; andrvT sandsvT rimpliesr =s. We give the acyclicity conditions required for the algorithms to terminate. TheΣ-coverCTΣ(r)of a rolerwrt. a terminologyT consists of alls∈Σsuch thatrvT sand there does not existr0∈Σwithr0 6=sandrvT r0vT s.
Definition 4 (Σ-loop).LetT be a normalizedELHr-terminology andΣa signature.
Define a relation≺Σ⊆(NC∩Σ)×(NC∩Σ)as follows:A≺Σ BifA, B∈Σand (a)A ./ C∈ T for someCsuch thatBoccurs inC, or
(b)A ./∃r.A0 ∈ T for someA0 ∈NC∪ {>}andr∈Σsuch thatdom(r)vB ∈ T, or
(c)A ./∃r.A0 ∈ T for someA0 ∈NC∪ {>}andrsuch that there existss ∈ CTΣ(r) withran(r)vB∈ T,ran(s)vB 6∈ T.
We say thatT contains aΣ-loop if≺Σcontains a cycle.
The following example illustrates this definition and shows that the existence ofΣ- loops typically entails the non-existence ofΣ-interpolants, even inFO.
Example 4. Consider the terminology
T ={ElephantvMammal,Mammalv ∃has mother.Mammal}
and letΣ ={Mammal}. Even inFO, there exists no concept/instance/queryΣ-inter- polant ofT. To see this observe that aninfiniteaxiomatization of such aΣ-interpolant is given by the inclusions
{Elephantv
n
z }| {
∃has mother.· · · ∃has mother.> |n≥1}.
This theory cannot be finitely axiomatized inFOwithout additional predicates. Observe thatT contains theΣ-loop asMammal≺Σ Mammal.
Examples illustrating the points (b) and (c) of Definition 4 ofΣ-loops can be found in the technical report [8].
Call a concept nameAprimitive (pseudo-primitive) in a terminologyT ifAdoes not occur on the left hand side of any axiom inT (does not occur in the formA≡Cin T).
ForA∈sig(T), letPreΣ(A)consist of allD =ran(r),D=∃r.>, andD∈ NCsuch that T |=DvAandsig(D)⊆sig(T)∩Σ; constructPΣ(A)as follows:
1. forApseudo-primitive inT,PΣ(A) =PreΣ(A);
2. ifA≡B1u · · · uBn∈ T, then
PΣ(A) ={CB1u · · · uCBn|(Bi∈ΣandCBi=Bi) or(Bi∈ΣandCBi ∈PΣ(Bi))};
3. ifA≡ ∃r.A0∈ T, thenPΣ(A)is the union ofPreΣ(A)and – ifA0∈Σ:{∃s.A0|svT r, s∈Σ};
– ifA06∈Σ:{∃s.D|svT r, s∈Σ, D∈PΣ(A0)}.
Fig. 2.ComputingPΣ(A)
ForA∈sig(T), letPostΣ(A) ={B∈Σ∩sig(T)| T |=AvB}and constructQΣ(A)as follows:
1. forAprimitive inT,QΣ(A) =PostΣ(A);
2. ifA ./ B1u · · · uBn∈ T, then
QΣ(A) =PostΣ(A) ∪ [
1≤i≤n,Bi∈Σ
QΣ(Bi);
3. ifA ./∃r.A0∈ T, thenQΣ(A)is the union ofPostΣ(A)and {∃s.Es|s∈ CΣT(r)} ∪ [
rvTs
{QΣ(B)|dom(s)vB∈ T, s∈Σ, B∈Σ},
where
Es= l
B∈Σ,ran(r)vB∈T ran(s)vB6∈T, D∈QΣ(B)
D u l
D∈QΣ(A0) A0∈Σ
D u l
B∈Σ T |=ran(r)uA0vB
B.
Fig. 3.ComputingQΣ(A)
The intuition behind the following algorithm forΣ-interpolants is as follows: first, one can show using Theorem 2 and a sequent-style proof system forELHrthat under the conditions of Theorem 3 there exists an instance Σ-interpolant consisting of (in addition to role inclusions and domain and range restrictions) concept inclusions of the formCvAandAvC. In Figure 2, we compute the setPΣ(A)ofCs such thatCvA is in the interpolant by making a case distinction: in Point 1Ais pseudo-primitive; in Point 2 it is defined by a conjunction; in Point 3 it is defined as∃r.A0. Points 2 and 3 are recursive as they require the setsPΣ(B)whenBis used in the definition ofA.Σ-loops describe exactly the situation in which the recursion does not terminate. In Figure 3 we compute, in a similar way, the setQΣ(A)ofCs such thatAvCis in the interpolant.
A sample run of the algorithm is provided in [8].
Theorem 3. LetΣbe a finite signature andT a normalizedELHr-terminology with- outΣ-loops. Then the algorithms computingPΣ(A)andQΣ(A)in Figures 2 and 3 terminate for allA∈sig(T).
LetTΣconsist of the following inclusions, whereA,r, andsrange oversig(T)∩Σ:
– rvs, forrvT s;
– DvA, for allD∈PΣ(A);
– AvD, for allD∈QΣ(A);
– ran(r)vD, for allD∈QΣ(B)such thatran(r)vB∈ T andB∈Σ;
– dom(r)vD, for allD∈QΣ(B)such thatdom(r)vB ∈ T andB∈Σ.
ThenTΣis an instanceΣ-interpolant ofT.
PΣ(A)andQΣ(A)are both of exponential size, in the worst case. ForPΣ(A), this is clear from Point 2 of the construction: letT consist ofA≡B1u · · · uBnandAji vBi
(1 ≤ i, j ≤n) and letΣ = {Bi | 1 ≤i ≤ n}. ThenPΣ(A)is of sizenn, and one can show that there does not exist a shorterΣ-interpolant inELran,u,u. For QΣ(A) this follows from the fact that one might have to construct a complete unfolding of the terminology.
If we admit disjunctions inCin axiomsC v D of Σ-interpolants, then we can replace, in Point 2,PΣ(A)forA≡B1u · · · uBn ∈ T by the singleton set consisting
of l
1≤i≤n,Bi∈Σ
Bi u l
1≤i≤n,Bi∈Σ
G
CBi∈PΣ(Bi)
CBi.
We will see below that in practice this construction leads to much smallerΣ-interpolants.
However, this improvement does not come for free. Consider the languageELran,t, where the only difference to ELran is thatCran-concepts now admit ‘t’ as a binary concept constructor. EveryELran,t-TBox is logically equivalent to an (exponentially larger) ELran-TBox, and so ELran,t inherits many desirable properties from ELran. However, one can show that, in contrast toELran, logical equivalence betweenELran,t- TBoxes is coNP-hard.
5 Experiments
We have implemented a prototype calledNUIthat computes instanceΣ-interpolants as presented in Theorem 3. We have appliedNUIto a version of SNOMED CT dated 09 February 2005 (without two left-identities) and theELHr-fragment of the release 08.08d of NCI. The first terminology has approx. 380K axioms, almost the same num- ber of concept names, and56role names. TheELHr-fragment of NCI has approx.63K axioms, approx.65K concept names, and123role names. We note that the algorithms given above compute (for ease of exposition) a large number of redundant axioms and NUIimplements a variety of straightforward optimizations.
|Σ∩sig(·)| SNOMED CT |Σ∩sig(·)| NCI
2 000 93.0% 5 000 97.0%
3 000 84.5% 10 000 81.1%
4 000 67.0% 15 000 72.0%
5 000 59.5% 20 000 59.2%
Table 1.Success rate ofNUI
Fig. 4.Size distribution of MEX-modules and instanceΣ-interpolants of SNOMED CT First observe that neither SNOMED CT nor NCI contain anyΣ-loops, for any sig- natureΣ. Thus,Σ-interpolants always exist and can, in principle, be computed using our algorithm.
In our experiments, we focus on the case of forgetting a large signatureΣ (and keeping a “small” signatureΣ∩sig(·)), as this corresponds to many application scenar- ios. The experiments have been performed on a standard PC with2.13GHz and3GB of RAM.
Success rate: Table 1 shows the rate at whichNUIsucceeds to compute instanceΣ- interpolants of SNOMED CT and NCI wrt. various signatures. All failed cases are due to memory overflow after several hours. For each table entry,100samples have been used. The signatures contain concept and role names randomly selected from the full signature of SNOMED CT (we never forget the role ‘roleGroup’ as this would make forgetting trivial) and NCI, respectively.Σ∩sig(·)always contains20role names. For NCI and signatures of size≤4 000,NUIhad a100%success rate.
Size: We compare the size of instanceΣ-interpolants of SNOMED CT and NCI com- puted byNUIwith the size of extractedΣ∩sig(·)-modules; i.e., minimalsubsetsof the respective terminologies which preserve, e.g., inclusions betweenΣ∩sig(·)-concepts.
We use MEX-modules [5] of SNOMED CT and, respectively,>-local modules [2] of NCI. The size ofΣ-interpolants, terminologies, and modules is measured as number of symbols rather than number of axioms asΣ-interpolants can contain large axioms.
For SNOMED CT, we generated100random signatures with3 000concept names and20role names inΣ∩sig(·). For17of those signatures,NUIfailed to compute an instanceΣ-interpolant. For the remaining83signatures, Figure 4 shows the number of interpolants andΣ∩sig(·)-modules (vertical axis) of a given size (horizontal axis). The size of theΣ∩sig(·)-modules lies between125K and150K symbols, which is about3%
of SNOMED CT.48.19%of instanceΣ-interpolants are smaller than the corresponding modules. However,10instanceΣ-interpolants contain more than one million symbols and the largest instanceΣ-interpolant is more than11times larger than SNOMED CT.
For NCI, we computed instanceΣ-interpolants andΣ∩sig(·)-modules wrt. random signatures with7 000concept names and20role names inΣ∩sig(·).NUIsucceeded
Fig. 5.Size distribution of CEL-modules and instanceΣ-interpolants of NCI
to compute interpolants for97of100signatures, each within25min. Figure 5 shows the size distribution of the successfully computed interpolants and the corresponding modules. The size of theΣ∩sig(·)-modules lies between140K and160K symbols, which is about22.5%of NCI.74.47%of the instanceΣ-interpolants are smaller than the corresponding modules. On the other hand,18instanceΣ-interpolants consist of more than400K symbols and the largest instanceΣ-interpolant is more than12times larger than NCI.
Forgetting with disjunction: All failures in Table 1 are due to the fact that PΣ(A)is too large. Indeed, if we admit disjunction and considerELran,t, thenNUIsucceeds to computeallΣ-interpolants from Table 1, each within15min. Moreover, for NCI, no signature for whichNUIfails has been detected. For SNOMED CT, however,NUIstill typically fails for|Σ∩sig(·)| ≥30 000.
6 Discussion
The notion of forgetting in DL ontologies has recently been investigated in a number of research papers. [9, 19] consider forgetting in DL-Lite and [3] investigate in how far for- getting in DLs can be reduced to forgetting in logic programs. [6], on which this paper is based, proposes forgetting for acyclicEL-terminologies and concept inclusions.
The main novel contributions of this paper are (i) the first algorithms with exper- imental results indicating the practical feasibility of forgetting in DL-terminologies and (ii) the first systematic analysis of the distinct languages required to axiomatize Σ-interpolants for distinct query languages. Many open problems remain; e.g., we con- juncture thatΣ-interpolants ofELHr-terminologies (and possibly even TBoxes) exist in the languages introduced whenever they exist in FO. Such a result would provide further justification for those languages. Secondly, it would be of interest to prove de- cidability (and complexity) of the decision problem whether there exists aΣ-interpolant for a givenELHr-terminology (TBox). Note that our acyclicity conditions are sufficient but not necessary for the existence ofΣ-interpolants.
References
1. F. Baader, S. Brandt, and C. Lutz. Pushing the EL-envelope further. InProceedings of OWLED DC, 2008.
2. B. Cuenca Grau, I. Horrocks, Y. Kazakov, and U. Sattler. Just the right amount: Extracting modules from ontologies. InProceedings of WWW’07, pages 717–727, 2007.
3. T. Eiter, G. Ianni, R. Schindlauer, H. Tompits, and K. Wang. Forgetting in managing rules and ontologies. InWeb Intelligence, pages 411–419, 2006.
4. T. Eiter and K. Wang. Semantic forgetting in answer set programming.Artificial Intelligence, 172(14):1644–1672, 2008.
5. B. Konev, C. Lutz, D. Walther, and F. Wolter. Semantic modularity and module extraction in description logic. InProceedings of ECAI’08, pages 55–59, 2008.
6. B. Konev, D. Walther, and F. Wolter. The logical difference problem for description logic terminologies. InProceedings of IJCAR’08, pages 259–274, 2008.
7. B. Konev, D. Walther, and F. Wolter. Forgetting and uniform interpolation in large-scale description logic terminologies. InProceedings of IJCAI’09, 2009. To appear.
8. B. Konev, D. Walther, and F. Wolter. Forgetting and uniform interpolation in large-scale description logic terminologies. Technical Report ULCS-09-006, University of Liverpool, 2009.
9. R. Kontchakov, F. Wolter, and M. Zakharyaschev. Can you tell the difference between DL- lite ontologies. InProceedings of KR’08, pages 285–295, 2008.
10. J. Lang, P. Liberatore, and P. Marquis. Propositional independence: formula-variable inde- pendence and forgetting.Journal of Artificial Intelligence Research, 18:391–443, 2003.
11. C. Lutz, D. Walther, and F. Wolter. Conservative extensions in expressive description logics.
InProceedings of IJCAI’07, pages 453–458, 2007.
12. C. Lutz and F. Wolter. Conservative extensions in the lightweight description logic EL. In Proceedings of CADE’07, pages 84–99, 2007.
13. C. Lutz and F. Wolter. Deciding inseparability and conservative extensions in the description logic EL.Journal of Symbolic Computation, 2009. To appear.
14. R. Reiter and F. Lin. Forget it! InProceedings of AAAI Fall Symposium on Relevance, pages 154–159, 1994.
15. R. Rosati. On conjunctive query answering in EL. InProceedings of DL’07, 2007.
16. N. Sioutos, S. de Coronado, M. Haber, F. Hartel, W. Shaiu, and L. Wright. NCI thesaurus:
a semantic model integrating cancer-related clinical and molecular information. Journal of Biomedical Informatics, 40(1):30–43, 2006.
17. K. Spackman. Managing clinical terminology hierarchies using algorithmic calculation of subsumption: Experience with SNOMED-RT. 2000.
18. A. Visser. Uniform interpolation and layered bisimulation. In G¨odel ’96 (Brno, 1996), volume 6 ofLecture Notes Logic. Springer Verlag, 1996.
19. Z. Wang, K. Wang, R. Topor, and J. Z. Pan. Forgetting in DL-Lite. In Proceedings of ESWC’08, 2008.