Importing Ontologies with Hidden Content

Bernardo Cuenca Grau and Boris Motik

Computing Laboratory, University of Oxford, UK

1 Introduction

The Web Ontology Language (OWL) and its revision OWL 2 are widely used ontology languages whose formal underpinnings are given by description logics (DLs). OWL ontologies are used, for example, in several countries to describe electronic patient records (EPR). Patients’ data typically involves descriptions of human anatomy, medical conditions, drugs, and so on. These domains have been described in well-established reference ontologies such as SNOMED-CT and GALEN. In order to save resources, increase interoperability between applications, and rely on experts’ knowledge, an EPR application should preferably reuse these reference ontologies. For example, assume that a reference ontology K_h describes concepts such as the “ventricular septum defect.” An EPR application might reuse the concepts and roles from K_h to define its own ontology K_v of concepts such as “patients having a ventricular septum defect.” It is generally accepted that ontology reuse should be modular; that is, the axioms of K_v should not affect the meaning of the symbols reused from K_h [1, 2].

To enable reuse, OWL allows K_v to import K_h. OWL reasoners deal with imports by merging the axioms of the two ontologies; thus, to process K_v ∪ K_h, an EPR application would require physical access to the axioms of K_h. The vendor of K_h, however, might be reluctant to distribute the axioms of K_h, as doing this might allow competitors to plagiarize K_h. Moreover, K_h might contain information that is sensitive from a privacy point of view. Finally, the vendor of K_h might impose different costs on parts of K_h. To reflect this situation, we call the ontology K_h hidden and, by analogy, K_v visible.

To enable reuse without providing physical access to the axioms, K_h could be made accessible via an oracle (i.e., a limited query interface), thus allowing K_v to import K_h “by query.” In this paper, we study import-by-query algorithms, which can solve certain reasoning tasks on K_v ∪ K_h by accessing only K_v and the oracle. We focus on schema reasoning problems, such as concept subsumption and satisfiability; this is in contrast to the information integration [3] and peer-to-peer [4] scenarios, which focus on the reuse of data.

In our recent work [5], we studied the import-by-query problem when the oracle query language is concept satisfiability (or, equivalently, concept subsumption). We showed that no import-by-query algorithm exists even if K_v and K_h are expressed in the lightweight DL EL [6]. This negative result holds if K_v reuses an atomic role from K_h, so we studied the case when K_v reuses only atomic concepts from K_h in a modular way [2]. We presented an import-by-query algorithm for the case when the aforementioned assumptions are satisfied and K_v and K_h are expressed in SROIQ [7] and SRIQ [8], respectively. We also proposed an algorithm that is better suited for practice for the case where K_h is expressed in a Horn DL. Finally, we extended these results to the case when K_v reuses atomic roles from K_h in a syntactically restricted way.

The conditions in [5] on the usage of imported roles, however, can be limiting in practice, so in this paper we investigate ways of relaxing them. In particular, we study a different oracle query language that is based on ABox query answering, rather than concept subsumption or concept satisfiability checking. We present an import-by-query algorithm that uses such a query language and that is applicable if K_v and K_h are expressed in EL and the only restriction is that K_v imports concepts and roles from K_h in a modular way. We also establish links between this result and the negative result from [5]. In particular, we show that, if the axioms of K_v involving the roles imported from K_h are weakly acyclic [9], an import-by-query algorithm based on concept subsumption exists.

2 Preliminaries

In this section, we present an overview of the DL EL [6] and a suitable reasoning algorithm. For simplicity, we restrict ourselves to the basic version of EL; however, our results can be easily extended to EL++. We also recapitulate the notion of locality [2], which ensures modular ontology reuse. Finally, we define a restricted form of EL axioms that guarantees locality.

2.1 The Description Logic EL

The syntax of EL is defined w.r.t. a signature, which is the union of disjoint countable sets of atomic concepts, atomic roles, and individuals. The set of EL concepts is the smallest set containing ⊤, ⊥, A, C_1 ⊓ C_2, and ∃r.C, for A an atomic concept, C, C_1, and C_2 EL concepts, and r an atomic role.

A concept inclusion (GCI) has the form C_1 ⊑ C_2 for C_1 and C_2 EL concepts, and a concept equivalence C_1 ≡ C_2 is an abbreviation for C_1 ⊑ C_2 and C_2 ⊑ C_1. A TBox T is a finite set of GCIs. An assertion has the form C(a) or r(a, b), for C an EL concept, r an atomic role, and a and b individuals. An axiom is either a GCI or an assertion. An ABox A is a finite set of assertions. An EL knowledge base is a pair K = ⟨T, A⟩ where T is a TBox and A is an ABox. For α a concept, an axiom, or a set of axioms, sig(α) is the signature of α, that is, the set of atomic concepts, roles, and individuals occurring in α.

The semantics of EL is given by means of interpretations. The definitions of an EL interpretation I = (Δ^I, ·^I), satisfaction of an axiom, TBox, ABox, or KB S in an interpretation I (written I ⊨ S), entailment of an axiom α from S (written S ⊨ α), and concept subsumption (written K ⊨ C ⊑ D), are standard and can be found in [6]. Classification of K is the problem of determining whether K ⊨ C ⊑ D for each C, D ∈ sig(K) ∪ {⊤, ⊥}. For each satisfiable EL knowledge base K = ⟨T, A⟩, it is well known that ⟨T, A⟩ ⊨ C ⊑ D iff ⟨T, ∅⟩ ⊨ C ⊑ D; hence, classification is usually considered w.r.t. a TBox only.

Table 1. EL Rules

R1  If ∃r.C(x) ∈ A_i and r(x, x_C) ∉ A_i,
    then A_{i+1} := A_i ∪ {r(x, x_C)}.

R2  If 1. A_1 ⊓ … ⊓ A_k ⊓ ∃r_1.B_1 ⊓ … ⊓ ∃r_m.B_m ⊑ C ∈ T with k ≥ 0 and m ≥ 0, and
       2. the ABox A_i contains individuals x, y_1, …, y_m such that
          2.1 {A_j(x) | 1 ≤ j ≤ k} ∪ {r_j(x, y_j), B_j(y_j) | 1 ≤ j ≤ m} ⊆ A_i, and
          2.2 C(x) ∉ A_i,
    then A_{i+1} := A_i ∪ {C(x)}.

2.2 Reasoning with EL Knowledge Bases

We next present two algorithms for reasoning with EL KBs. First, we present the algorithm sat(T, A) that can be used to compute ABox assertions entailed by T ∪ A. Then, we show how to use sat(T, A) to obtain the EL classification algorithm el(T). These algorithms are similar to the one in [10] and can be seen as a minor variation of the algorithm in [6].

Let Γ be a signature. Then, BC(Γ) is the set of basic concepts for Γ, which is the smallest set of EL concepts containing ⊤, ⊥, all atomic concepts in Γ, and all concepts of the form ∃r.C such that r ∈ Γ and C ∈ Γ ∪ {⊤, ⊥}.¹ An EL TBox T is in normal form if every GCI α ∈ T is of the form

C_1 ⊓ … ⊓ C_n ⊑ D    (1)

with n ≥ 1, and where C_i and D are basic concepts for sig(T). An EL ABox A is in normal form if, in each assertion C(a) ∈ A, the concept C is basic for sig(A). An EL KB K = ⟨T, A⟩ is in normal form if both T and A are in normal form. Given an ABox A in normal form and Γ ⊆ sig(A), we denote with A|_Γ the set of assertions C(a) and r(a, b) in A such that C ∈ BC(Γ) and r ∈ Γ.

For each concept C ∈ sig(T), let x_C be an individual uniquely associated with C. The algorithm sat(T, A) accepts an EL TBox T and ABox A and produces a sequence A_0, …, A_n of ABoxes, called a run, s.t. the following conditions hold:

1. A_0 := A ∪ {C(x_C) | C ∈ sig(T)}.

2. A_i is obtained from A_{i−1} by applying a rule from Table 1, for each 1 ≤ i ≤ n.

3. No rule from Table 1 is applicable to A_n.

For each T and A, the ABox A_n is unique across all runs of sat(T, A), so we often write sat(T, A) = A_n. Furthermore, by slightly modifying the proofs from [6] and [10], it is straightforward to show that the following properties hold:

– P1: For each TBox T, ABox A, basic concept C for sig(T), and each individual x in A, we have C(x) ∈ sat(T, A) if and only if T ∪ A ⊨ C(x).

– P2: For all TBoxes T and T′, and all ABoxes A and A′,

• A ⊆ sat(T ∪ T′, ∅) implies sat(T, A) ⊆ sat(T ∪ T′, ∅), and

• A′ ⊆ sat(T, A) implies sat(T, A ∪ A′) ⊆ sat(T, A).

¹ Note that this notion of basic concept differs from the one introduced in [6].

The algorithm el(T) accepts a TBox T and returns C ⊑ D for each pair of concepts C and D from sig(T) ∪ {⊤, ⊥} such that D(x_C) ∈ sat(T, ∅). Properties P1 and P2 straightforwardly imply that C ⊑ D ∈ el(T) iff T ⊨ C ⊑ D.
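The saturation procedure and the classification step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation from [6] or [10]: the tuple encoding of concepts and assertions, the helper names (`some`, `holds`, `atomic_concepts`), and the naming of the individual x_C as the string `"x_" + C` are all choices made for this example. The sketch treats ⊤ as holding everywhere and leaves ⊥ out entirely.

```python
TOP = "TOP"

def some(role, filler):
    """Encode the basic concept ∃role.filler as a tuple (assumed encoding)."""
    return ("some", role, filler)

def atomic_concepts(tbox):
    """Atomic concepts of a normalized TBox given as (lhs_tuple, rhs) pairs."""
    found = set()
    for lhs, rhs in tbox:
        for c in (*lhs, rhs):
            found.add(c[2] if isinstance(c, tuple) else c)
    return found - {TOP}

def holds(abox, concept, x):
    """True if concept(x) is asserted in abox or witnessed by a role successor."""
    if concept == TOP or ("c", concept, x) in abox:
        return True
    if isinstance(concept, tuple):
        _, role, filler = concept
        return any(t[0] == "r" and t[1] == role and t[2] == x
                   and holds(abox, filler, t[3]) for t in abox)
    return False

def sat(tbox, abox=()):
    """Saturate abox under rules R1 and R2 of Table 1."""
    a = set(abox) | {("c", c, "x_" + c) for c in atomic_concepts(tbox)}
    while True:
        new = set()
        for t in a:  # R1: give every asserted ∃r.C(x) the successor x_C
            if t[0] == "c" and isinstance(t[1], tuple):
                new.add(("r", t[1][1], t[2], "x_" + t[1][2]))
        individuals = {t[2] for t in a} | {t[3] for t in a if t[0] == "r"}
        for lhs, rhs in tbox:  # R2: fire a GCI whose left-hand side is matched
            for x in individuals:
                if all(holds(a, c, x) for c in lhs):
                    new.add(("c", rhs, x))
        if new <= a:  # no rule adds anything: the run is complete
            return a
        a |= new

def el(tbox):
    """Return the pairs (C, D) of atomic concepts with D(x_C) in sat(tbox, ∅)."""
    closure = sat(tbox)
    names = atomic_concepts(tbox)
    return {(c, d) for c in names for d in names if ("c", d, "x_" + c) in closure}

# Normalized fragment of the hidden TBox of Table 2 (axioms γ1 to γ3).
hidden = [
    (("VSD",), "CHD"),
    (("CHD_Heart",), "Heart"),
    (("CHD_Heart",), some("cond", "CHD")),
    (("Heart", some("cond", "CHD")), "CHD_Heart"),
    (("VSD_Heart",), "Heart"),
    (("VSD_Heart",), some("cond", "VSD")),
    (("Heart", some("cond", "VSD")), "VSD_Heart"),
]
subsumptions = el(hidden)
print(("VSD_Heart", "CHD_Heart") in subsumptions)
```

The rules are applied in rounds rather than one at a time; since the final ABox is unique across runs, this affects only the order in which assertions appear.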

2.3 Locality

When a TBox T_v reuses a TBox T_h, it is commonly accepted [1, 2] that T_v should not affect the meaning of the symbols reused from T_h; that is, T_v ∪ T_h ⊨ α should imply T_h ⊨ α for each axiom α containing only the reused symbols. This is guaranteed if T_v is local w.r.t. the set of concepts and roles Γ imported from T_h [2]. More precisely, for Γ a signature such that {⊤, ⊥} ⊆ Γ, a TBox T is local w.r.t. Γ if, for each interpretation I such that X^I = ∅ for each atomic concept and each atomic role X ∉ Γ, we have I ⊨ T.

An EL TBox T in normal form is safe for Γ if sig(C_1 ⊓ … ⊓ C_n) ⊈ Γ for each axiom of the form (1) in T. It is easy to see that T is then local for Γ.
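The safety condition is purely syntactic, so it can be checked by a single pass over the axioms. The sketch below assumes a hypothetical encoding in which a normalized GCI is a pair (tuple of left-hand-side concepts, right-hand-side concept), an atomic concept is a string, and ∃r.C is the tuple `("some", r, C)`; the function names are illustrative only.

```python
def symbols(concept):
    """Atomic concepts and roles occurring in a basic concept."""
    if isinstance(concept, tuple):  # ("some", role, filler) encodes ∃role.filler
        return {concept[1], concept[2]}
    return {concept}

def is_safe(tbox, gamma):
    """True if every left-hand side uses at least one symbol outside gamma."""
    gamma = set(gamma) | {"TOP", "BOT"}  # Γ always contains ⊤ and ⊥
    for lhs, _rhs in tbox:
        lhs_signature = set().union(*(symbols(c) for c in lhs))
        if lhs_signature <= gamma:  # the whole left-hand side is imported
            return False
    return True

gamma = {"Heart", "CHD", "CHD_Heart", "cond"}
visible = [
    (("EA",), "CHD"),                                    # δ5
    (("Heart", ("some", "cond", "EA")), "EA_Heart"),     # one direction of δ6
]
unsafe = [(("Heart",), "Organ_Of_Interest")]             # hypothetical axiom
print(is_safe(visible, gamma), is_safe(unsafe, gamma))
```

The second TBox is rejected because its only axiom has a left-hand side built entirely from imported symbols, which could change the meaning of Heart.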

3 Importing Ontologies by Query

To illustrate the notion of import-by-query, Table 2 shows a TBox T_h whose axioms are to be kept hidden, but that is reused in a visible TBox T_v. The TBox T_h provides concepts describing the structure of organs such as Heart, and medical conditions associated with them such as CHD (congenital heart defect), VSD (ventricular septum defect), and AS (aortic stenosis). Furthermore, the role part relates organs with their parts, and cond relates organs with medical conditions. The former role is used to define concepts such as Heart (an organ one of whose parts is the tricuspid valve); the latter is used to define concepts such as CHD_Heart (a heart with a congenital heart disorder) and VSD_Heart (a heart with a ventricular septal defect). The shared symbols of T_h are written in bold font. The TBox T_h might also contain nonshared symbols; however, for simplicity, we do not show any axioms involving such symbols. The TBox T_v provides the concept Pat representing patients, and it defines types of patients by relating the organs from T_h with the patients using the hasOrgan role. In addition, T_v extends the list of defects in T_h by EA (Ebstein’s anomaly), which is a congenital heart defect in which the opening of the tricuspid valve is displaced (axioms δ7 and δ8), and defines a patient with tricuspid valve disease (TVD_Pat) as a patient with a heart that has a congenital heart defect and an abnormal tricuspid valve (δ9). The symbols private to T_v are written in italic font.

As already mentioned, we assume that T_v is local w.r.t. the imported symbols. For example, δ1 is local w.r.t. {CHD_Heart} because δ1 is satisfied in any interpretation that interprets the nonshared symbols as ∅. Note also that, according to the definition in Section 2, T_v is safe for the imported signature.

We next introduce two types of oracles, which are responsible for advertising the shared signature Γ of T_h and answering queries w.r.t. T_h.

Table 2. Example Knowledge Bases

Hidden Knowledge Base T_h

γ1  CHD_Heart ≡ Heart ⊓ ∃cond.CHD
γ2  VSD_Heart ≡ Heart ⊓ ∃cond.VSD
γ3  VSD ⊑ CHD
γ4  AS ⊑ CHD
γ5  Heart ⊑ Organ ⊓ ∃part.Tric_Valve

Visible Knowledge Base T_v

δ1  CHD_Pat ≡ Pat ⊓ ∃hasOrgan.CHD_Heart
δ2  VSD_Pat ≡ Pat ⊓ ∃hasOrgan.VSD_Heart
δ3  AS_Pat ≡ Pat ⊓ ∃hasOrgan.(Heart ⊓ ∃cond.AS)
δ4  EA_Pat ≡ Pat ⊓ ∃hasOrgan.EA_Heart
δ5  EA ⊑ CHD
δ6  EA_Heart ≡ Heart ⊓ ∃cond.EA
δ7  Heart ⊓ ∃cond.EA ⊑ ∃part.Ab_Tric_Valve
δ8  Ab_Tric_Valve ≡ Tric_Valve ⊓ ∃part.Displ_Opening
δ9  TVD_Pat ≡ Pat ⊓ ∃hasOrgan.(CHD_Heart ⊓ ∃part.Ab_Tric_Valve)

Definition 1. Let T be an EL TBox and Γ be s.t. {⊤, ⊥} ⊆ Γ ⊆ sig(T). A subsumption oracle Ω^s_{T,Γ} is a function that, for each pair of EL concepts C and D with sig(C) ⊆ Γ and sig(D) ⊆ Γ, returns a Boolean value as follows:

Ω^s_{T,Γ}(C, D) = t   if T ⊨ C ⊑ D,
                  f   otherwise.

An ABox query oracle Ω^a_{T,Γ} is a function that, for each normalized EL ABox A with sig(A) ⊆ Γ, a concept C ∈ BC(Γ), and an individual x occurring in A, returns a Boolean value as follows:

Ω^a_{T,Γ}(A, C, x) = t   if ⟨T, A⟩ ⊨ C(x),
                     f   otherwise.

Without any further qualification, the generic term oracle refers either to the subsumption or to the ABox query oracle.

A subsumption oracle is able to answer subsumption queries between (possibly complex) EL concepts, and it is analogous to the concept satisfiability oracle we considered in [5]. Concept subsumption is available as a reasoning service in all DL reasoners known to us, so it provides us with a natural oracle query language. Our results from [5], however, suggest that the way in which the roles from T_h are used in T_v must be restricted in import-by-query algorithms based on subsumption oracles; we summarize these results in Section 4.

As we show in Section 5, the use of ABox query oracles allows us to overcome these limitations. Intuitively, an ABox query oracle accepts an ABox A containing only symbols in Γ and “completes it” with the (basic) concepts over Γ that are entailed by A ∪ T_h. Roughly speaking, the ABox represents the information that has already been deduced; when the ABox is “handed over,” the oracle completes it with the relevant information entailed by T_h.

An import-by-query (classification) algorithm checks the subsumption relations between the atomic concepts in sig(T_v) w.r.t. T_v ∪ T_h by using an oracle.

Definition 2. An import-by-query algorithm takes as input an EL TBox T_v and an oracle Ω_{T_h,Γ} with sig(T_v) ∩ sig(T_h) ⊆ Γ, and it returns C ⊑ D for each C and D such that {C, D} ⊆ sig(T_v) ∪ {⊤, ⊥} and T_v ∪ T_h ⊨ C ⊑ D.

4 Import-by-Query and Subsumption Oracles

We next summarize our results from [5]. We first show that no general import-by-query algorithm based on subsumption oracles exists.

Theorem 3. No import-by-query algorithm based on subsumption oracles exists, if T_v and T_h are in EL, Γ contains only one atomic role, and T_v is local in Γ.

Proof. Assume that an import-by-query algorithm exists that terminates on all inputs, and consider T_v as in (2) with Γ = {r}. Clearly, T_v is local w.r.t. Γ. Since the algorithm terminates, the number of questions posed to any subsumption oracle is bounded by some integer m, and the quantifier depth of each concept C passed to the oracle is bounded by an integer n, where both m and n depend only on Γ and T_v. Let T_h¹ and T_h² be as in (3) and (4), respectively.

T_v = {A ⊑ ∃r.A}    (2)

T_h¹ = ∅    (3)

T_h² = {∃r.⋯∃r.⊤ ⊑ ⊥}, where ∃r is repeated n + 1 times    (4)

For each C with sig(C) ⊆ Γ and quantifier depth at most n, T_h¹ ⊨ C ⊑ ⊥ iff T_h² ⊨ C ⊑ ⊥, so Ω^s_{T_h¹,Γ}(C, ⊥) = Ω^s_{T_h²,Γ}(C, ⊥). When applied to T_v and Ω^s_{T_h¹,Γ}, the algorithm returns the same value as when applied to T_v and Ω^s_{T_h²,Γ}. Since T_v ∪ T_h¹ ⊭ A ⊑ ⊥ but T_v ∪ T_h² ⊨ A ⊑ ⊥, it does not satisfy Definition 2. □

This negative result relies on the presence of an atomic role r in the reused signature Γ and the fact that r is used in T_v in a “cyclic” axiom A ⊑ ∃r.A. To circumvent this problem, we studied in [5] the case where Γ contains only atomic concepts and proposed an import-by-query algorithm that applies even when T_v and T_h are given in a very expressive DL.
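The step of the proof of Theorem 3 that uses the cyclic axiom can be spelled out as a short derivation. The chain below is a worked restatement of that step under the definitions (2) and (4); the shorthand (∃r.)^{n+1} for n + 1 nested occurrences of ∃r is introduced here only for readability.

```latex
\begin{align*}
T_v &\models A \sqsubseteq \exists r.A
  && \text{axiom (2)} \\
T_v &\models A \sqsubseteq (\exists r.)^{n+1} A
  && \text{replace the inner } A \text{ by } \exists r.A,\ n \text{ times} \\
\emptyset &\models (\exists r.)^{n+1} A \sqsubseteq (\exists r.)^{n+1} \top
  && \text{since } A \sqsubseteq \top \\
T_h^{2} &\models (\exists r.)^{n+1} \top \sqsubseteq \bot
  && \text{axiom (4)} \\
T_v \cup T_h^{2} &\models A \sqsubseteq \bot
  && \text{by transitivity of } \sqsubseteq
\end{align*}
```

The unsatisfiability of A thus needs a chain of n + 1 nested quantifiers, one more than the depth of any concept the algorithm passes to the oracle, which is why the oracles for T_h¹ and T_h² return the same answers to it.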

In [5] we also studied the case where both concepts and roles are imported, but with the condition that sig(C) ⊆ Γ for each concept C appearing in T_v in the scope of a quantifier over r from Γ. For example, if r ∈ Γ, then T_v is allowed to contain the concept ∃r.A only if A ∈ Γ. The algorithm in [5] for this case applies to very expressive DLs. In our example, the conditions in [5] allow us to write axioms δ1–δ5, which together with T_h allow us to conclude VSD_Pat ⊑ CHD_Pat (using δ1, δ2 and δ5) and AS_Pat ⊑ CHD_Pat (using δ1 and δ3).

In order to express axioms such as δ5–δ9, we next investigate ways of relaxing the conditions from [5]. In Section 5, we study the design of import-by-query algorithms based on ABox query oracles. We show that, if T_v and T_h are in EL, no restrictions other than locality are needed to ensure existence of an import-by-query algorithm. This investigation will also provide us with insight on how to relax the syntactic restrictions from [5] when using subsumption oracles.

Table 3. Additional EL Rule for an ABox Query Oracle

R3  If, for some individual x_C in A_i and concept D ∈ BC(Γ),
    Ω^a_{T_h,Γ}(A_i|_Γ, D, x_C) = t and D(x_C) ∉ A_i,
    then A_{i+1} := A_i ∪ {D(x_C)}.

5 Import-by-Query with ABox Query Oracles

We next present the import-by-query algorithm ibq^a(T_v, Ω^a_{T_h,Γ}), which accepts a normalized EL TBox T_v and an ABox query oracle Ω^a_{T_h,Γ} s.t. T_v is safe for Γ. Let sat^a(T_v, A, Ω^a_{T_h,Γ}) be the same as sat(T_v, A) (see Section 2), with the difference that rule R3 shown in Table 3 is applied alongside rules R1 and R2 from Table 1. We next prove certain properties of this algorithm.

Lemma 4. Let A^a = sat^a(T_v, ∅, Ω^a_{T_h,Γ}) and A^el = sat(T_v ∪ T_h′, ∅), where T_h′ is the result of transforming T_h into normal form. The following properties hold:

– Soundness: A^a ⊆ A^el;

– Completeness: For each C ∈ sig(T_v) and D ∈ BC(sig(T_v)), we have that D(x_C) ∈ A^el implies D(x_C) ∈ A^a.

Proof (Soundness). Let A_0, …, A_n with A_n = A^a be a run of sat^a(T_v, ∅, Ω^a_{T_h,Γ}). We show by induction on the rule applications that A_i ⊆ A^el for each 0 ≤ i ≤ n. The induction base (i = 0) follows from the initialization conditions of both algorithms. Assume now that A_{i−1} ⊆ A^el and let A_i be obtained from A_{i−1} by a rule application. If either R1 or R2 is applied, property P2 from Section 2 implies that A_i ⊆ A^el. Assume that R3 extends A_{i−1} with D(x_C). Then, C ∈ Γ; D ∈ BC(Γ); and, by the induction hypothesis, A_{i−1}|_Γ ⊆ A^el. Also, Ω^a_{T_h,Γ}(A_{i−1}|_Γ, D, x_C) = t, which implies T_h′ ∪ A_{i−1}|_Γ ⊨ D(x_C). But then, property P1 from Section 2 implies that D(x_C) ∈ sat(T_h′, A_{i−1}|_Γ). Since A_{i−1}|_Γ ⊆ A^el, by P2 we have that sat(T_h′, A_{i−1}|_Γ) ⊆ A^el, which implies our claim. □

Proof (Completeness). Let A_0, …, A_n with A_n = A^el be a run of sat(T_v ∪ T_h′, ∅). For A_j occurring in the run and Σ a signature, let

A_j‖_Σ = {D(x_C) ∈ A_j | {C, D} ⊆ BC(Σ)} ∪ {r(x_C, x_D) ∈ A_j | {r, C, D} ⊆ Σ}.

Completeness follows immediately from statement 2 of the following claim.

CLAIM: The following conditions hold for each A_j with 1 ≤ j ≤ n:

1. A_j ⊆ A^a ∪ sat(T_h′, A^a|_Γ).

2. A_j‖_{sig(T_v)} ⊆ A^a.

3. If D(x_C) ∈ A_j, then

(a) C ∈ sig(T_h′) implies D ∈ BC(sig(T_h′)), and

(b) C ∈ sig(T_v) \ Γ implies D ∈ sig(T_v).

4. If r(x_C, x_D) ∈ A_j, then

(a) r ∈ sig(T_v) \ Γ implies C ∈ sig(T_v) \ Γ,

(b) r ∈ Γ and D ∈ sig(T_v) \ Γ imply C ∈ sig(T_v) \ Γ, and

(c) r ∈ sig(T_v) and C ∈ sig(T_v) \ Γ imply D ∈ sig(T_v).

The proof is by induction on the rule applications. For the induction base (j = 0), we have A_0 = {C(x_C) | C ∈ sig(T_v ∪ T_h′)}. Statements 1–3 hold straightforwardly, and statement 4 is vacuously true. Assume now that statements 1–4 hold for A_{j−1} and consider an application of a rule that derives A_j.

Assume that R1 is applied to ∃r.D(x_C) ∈ A_{j−1} and that it derives r(x_C, x_D). By the induction hypothesis, ∃r.D(x_C) ∈ A^a or ∃r.D(x_C) ∈ sat(T_h′, A^a|_Γ). If ∃r.D(x_C) ∈ A^a, then r(x_C, x_D) ∈ A^a because R1 is not applicable to A^a, so statement 1 holds. If ∃r.D(x_C) ∈ sat(T_h′, A^a|_Γ), then r(x_C, x_D) ∈ sat(T_h′, A^a|_Γ) because R1 is not applicable to sat(T_h′, A^a|_Γ), so statement 1 also holds. If C, D, and r are all in sig(T_v), by the induction hypothesis ∃r.D(x_C) ∈ A^a; since R1 is not applicable to A^a, we have r(x_C, x_D) ∈ A^a, so statement 2 holds. Statement 3 holds vacuously. Note that the contrapositive of statement 3a is as follows (*): r ∈ sig(T_v) \ Γ or D ∈ sig(T_v) \ Γ imply C ∈ sig(T_v) \ Γ. Now (*) immediately implies statements 4a and 4b. Finally, statement 3b implies statement 4c.

Assume now that R2 is applied to an axiom α ∈ T_v ∪ T_h′ of the form A_1 ⊓ … ⊓ A_k ⊓ ∃r_1.B_1 ⊓ … ⊓ ∃r_m.B_m ⊑ D and that it derives D(x_C). Then, individuals y_{E_1}, …, y_{E_m} in A_{j−1} exist such that A_i(x_C) ∈ A_{j−1} for each 1 ≤ i ≤ k, and {r_ℓ(x_C, y_{E_ℓ}), B_ℓ(y_{E_ℓ})} ⊆ A_{j−1} for each 1 ≤ ℓ ≤ m. Statement 4 holds vacuously, so we next prove 1–3, depending on whether α belongs to T_h′ or T_v.

– α ∈ T_h′. Statement 3 holds trivially. Consider each assertion A_i(x_C) with 1 ≤ i ≤ k. By the induction hypothesis, we have A_i(x_C) ∈ A^a ∪ sat(T_h′, A^a|_Γ). If A_i(x_C) ∈ A^a, since α ∈ T_h′, we have A_i ∈ Γ, so A_i(x_C) ∈ A^a|_Γ. Thus, we have A_i(x_C) ∈ sat(T_h′, A^a|_Γ). By a completely analogous argument, we have {r_ℓ(x_C, y_{E_ℓ}), B_ℓ(y_{E_ℓ})} ⊆ sat(T_h′, A^a|_Γ) for each 1 ≤ ℓ ≤ m as well. But then, since R2 is not applicable to sat(T_h′, A^a|_Γ), we have D(x_C) ∈ sat(T_h′, A^a|_Γ), so statement 1 holds. Statement 2 is vacuously true if C ∉ Γ or D ∉ BC(Γ). Otherwise, by statement 1 and property P1, we have T_h′ ∪ A^a|_Γ ⊨ D(x_C); since R3 is not applicable to A^a, we have D(x_C) ∈ A^a, so statement 2 holds.

– α ∈ T_v. Note that T_v is safe for Γ, so we have these possibilities:

• A_i ∉ Γ for some 1 ≤ i ≤ k. By the contrapositive of statement 3a, we have C ∈ sig(T_v) \ Γ.

• r_ℓ ∉ Γ for some 1 ≤ ℓ ≤ m. By statement 4a, we have C ∈ sig(T_v) \ Γ.

• B_ℓ ∉ Γ and r_ℓ ∈ Γ for some 1 ≤ ℓ ≤ m. By the contrapositive of statement 3a, y_{E_ℓ} ∈ sig(T_v) \ Γ; however, by statement 4b, C ∈ sig(T_v) \ Γ.

Since C ∈ sig(T_v) \ Γ, statement 4c implies E_ℓ ∈ sig(T_v). But then, by statement 2, all A_i(x_C), r_ℓ(x_C, y_{E_ℓ}), and B_ℓ(y_{E_ℓ}) are in A^a. Since R2 is not applicable to A^a, D(x_C) ∈ A^a, which implies statements 1 and 2. Statements 3a and 3b respectively hold since C ∉ sig(T_h′) and D ∈ BC(sig(T_v)). □

The algorithm ibq^a(T_v, Ω^a_{T_h,Γ}) returns C ⊑ D for each pair of concepts C and D from sig(T_v) ∪ {⊤, ⊥} such that D(x_C) ∈ sat^a(T_v, ∅, Ω^a_{T_h,Γ}).
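The interplay between the visible saturation and the oracle can be illustrated with a small Python sketch. Everything about the encoding is an assumption of this example: concepts and assertions are tuples, the individual x_C is the string `"x_" + C`, the oracle is simulated locally by saturating a hidden TBox (in a real deployment only the vendor would hold it), and ⊥ is left out. The function names (`saturate`, `abox_oracle`, `rule_r3`, `ibq_a`) are hypothetical.

```python
TOP = "TOP"

def some(role, filler):
    """Encode the basic concept ∃role.filler as a tuple (assumed encoding)."""
    return ("some", role, filler)

def atomic_concepts(tbox):
    """Atomic concepts of a normalized TBox given as (lhs_tuple, rhs) pairs."""
    found = set()
    for lhs, rhs in tbox:
        for c in (*lhs, rhs):
            found.add(c[2] if isinstance(c, tuple) else c)
    return found - {TOP}

def holds(abox, concept, x):
    """True if concept(x) is asserted in abox or witnessed by a role successor."""
    if concept == TOP or ("c", concept, x) in abox:
        return True
    if isinstance(concept, tuple):
        _, role, filler = concept
        return any(t[0] == "r" and t[1] == role and t[2] == x
                   and holds(abox, filler, t[3]) for t in abox)
    return False

def saturate(tbox, abox=(), extra_rule=None):
    """Apply R1, R2 and an optional extra rule (R3) until nothing new is derived."""
    a = set(abox) | {("c", c, "x_" + c) for c in atomic_concepts(tbox)}
    while True:
        new = set()
        for t in a:  # R1
            if t[0] == "c" and isinstance(t[1], tuple):
                new.add(("r", t[1][1], t[2], "x_" + t[1][2]))
        individuals = {t[2] for t in a} | {t[3] for t in a if t[0] == "r"}
        for lhs, rhs in tbox:  # R2
            for x in individuals:
                if all(holds(a, c, x) for c in lhs):
                    new.add(("c", rhs, x))
        if extra_rule is not None:
            new |= extra_rule(a, individuals)
        if new <= a:
            return a
        a |= new

def abox_oracle(hidden_tbox):
    """Stand-in for Ω^a: answers by saturating the hidden TBox with the given ABox."""
    cache = {}
    def ask(abox, concept, x):
        if abox not in cache:
            cache[abox] = saturate(hidden_tbox, abox)
        return holds(cache[abox], concept, x)
    return ask

def rule_r3(oracle, gamma_concepts, gamma_roles):
    """Build rule R3 of Table 3 for the shared signature Γ (⊥ is left out here)."""
    fillers = set(gamma_concepts) | {TOP}
    basic = set(gamma_concepts) | {some(r, c) for r in gamma_roles for c in fillers}
    def apply(abox, individuals):
        shared = frozenset(t for t in abox  # the restriction A|Γ
                           if (t[0] == "c" and t[1] in basic)
                           or (t[0] == "r" and t[1] in gamma_roles))
        return {("c", d, x) for x in individuals for d in basic if oracle(shared, d, x)}
    return apply

def ibq_a(visible_tbox, oracle, gamma_concepts, gamma_roles):
    """Return the pairs (C, D) of visible atomic concepts with D(x_C) derived."""
    closure = saturate(visible_tbox, extra_rule=rule_r3(oracle, gamma_concepts, gamma_roles))
    names = atomic_concepts(visible_tbox)
    return {(c, d) for c in names for d in names if ("c", d, "x_" + c) in closure}

hidden = [  # normalized γ1 of Table 2
    (("CHD_Heart",), "Heart"),
    (("CHD_Heart",), some("cond", "CHD")),
    (("Heart", some("cond", "CHD")), "CHD_Heart"),
]
visible = [  # normalized δ1, δ4, δ5, δ6 of Table 2
    (("CHD_Pat",), "Pat"), (("CHD_Pat",), some("hasOrgan", "CHD_Heart")),
    (("Pat", some("hasOrgan", "CHD_Heart")), "CHD_Pat"),
    (("EA_Pat",), "Pat"), (("EA_Pat",), some("hasOrgan", "EA_Heart")),
    (("Pat", some("hasOrgan", "EA_Heart")), "EA_Pat"),
    (("EA",), "CHD"),
    (("EA_Heart",), "Heart"), (("EA_Heart",), some("cond", "EA")),
    (("Heart", some("cond", "EA")), "EA_Heart"),
]
result = ibq_a(visible, abox_oracle(hidden), {"Heart", "CHD", "CHD_Heart"}, {"cond"})
print(("EA_Pat", "CHD_Pat") in result)
```

In this run the visible rules first derive Heart(x_EA_Heart), cond(x_EA_Heart, x_EA), and CHD(x_EA); only these Γ-assertions are handed to the oracle, which answers that CHD_Heart(x_EA_Heart) is entailed, and R2 then yields CHD_Pat(x_EA_Pat). The sketch queries the oracle for every individual and every basic concept in each round, which is wasteful but keeps the correspondence with R3 direct.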

Theorem 5. The algorithm ibq^a(T_v, Ω^a_{T_h,Γ}) is an import-by-query algorithm, and it can be implemented such that it runs in time polynomial in the size of T_v with a polynomial number of calls to Ω^a_{T_h,Γ}.

Proof. The size of BC(sig(T_v)) is quadratic in the size of T_v. Each application of R1, R2, or R3 adds an assertion of the form C(a) or r(a, b) for C ∈ BC(sig(T_v)). Since no rule removes assertions from A, the total number of rule applications is polynomial. Thus, the algorithm can be implemented such that it runs in polynomial time with a polynomial number of calls to the oracle. Since T_h′ is obtained from T_h through normalization, for all concepts C and D with {C, D} ⊆ sig(T_h) ∪ sig(T_v) ∪ {⊤, ⊥} we have T_v ∪ T_h ⊨ C ⊑ D iff T_v ∪ T_h′ ⊨ C ⊑ D. Let A^a and A^el be as in Lemma 4. If ibq^a(T_v, Ω^a_{T_h,Γ}) returns C ⊑ D, then D(x_C) ∈ A^a; the soundness property of Lemma 4 implies D(x_C) ∈ A^el; thus, el(T_v ∪ T_h′) returns C ⊑ D; hence, T_v ∪ T_h′ ⊨ C ⊑ D; finally, T_v ∪ T_h ⊨ C ⊑ D. The converse direction holds analogously by the completeness property of Lemma 4. □

6 Weakly Acyclic Knowledge Bases

We next present an import-by-query algorithm based on subsumption oracles where T_v needs to satisfy weaker conditions than those in [5]. Our results rely on the notion of weak acyclicity borrowed from database dependency theory [9]. This condition prohibits cyclic existential quantification over the roles imported from T_h, which invalidates the proof of Theorem 3. In our example, weak acyclicity allows us to represent all axioms δ1–δ9 and hence to entail EA_Pat ⊑ CHD_Pat (using δ4–δ6) and EA_Pat ⊑ TVD_Pat (using δ4–δ7 and δ9).

6.1 Weak Acyclicity

Let [A] = A for A an atomic concept, and let [∃r.D] = r.

Definition 6. Let Γ be a signature such that {⊤, ⊥} ⊆ Γ, and let T be an EL TBox in normal form. The dependency graph for T w.r.t. Γ is the smallest graph that contains a node w_X for each symbol X ∈ sig(T) and the following edges for each axiom C_1 ⊓ … ⊓ C_n ⊑ D in T and each 1 ≤ i ≤ n:

– an unlabeled edge ⟨w_[C_i], w_[D]⟩, and

– another edge ⟨w_[C_i], w_[E]⟩ labeled with ⋆ if D is of the form ∃r.E with r ∈ Γ.

T is weakly acyclic w.r.t. Γ if, in each cycle in the dependency graph for T w.r.t. Γ, all edges are unlabeled.

It can be readily checked using Definition 6 that our example TBox Tv in Table 2 is weakly acyclic w.r.t. the imported signature.
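The check described by Definition 6 can be mechanized directly. The following sketch assumes a hypothetical encoding in which a normalized GCI is a pair (tuple of left-hand-side concepts, right-hand-side concept) and ∃r.E is the tuple `("some", r, E)`; a labeled edge lies on a cycle exactly when its source is reachable from its target, which is what `is_weakly_acyclic` tests.

```python
def bracket(concept):
    """[A] = A for an atomic concept A, and [∃r.D] = r."""
    return concept[1] if isinstance(concept, tuple) else concept

def dependency_edges(tbox, gamma_roles):
    """Edges (source, target, labeled) of the dependency graph of Definition 6."""
    edges = set()
    for lhs, rhs in tbox:
        for c in lhs:
            edges.add((bracket(c), bracket(rhs), False))
            if isinstance(rhs, tuple) and rhs[1] in gamma_roles:
                edges.add((bracket(c), rhs[2], True))  # labeled edge into [E]
    return edges

def reachable(edges, start):
    """All nodes reachable from start, including start itself."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for src, dst, _ in edges:
            if src == node and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def is_weakly_acyclic(tbox, gamma_roles):
    """True if no labeled edge lies on a cycle of the dependency graph."""
    edges = dependency_edges(tbox, gamma_roles)
    return all(src not in reachable(edges, dst)
               for src, dst, labeled in edges if labeled)

cyclic = [(("A",), ("some", "r", "A"))]          # T_v from (2)
fragment = [                                      # normalized δ5 and δ6
    (("EA",), "CHD"),
    (("EA_Heart",), "Heart"),
    (("EA_Heart",), ("some", "cond", "EA")),
    (("Heart", ("some", "cond", "EA")), "EA_Heart"),
]
print(is_weakly_acyclic(cyclic, {"r"}), is_weakly_acyclic(fragment, {"cond"}))
```

The first TBox is the one used in the proof of Theorem 3 and is rejected because of the labeled self-loop on A. The second contains a cycle between EA_Heart and Heart, but all of its edges are unlabeled, so it is accepted.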

Table 4. Additional EL Rule for the Oracle Ω^s

R3’  If, for some individual x_C in A_i and concept D ∈ BC(Γ),
     Ω^s_{T_h,Γ}(con(x_C, A_i, Γ), D) = t and D(x_C) ∉ A_i,
     then A_{i+1} := A_i ∪ {D(x_C)}.

Let Γ be a signature and A an ABox occurring in a run of sat^a(T_v, ∅, Ω^a_{T_h,Γ}). Then A contains a harmful cycle for Γ if assertions r_i(x_{C_{i−1}}, x_{C_i}) ∈ A with 1 ≤ i ≤ n exist such that x_{C_n} = x_{C_0}, C_i ∈ sig(T_v) for each 0 ≤ i ≤ n, and r_i ∈ Γ for each 1 ≤ i ≤ n. As we show next, if T_v is weakly acyclic w.r.t. Γ, then no ABox occurring in a run of sat^a(T_v, ∅, Ω^a_{T_h,Γ}) contains a cycle harmful for Γ.

Lemma 7. Let T_v be weakly acyclic w.r.t. Γ, and let A^a = sat^a(T_v, ∅, Ω^a_{T_h,Γ}). Then, A^a does not contain a cycle that is harmful for Γ.

Proof (Sketch). Let G be the dependency graph for T_v w.r.t. Γ, and A_0, …, A_n a run of sat^a(T_v, ∅, Ω^a_{T_h,Γ}) such that A_n = A^a. The following claim can be straightforwardly shown by induction on the number of rule applications:

CLAIM: The following properties hold for each A_i with 1 ≤ i ≤ n:

1. if D(x_C) ∈ A_i and {C, D} ⊆ BC(sig(T_v) \ Γ), then w_[D] is reachable in G from w_[C]; in addition, if D is of the form ∃r.E, then w_[E] is reachable in G from w_[C] via a path containing a labeled edge;

2. if r(x_C, x_D) ∈ A_i, {C, D} ⊆ sig(T_v) \ Γ, and r ∈ Γ, then w_[D] is reachable from w_[C] via a path containing a labeled edge.

The lemma follows from statement 2 and the fact that T_v is weakly acyclic. □

6.2 An Import-by-Query Algorithm

The algorithm ibq^s(T_v, Ω^s_{T_h,Γ}) accepts a normalized EL TBox T_v that is safe and weakly acyclic for Γ, and an oracle Ω^s_{T_h,Γ}. Let sat^s(T_v, A, Ω^s_{T_h,Γ}) be the same as sat(T_v, A) with the difference that rule R3’ in Table 4 is applied alongside rules R1 and R2. For Σ a signature and A an ABox not containing cycles harmful for Σ, the concept con(x_C, A, Σ) used in rule R3’ is inductively defined as follows:

con(x_C, A, Σ) =

    ⨅_{D(x_C) ∈ A|_Σ} D                                                 if C ∈ Σ,

    ⨅_{D(x_C) ∈ A|_Σ} D  ⊓  ⨅_{r(x_C, y_E) ∈ A|_Σ} ∃r.con(y_E, A, Σ)     otherwise.

Since A does not contain cycles harmful for Σ, induction is well-founded. Intuitively, con(x_C, A, Σ) represents the “rolled-up” part of A|_Σ reachable from x_C. The algorithm ibq^s(T_v, Ω^s_{T_h,Γ}) then returns C ⊑ D for each pair of concepts C and D such that {C, D} ⊆ sig(T_v) ∪ {⊤, ⊥} and D(x_C) ∈ sat^s(T_v, ∅, Ω^s_{T_h,Γ}).
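The roll-up con(x_C, A, Σ) is a plain recursion over the role assertions of A|_Σ, as the following sketch illustrates. The encoding is an assumption of this example: assertions are tuples `("c", D, x)` and `("r", r, x, y)`, the individual x_C is the string `"x_" + C`, a conjunction is a frozenset of conjuncts, and ⊥ is left out; `render` is a hypothetical helper for printing. The recursion terminates only if the ABox has no cycles harmful for Σ.

```python
def con(x, abox, sigma_concepts, sigma_roles):
    """Roll up the Σ-part of abox reachable from x into a set of conjuncts."""
    fillers = set(sigma_concepts) | {"TOP"}
    basic = set(sigma_concepts) | {("some", r, c) for r in sigma_roles for c in fillers}
    # First conjunction: all basic Σ-concepts asserted for x.
    conjuncts = {t[1] for t in abox if t[0] == "c" and t[2] == x and t[1] in basic}
    if x[2:] not in sigma_concepts:  # second case of the definition: C ∉ Σ
        for t in abox:
            if t[0] == "r" and t[2] == x and t[1] in sigma_roles:
                successor = con(t[3], abox, sigma_concepts, sigma_roles)
                conjuncts.add(("some", t[1], successor))
    return frozenset(conjuncts)

def render(concept):
    """Print a rolled-up concept; the empty conjunction is ⊤."""
    if isinstance(concept, frozenset):
        return " ⊓ ".join(sorted(render(c) for c in concept)) or "⊤"
    if isinstance(concept, tuple):
        return f"∃{concept[1]}.({render(concept[2])})"
    return concept

abox = {
    ("c", "EA_Heart", "x_EA_Heart"), ("c", "Heart", "x_EA_Heart"),
    ("r", "cond", "x_EA_Heart", "x_EA"),
    ("c", "EA", "x_EA"), ("c", "CHD", "x_EA"),
}
rolled = con("x_EA_Heart", abox, {"Heart", "CHD"}, {"cond"})
print(render(rolled))
```

For this ABox the private concepts EA_Heart and EA are dropped, and the query concept passed to the subsumption oracle is Heart ⊓ ∃cond.CHD, which exposes only shared symbols.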

Lemma 8. For each signature Γ, each EL TBox T_v that is safe and weakly acyclic for Γ, and each EL TBox T_h such that sig(T_v) ∩ sig(T_h) ⊆ Γ, we have sat^s(T_v, ∅, Ω^s_{T_h,Γ}) = sat^a(T_v, ∅, Ω^a_{T_h,Γ}).

Proof (Sketch). The lemma is a consequence of the following claim together with the fact that both algorithms are identical except for the rule R3’.

CLAIM: For each signature Γ, ABox A not containing cycles harmful for Γ, individual x in A, and concept D ∈ BC(Γ), we have

T_h ∪ A|_Γ ⊨ D(x) if and only if T_h ⊨ con(x, A, Γ) ⊑ D.

To prove this claim, let Q_x be a fresh concept uniquely associated with each individual x in A|_Γ. Let T′ be a TBox containing the axiom Q_{x_C} ⊑ D for each D(x_C) ∈ A|_Γ, and Q_{x_D} ⊑ ∃r.Q_{y_E} for each r(x_D, y_E) ∈ A|_Γ with D ∉ Γ. It is easy to show that T_h ∪ A|_Γ ⊨ D(x) iff D(x) ∈ el(T_h′ ∪ T′), for T_h′ the normalized version of T_h. Since A|_Γ does not contain cycles harmful for Γ, by induction on the structure of con(x, A, Γ), we have that con(x, A, Γ) is obtained from Q_x by unfolding the relevant concepts Q_y with their definitions in T′. □

Theorem 9. The algorithm ibq^s(T_v, Ω^s_{T_h,Γ}) is an import-by-query algorithm and it can be implemented such that it runs in time exponential in the size of T_v with a polynomial number of calls to Ω^s_{T_h,Γ}.

Proof. The algorithm is an import-by-query algorithm by Lemma 8 and Theorem 5. Finally, just like in the proof of Theorem 5, rules R1, R2, and R3’ can only be applied a polynomial number of times. The inductive definition of con(x, A, Γ), however, involves constructing a tree of polynomial depth and branching factor, the size of which can be at most exponential in the size of A|_Γ. □

7 Related and Future Work

In a P2P setting, [4] consider the problem of answering a query q over KBs K_v and K_h and mappings M by reformulating q as queries that can be evaluated over K_v and K_h in isolation. The query reformulation algorithm accesses only K_v and M; thus, q can be answered using an oracle for K_h. In this setting, however, a satisfiable K_h cannot affect the subsumption of concepts in K_v (see [5] for an example); thus, the results in [4] are not applicable to schema reasoning.

Another possibility for the owner of T_h to restrict access only to symbols in Γ is to publish a uniform interpolant T_h^Γ of T_h w.r.t. Γ [11], that is, an ontology T_h^Γ such that sig(T_h^Γ) ⊆ Γ, T_h ⊨ T_h^Γ, and T_h^Γ ⊨ C ⊑ D iff T_h ⊨ C ⊑ D for all concepts C and D with sig(C) ∪ sig(D) ⊆ Γ. Publishing T_h^Γ, however, may reveal more information about T_h than what is strictly needed. For example, for Γ = {r, B}, T_v = {A ⊑ ∃r.B} and T_h = {∃r.∃r.B ⊑ B}, the interpolant T_h^Γ is equal to T_h; thus, publishing T_h^Γ reveals the entire contents of the hidden ontology T_h. In contrast, an import-by-query algorithm reveals only that T_h ⊭ ∃r.B ⊑ ⊥ and T_h ⊭ ∃r.B ⊑ B; hence it does not reveal the actual syntactic structure of T_h. Thus, enabling the reuse of T_h through import-by-query preserves the hidden content of T_h to a higher degree than if reuse were enabled by publishing T_h^Γ.

We are currently working on extending the results presented in this paper to DLs more expressive than EL.

References

[1] Lutz, C., Walther, D., Wolter, F.: Conservative Extensions in Expressive Description Logics. In: Proc. IJCAI. (2007) 453–458

[2] Cuenca Grau, B., Horrocks, I., Kazakov, Y., Sattler, U.: Modular Reuse of Ontologies: Theory and Practice. Journal of Artificial Intelligence Research 31 (2008) 273–318

[3] Lenzerini, M.: Data Integration: A Theoretical Perspective. In: Proc. PODS. (2002) 233–246

[4] Calvanese, D., De Giacomo, G., Lembo, D., Lenzerini, M., Rosati, R.: What to Ask to a Peer: Ontology-based Query Reformulation. In: Proc. KR. (2004) 469–478

[5] Cuenca Grau, B., Motik, B., Kazakov, Y.: Import-by-Query: Ontology Reasoning under Access Limitations. In: Proc. IJCAI, AAAI Press (2009) To Appear.

[6] Baader, F., Brandt, S., Lutz, C.: Pushing the EL Envelope. In: Proc. IJCAI. (2005) 364–369

[7] Kutz, O., Horrocks, I., Sattler, U.: The Even More Irresistible SROIQ. In: Proc. KR. (2006) 68–78

[8] Horrocks, I., Kutz, O., Sattler, U.: The Irresistible SRIQ. In: Proc. of OWLED. (2005)

[9] Kolaitis, P.G., Panttaja, J., Tan, W.C.: The Complexity of Data Exchange. In: Proc. PODS. (2006) 30–39

[10] Motik, B., Horrocks, I.: Individual Reuse in Description Logic Reasoning. In: Proc. of IJCAR 2008. Volume 5195 of LNAI., Springer (2008) 242–258

[11] Konev, B., Walther, D., Wolter, F.: Forgetting and Uniform Interpolation in Large-Scale Description Logic Terminologies. In: Proc. IJCAI, AAAI Press (2009) To Appear.