Small Datalog Query Rewritings for EL

(1)

Small Datalog Query Rewritings for EL

^?

Giorgio Stefanoni, Boris Motik, and Ian Horrocks Department of Computer Science, University of Oxford

1 Introduction

Description Logics are a key technology in data management scenarios such as Ontology-Based Data Access (OBDA), a paradigm in which a DL ontology is used to provide a conceptual view of the data [1]. An OBDA system transforms a conjunctive query over the ontology into a query over the data sources [2]. This transformation is independent of the data, so the OBDA approach can thus be used in settings where the data sources provide read-only access to the data, and where the data changes frequently.

Most existing OBDA systems are based on theDL-Litefamily oflightweight Description Logics [1], which is also the basis for the QL profile of the OWL 2 ontology language. Logics in this family have been designed to allow a conjunctive query posed over the ontology to be rewritten as a first order query over the data sources—that is, queries are first-order rewritable. The query rewriting procedure is independent of the data, and the resulting queries can be evaluated using highly scalable relational database technology. To achieve this, however, the expressive power of DL-Lite is very restrictive. This prevents the OBDA approach from being applied in the life science domain, where many ontologies use DLs from the EL family [3, 4]. This family provides the basis for the EL profile of OWL2, and many prominent ontologies, such as SNOMED-CT, were developed using this language.

The problem of answering conjunctive queries inELhas already been studied in the literature, and two orthogonal approaches have been proposed. First, Rosati proposed a pure query rewriting technique which transforms anELTBox T and a conjunctive query q into a Datalogprogram P_T,q [5]. Second, Lutz et al. introduced a “combined” approach [6, 7]. This technique first materializes certain facts entailed by the ontology in a precomputation step. Then, each user query is rewritten into a polynomial first-order query that, when evaluated over the materialized facts, computes the answers to the user’s query.

Unfortunately, these two approaches exhibit several shortcomings when applied in the context of OBDA. In particular, Rosati’s rewriting technique computes for each user query a freshDatalogprogram whose size depends on both the query and the terminology, which could be very inefficient when dealing with large scale ontologies. The approach by Lutz et al. produces smaller first order rewritings, but the use of materialization means that the technique is only applicable when the data sources provide read/write access to the data; furthermore, materialization can be inefficient if the data changes frequently.

?This work was supported by EPSRC and Alcatel-Lucent.

(2)

In this paper, we present a pure query rewriting technique to query answering inEL. Our approach reinterprets the combined approach proposed by Lutz and colleagues in terms ofDatalog. Our rewriting procedure consists of two distinct steps. The first step rewrites a TBoxT into aDatalogprogramP_T, whose size depends linearly on the size ofT. Then, at query time, the conjunctive queryq is rewritten into aDatalogqueryhQP, QCi, whose size depends polynomially onq. The two rewriting steps are such that, given an ABoxA, deciding whether QP(a1, . . . , ak) follows from P_T ∪QC ∪ A is equivalent to deciding whether ha1, . . . , akiis a certain answer toqover a knowledge basehT,Ai.

At last, we summarize our main contributions. First, our rewriting approach, unlike Rosati’s, separates the rewriting of the TBox and the query into two distinct steps, thus reducing inefficiency when dealing with large ontologies. Second, our technique does not require the materialization of entailed facts, hence our solution is in the spirit of OBDA and it avoids the problems associated with the materialization of large models. Finally, we set the stage for assessing the utility and the applicability toP_T ∪QC∪ A of optimizedDatalogevaluation techniques, such asmagic setsandSLG resolution [8, 9]. Indeed, heuristic-based evaluation strategies significantly reduce the number of facts needed to answer a query, thus potentially improving the performance of our rewriting approach.

In this paper we provide only proof sketches, and refer the reader to [10] for full proofs.

2 Preliminaries

Description Logic EL

LetNC,NR,NIbe pairwise disjoint infinite sets of atomic concepts, atomic roles, and individuals. Together, the setsNC,NR, andNI form thesignatureof anEL language. Whenever the distinction between atomic concepts and atomic roles is immaterial, we call an element ofNC∪NR apredicate. The set ofELconcept expressions is inductively defined starting from atomic concepts A ∈ NC and atomic rolesR∈NRaccording to the syntax rule:C→A|C1uC2| ∃R.C | >.

AnELTBox T is a finite set of concept inclusions of the form C vD; an ELABoxAis a finite set ofassertions of the formA(a) orR(a, b) with aand b individuals; and an EL knowledge base (KB) is a tuple K = hT,Ai, where T is an ELTBox andA is an ELABox. We denote withInd(A) the set of all individuals occurring in the ABox A. Furthermore, for E either a TBox or an ABox,Pred(E) is the set of all predicates occurring inE.

Semantics is given as usual in terms of first-order interpretationsI=h∆Î,·Îi, where∆Îis a nonemptydomain and·Î is aninterpretation function; please refer to [3] for details. In the following, we will extensively use the notion of an unraveling of an interpretation w.r.t. an ABox. Consider an interpretationIand an ABoxAover an arbitraryELsignature. ApathpinIw.r.t.Ais a nonempty finite sequencec1·R2·c2· · ·R_n−1·c_n−1·Rn·cn such thatc1∈ {aÎ |a∈Ind(A)}

and for all 1≤i≤n−1 we have thathci, ci+1i ∈R^I_i+1 forRi+1 ∈NR. We say

(3)

that a path phasdepth n and we write dep(p) =n; furthermore,tail(p) is the last domain element c_n in p. Letpaths_A(I) denote the set of all paths w.r.t. A occurring inI. The unravelingJ ofI w.r.t.Ais the following interpretation.

∆^J =paths_A(I) a^J =a^I

A^J ={p∈paths_A(I)|tail(p)∈A^I}

R^J ={ha^J, b^Ji |R(a, b)∈ A} ∪ {hp, p·R·ci | {p, p·R·c} ⊆paths_A(I)}

In this paper, we deal only with normalized ELTBoxes. Let A₁, A, and B be arbitrary concepts from N_C∪ {>}. We say that an EL TBox T is in normal form if each axiom inT is in one of the following forms:A vB, AuA₁ vB, A v ∃R.B, or ∃R.AvB. Given an arbitrary EL TBox T, we can compute a normalized TBoxT_norm ofT in linear time [3].

Querying EL KBs

LetNV be an infinite set ofvariablesdisjoint fromNI. Together,NV andNI form the set NT of terms. A first-order query q is a first-order formula constructed from the terms in NT and the predicates from NC ∪NR [8]. In general, we writeq=ψ(~x) to express thatqis the FO formulaψwhoseanswer variablesare

~

x={x1, . . . , xk}. A query withkanswer variables is ak-ary query. Aconjunctive query (CQ) is a FO query of the formq=∃~y.ψ(~x, ~y), whereψis a conjunction of unary atomsA(s) and binary atomsR(s, t) withsandtterms. The variables

~

y are thequantified variables ofq. In the following,avar(q) is the set of answer variables ofq, andqvar(q) is the set of quantified variables. Finally,N_V(q) is the set of all variables occurring in q, and N_T(q) is the set of all terms occurring in q. Let q =ψ(~x) be ak-ary FO query with ~x=hx1, . . . , x_ki and letI be an interpretation. We say that ak-ary tuple of individualsha₁, . . . , a_kiis an answer to qin I, writtenI |=q[a₁, . . . , a_k], if I satisfiesqunder the mapping πwhich sets π(x_i) = a_i for all 1 ≤ i ≤ k. We call π a match for q in I witnessing ha1, . . . , aki, written I |=^π q. We say that ha1, . . . , aki is a certain answer to q over K if I |= q[a1, . . . , ak], for all models I of K. We denote the set of all certain answers toq over K with cert(q,K). Rosati in [5] showed that deciding whether a tuple of individuals is a certain answer toqoverKisPtime-complete w.r.t. data-complexity (i.e., w.r.t. the size of the ABox);Ptime-complete w.r.t.

KB complexity (i.e., w.r.t. the size of K); and, NP-complete w.r.t. combined complexity (i.e., w.r.t. the size of bothK andq).

Datalog

LetNB be a nonempty set of built-in predicates [11]. Then, aDatalogrule r is an expression of the form

S(~u)←S1(~u1), . . . , Sn(~un), Bn+1(~un+1). . . , Bm(~um),

(4)

wheren, m≥0,{S, S₁, . . . , S_n} ⊆N_C∪N_R,{B_n+1, . . . , B_m} ⊆N_B, and~uand

~

u_iare tuples of terms of suitable length. A rule issafe if each variable occurring in~u∪~u_n+1∪. . .∪~u_malso occurs in~u₁∪. . .∪~u_n. AtomS(~u) is thehead of the rule, and atoms S1(~u1), . . . , Bm(~um) constitute the body of the rule. Whenever the body of a ruleris empty, we callrafact, and we write the rule asS(~u). A Datalogprogram P is a set of safe Datalogrules. Finally,sch(P) is the set of predicates occurring inP.

Next, we define the semantics of a Datalog program P using Herbrand interpretations [8]. The Herbrand Universe of P is the set of all individuals occurring in P. The Herbrand Base of P is the set of all facts that can be constructed from the predicates inNC∪NR and the individuals in the universe ofP. A Herbrand interpretation I ofP is a subset of the Herbrand Base ofP. Note thatIdoes not interpret built-in predicates. As usual, we assume that these predicates are evaluated over a fixed, possibly infinite Herbrand interpretation B[12]. ThenI is a model ofP w.r.t.Bif, for all the rulesrinP, we have that

I∪B|=∀~x(B_m(~u_n)∧. . .∧B_n+1(~u_n+1)∧S_n(~u_n)∧. . .∧S₁(~u₁)→S(~u)), where~xis a tuple consisting of all variables occurring in the rule. The semantics of a DatalogprogramP is defined as the minimal Herbrand interpretation I satisfyingP w.r.t.B, written MB(P). Whenever the program does not contain built-in predicates, we do not consider the interpretationBand we simply write M(P). The semantics of Datalogprograms can be defined also by means of a fixpoint construction. Then,TP is theimmediate consequence operatorthat maps instancesIoversch(P) to instances oversch(P) as follows. For each rulerinP, if there exists a matchπforS1(~u1)∧. . .∧Sn(~un)∧Bn+1(~un+1)∧. . .∧Bm(~um) inI∪B, thenS(a1, . . . , ak) is contained inTP(I) withai=π(ui) for eachui∈~u.

One can prove thatT_P has a minimum fixpoint T_P^ω such thatT_P^ω=M_B(P) [8].

Finally, aDatalogquery is a tuplehQP, Q_CiwhereQ_Pis a predicate symbol andQ_Cis aDatalogprogram. A tuple of individualsha1, . . . , a_kiis ananswer tohQ_P, Q_CioverDatalogprogramP ifP∪Q_C |=Q_P(a₁, . . . , a_k).

3 Datalog Rewriting for EL TBoxes

In this section, we show how to transform an EL TBox T into a Datalog programPT whose size depends linearly onT. The transformation is such that, for an arbitraryELABoxA, we can use the unraveling ofM(PT∪A) to compute the answers to conjunctive queries overhT,Ai. LetT be a TBox over an arbitrary EL signature. Intuitively, for each axiom α occurring in T, the program P_T contains a set of Datalogrules which encode the constraint imposed byα. To achieve this, we have to overcome two issues.

First,ELconcept inclusions of the formAv ∃R.B require the use of either existential quantifications or Skolem terms in rule heads; however, Datalog does not allow neither of the two. In order to solve this issue, we use a technique that has been introduced for representing canonical models of EL knowledge bases [3]. That is, for each atomic conceptBoccurring inT we introduce a fresh

(5)

Axiomα Set of rulesΘ(α)

AvB B(X)←A(X)

A1uA2vB B(X)←A1(X), A2(X)

∃R.AvB B(X)←R(X, Y), A(Y) Av ∃R.B R(X, oB)←A(X) B(oB)←A(X)

Fig. 1.Transformation ofELAxioms into Rules.

auxiliary individual oB, which represents the class of existentially quantified individuals of typeB. Then, for each axiom of the above form, the programPT

contains the following two rules:

R(X, oB)←A(X); B(oB)←A(X).

Second,ELallows for>to occur in concept expressions. Hence, we need to define inPT a unary predicate>, whose extension—given an ABoxA—coincides with the Herbrand universe of P_T ∪ A. To achieve this, we restrict our study to a subset of all ELABoxes. In particular, we consider only those ABoxes A such thatPred(A)⊆Pred(T). That is, each predicate occurring in the ABoxA must occur also in the TBoxT. Then, in ourDatalogprogram, for each atomic conceptAand for each atomic roleRoccurring inT, we add the following rules:

>(X)←A(X); >(X)←R(X, Y); >(Y)←R(X, Y).

This is only one of the several ways in which we can encode such a predicate.

In fact, another possibility would be—as suggested by Rosati in [5]—to assume that each ABoxAcontains an assertion>(a) for each individuala∈Ind(A). We believe that in the context of OBDA—where the focus is to provide access to arbitrary data-sources—it is important to make as few assumptions as possible on the physical realization of the ABox. For this reason, we prefer the option presented above.

Next, we formalize the transformation of a TBoxT into aDatalogprogram PT. LetAux={oA|A∈NC} ∪ {o>}be a set ofauxiliary individuals distinct from NI. Then, the program PT is constructed from terms in NT ∪Aux and predicates inNC∪NR∪ {>} as follows. The transformation uses the function Θ, shown in Figure 1, to transform each axiom in the (normalized) TBoxT into a set of Datalogrules. TheDatalogprogramP_T is then defined as follows.

P_T =S

α∈T Θ(α)

S

A∈Pred(T)∩NC >(X)←A(X) S

R∈Pred(T)∩NR >(X)←R(X, Y), >(Y)←R(X, Y) The following result readily follows from the definition of the program.

Proposition 1. For an arbitrary EL TBox T, Datalog program P_T can be computed in time linear in the size ofT.

(6)

Consider an arbitrary EL ABox A. Next, we prove that the unraveling U w.r.t.AofM(P_T∪A) can be used to answer conjunctive queries overK=hT,Ai.

We do so in two distinct steps. First, we introduce the notion of chase of anEL knowledge baseK. Second, we show that the chase ofKis isomorphic toU.

The chase of an EL knowledge base K = hT,Ai, written chase(K), is a possibly infinite Herbrand interpretation defined inductively by starting fromA and then applying axioms occurring in the TBox to assertions occurring in the ABox. In our definition of the chase, we use function terms to denote existentially quantified individuals. Hence, the definition of ABox assertion is extended in a natural way to accommodate for assertions over function terms. We denote with uandwterms that can be either individuals or function terms. Next, we define an operator Γ_T that chases an ABox by applying the axioms occurring in the TBoxT. In the definition, we use assertions of the form>(u) to assert thatuis a member of the ELconcept expression >. For S an arbitrary ABox, ΓT(S) is the smallest ABox containingS and closed under the followingchasing rules.

(cr1) If{A(u)} ⊆ S andAvB∈ T, then{B(u)} ⊆Γ_T(S).

(cr2) If{A1(u), A₂(u)} ⊆ S andA₁uA₂vB ∈ T, then{B(u)} ⊆Γ_T(S).

(cr3) If{R(u, w), A(w)} ⊆ S and∃R.AvB∈ T, then{B(u)} ⊆Γ_T(S).

(cr4) If{A(u)} ⊆ S andAv ∃R.B∈ T, then

{R(u, f(u, R, B)), B(f(u, R, B))} ⊆Γ_T(S).

(cr5) Ifuoccurs inS, then {>(u)} ⊆Γ_T(S).

We now define an infinite sequence of finite ABoxesAi fori∈N. A0 =A

Ai+1=Γ_T(Ai)

Finally, the chase ofK is the infinite union of all such ABoxesAi. chase(K) =[

i∈N

Ai

It is clear that our construction of the chase ofKisfair. In fact, for eachi∈Nwe have thatA_i+1 is the result of exhaustively applying—to all possible assertions occurring in A_i—all applicable axioms inT. At last, we want to point out that chase(K) can be used to compute the certain answers to a CQqoverK.

Proposition 2 ([5]). LetK be anELknowledge base. Further, letq be ak-ary conjunctive query. Then, for eachk-ary tuple of individualsha1, . . . , a_ki, we have

ha1, . . . , aki ∈cert(q,K)if and only if chase(K)|=q[a1, . . . , ak].

So, by proving that the unravelingU ofM(P_T∪A) is isomorphic tochase(K), we establish thatU can be used to answer conjunctive queries overK. To prove the structural equivalence of U and chase(K), we define a function hmapping

(7)

paths occurring in U to terms in chase(K). We define h by induction on the depth of paths occurring inU as follows.

Base Case.Consider an arbitraryp∈∆^Uwithdep(p) = 1. We seth(p) :=p.

Inductive Step. Let p=t1·R2·t2· · ·t_n−1·Rn·tn be a path occurring in U such thath(p) has not been defined yet, buth(t1· · ·R_n−1·t_n−1) =u. We distinguish between two cases depending on the type of the individualtn.

1. Iftn occurs in the ABox, we seth(p) :=tn. 2. Iftn is of the formoB, we seth(p) :=f(u, Rn, B).

Theorem 1 shows thathis an isomorphism between the two structures. In- tuitively, for the only-if direction, we show thathis an injective homomorphism from U to chase(K) by induction on the depth of paths occurring inU; for the if-direction, by induction on the construction of chase(K) we prove thath is a surjective function and that it is a homomorphism fromchase(K) toU.

Theorem 1. Function his an isomorphism from U tochase(K).

Since the unraveling ofM(P_T ∪ A) is generally infinite, this result alone does not provide us with an algorithm for answering queries inEL. In the next section, we show how to rewrite a user query q into a Datalog queryhQP, QCi such that PT ∪ A ∪QC |=QP(a1, . . . , ak) if and only if ha1, . . . , aki ∈cert(q,K) and thus solve the problem.

4 Polynomial Query Rewriting in Datalog

In the previous section, we have seen that for an arbitraryELKB K=hT,Ai evaluating a conjunctive query q over the unraveling of the Herbrand model of P_T ∪ Ais equivalent to computing the certain answers to q over K. In this section, we develop a query rewriting procedure that reduces the computation of cert(q,K) to the problem of evaluating a suitably constructed Datalogquery over P_T ∪ A. We achieve this in two steps. First, we present an interesting property of a certain class of interpretations. Second, we show how this result can be used to develop a query rewriting procedure in ourDatalogsetting.

We use the notions ofA-connected and split interpretations from [6, 13]. Let I be an interpretation and let A be an ABox over an arbitraryEL signature.

We say thatI isA-connectedif, for each domain elementc∈∆Î, there exists a pathp∈paths_A(I) such thattail(p) =c. Furthermore,Iis asplit interpretation if, for all domain elements c, c⁰ ∈ ∆Î, we have that c 6∈ {aÎ | a ∈ Ind(A)}

and hc, c⁰i ∈ RÎ imply c⁰ 6∈ {aÎ | a ∈ Ind(A)}. Intuitively, in an A-connected interpretationI, for each domain elementc_n it is always possible to find a path aÎ·R₂·c₂· · ·R_n·c_n such thata∈Ind(A). Furthermore, if I is split, then each domain element that is not the image of an individual can be related by a role only with elements that themselves do not interpret individuals.

Then, letJ be the unraveling w.r.t. Aof a split andA-connected interpre- tationI and letqbe a conjunctive query. Lutz et al. in [6, 13] showed that it is possible to reduce the problem of answering q in J to evaluating a first-order query rewriting q^∗ ofq over I. Roughly speaking, the query rewritingq^∗ rules

(8)

out some spurious answers for q in I that cannot be reproduced in J. More specifically, we have to ensure that the answer variables of q, the variables of q mapped to cyclic portions of I, and the variables of q mapped to nontree portions ofI are all matched only to the domain elements in{a^I |a∈Ind(A)}.

We now briefly outline how we can construct such an FO rewriting q^∗ for q[6]. Let∼q be the smallest equivalence relation overNT(q) that is closed under the following rule: if R(s, t) andR(s⁰, t⁰) occur in q and t ∼q t⁰, then s ∼q s⁰. Then, for each equivalence classζof∼q, we lettζ ∈ζbe an arbitrary, but fixed, representative of the class. Also, for each such equivalence classζ and for each atomic roleR occurring inq, we letPred(ζ, R) be the following set.

Pred(ζ, R) ={t∈N_T(q)|R(t, t⁰) occurs in qwitht⁰ ∈ζ}

Next, we define three sets of terms that correspond to the above mentioned cases.

– Fork=is the set of all pairshPred(ζ, R), tζisuch thatζ is an equivalence class of∼q and|Pred(ζ, R)|>1.

– Fork₆₌is the set of all quantified variablesv∈qvar(q) for which atomsR(s, v) andS(s⁰, t) exist inqsuch thatR6=S and v∼q t.

– Cycis the set of all variablesv∈qvar(q) for which atoms R0(t0, t⁰₀), . . . , Rm(tm, t⁰_m), . . . , Rn(tn, t⁰_n)

exist in q such thatm, n≥0; for somei ≤n we have thatt_i ∼_q v; for each j < nwe have thatt⁰_j ∼q tj+1; andt⁰_n ∼q tm.

We are now ready to formally specify the FO query rewritingq^∗. In the definition, we assume thatAuxis a fresh predicate not occurring inqandKand that every interpretationIinterpretsAuxas∆^I\ {a^I |a∈Ind(A)}. Then, formulaeq1and q2 are defined as follows.

q₁= ^

v∈avar(q)∪Fork6=∪Cyc

¬Aux(v)

q₂= ^

hPred(ζ,R),t_ζi∈Fork=

¬Aux(t_ζ)∨ ^

t,t⁰∈Pred(ζ,R)

(t=t⁰)

Finally, we setq^∗=q∧q1∧q2. It turns out thatq^∗can be computed in polynomial time w.r.t.q[6]. In the same paper, Lutz et al. prove the following result.

Proposition 3. LetAbe an arbitraryELABox, letIbe a split andA-connected interpretation, and letJ be the unraveling ofI w.r.t.A. Then, for everyk-tuple of individuals ha1, . . . , aki, we have that

I |=q^∗[a1. . . , ak]if and only if J |=q[a1. . . , ak].

This result applies to ourDatalogrewriting ofELTBoxes. Indeed, for an arbitraryELKBK=hT,Ai, we have thatM(P_T∪A) is a split andA-connected interpretation. The intuition behind the argument is as follows. We show that M(P_T ∪ A) is split by noticing that rules encoded in P_T do not allow for the derivation of facts of the form R(o_B, a) for a∈Ind(A) ando_B ∈Aux. To see thatM(P_T ∪ A) isA-connected, we just recall thatM(P_T ∪ A) is minimal and, hence, all the derived facts must be “grounded” w.r.t. the facts inA.

(9)

Theorem 2. Let K=hT,Aibe anELknowledge base. Then,M(P_T ∪ A)is a split and A-connected interpretation.

Hence, for an arbitraryk-ary CQqand for eachk-tuple of individualsha₁, . . . , a_ki, we have thatM(P_T ∪ A)|=q^∗[a₁. . . , a_k] if and only ifha₁, . . . , a_ki ∈cert(q,K).

Note that q^∗ is a first-order query, and we are unaware of systems capa- ble of evaluating first-order queries over Datalog programs. Therefore, we next show how to transform q^∗ into a Datalog query hQP, QCi such that ha1, . . . , aki ∈cert(q,K) if and only if P_T ∪ A ∪QC |=QP(a1. . . , ak). We construct such aDatalog queryhQP, QCi by applying toq^∗ a simplified version of the Lloyd-Topor transformation [14, 15].

Definition 1 (Datalog Rewriting). Let q(~x) be a k-ary CQ whose quantified variables are among~y; let Cyc,Fork₆₌, and Fork= be as specified above; let hPred(ζ¹, R¹), t¹_ζi, . . ., hPred(ζⁿ, Rⁿ), tⁿ_ζi be an arbitrary enumeration of Fork=; let p0, p1, . . . , pn be fresh predicates; and let Named be a built-in with a prede- termined, possibly infinite Herbrand interpretation N ={Named(a) | a∈ NI}.

QueryQ_C then contains the following safe Datalogrules:

p0(~x, ~y)←q, ^

v∈avar(q)∪Fork6=∪Cyc

Named(v) (1)

pi(~x, ~y)←p_i−1(~x, ~y),Named(tⁱ_ζ) for1≤i≤n (2)

p_i(~x, ~y)←p_i−1(~x, ~y), ^

t,t⁰∈Pred(ζⁱ,Rⁱ)

t=t⁰ for1≤i≤n (3)

Q_P(~x)←p_n(~x, ~y) (4)

One may think that the recursive definition of predicatespi for 1≤i≤ncould be simplified by writingQP(~x)←p0(~x, ~y). . . pn(~x, ~y) and by defining eachpias:

pi(~x, ~y)←Named(tⁱ_ζ) pi(~x, ~y)←V

t,t⁰∈Pred(ζⁱ,Rⁱ)t=t⁰

Unfortunately, these rules are not safe. Safe rules, on the one hand, provide us with a clear and unambiguous semantics. On the other hand, unsafe rules are also computationally more expensive for bottom-up computation, since each variable in the head may be bound to an arbitrary individual in the universe of the program. For this reason, we prefer our, slightly more involved, solution.

Proposition 4. For an arbitrary k-ary conjunctive query q, query hQP, Q_Ci can be computed in polynomial time w.r.t. the size of q.

Proof. We note that ∼q can be computed in polynomial time w.r.t. the size of q[6] and, therefore, also the setsCyc,Fork₆₌, andFork₌can be computed in polynomial time w.r.t.q. Furthermore, the size of the body of rulep₀(~x, ~y) depends linearly on the size of q, Cyc, and Fork₆₌. Also, for each pair hPred(ζ, R), t_ζiin Fork₌, the programQ_C contains exactly two rules. The size of these two rules depends linearly on the size ofhPred(ζ, R), tζi. Thus, we conclude thathQP, QCi can be computed in polynomial time with respect to the size ofq. ut

(10)

In [10], we prove that our rewriting is correct—that is, that answeringhQ_P, Q_Ci overP_T ∪ Ais equivalent to computing cert(q,T,A). This follows from Propo- sition 3 and the fact that ourDatalogquery is the result of transforming the FO rewritingq^∗ along the lines of the Lloyd-Topor transformation.

Theorem 3. Let K be an ELknowledge base and letq be a k-ary CQ over K.

Then, for everyk-tuple of individuals ha1, . . . , a_ki, we have that

ha1, . . . , aki ∈cert(q,K)if and only if P_T ∪ A ∪QC|=QP(a1, . . . , ak).

Finally, we investigate the complexity of our rewriting procedure.

Theorem 4. Let K = hT,Ai be an EL KB, let q be a k-ary CQ, and let ha1, . . . , akibe a tuple of individuals. We can decidePT∪A∪QC|=QP(a1, . . . , ak) in polynomial time w.r.t. the size ofKand in non-deterministic polynomial time with respect to the size of bothK andq.

Proof. We have already argued that the size of DatalogprogramPT depends linearly on the size of the TBox T and that the Datalogrewriting hQP, QCi can be computed in Ptime w.r.t.q. Also, we note that the arity of predicates and the number of variables occurring in P_T ∪ A ∪Q_C do not depend on K.

Finally, from an implementation point-of-view (as suggested in [12]), the built-in predicateNamedcan be considered as an assertion in the ABoxAwith a different physical realization: it is not directly stored in the ABox but it is implemented as a procedure which is evaluated during the execution of the program. Clearly, such a procedure can be implemented to run in time polynomial inK. It follows that we can compute the minimal Herbrand model of P_T ∪ A ∪QC in time polynomial in the size of K [8]. The membership in NP follows directly from the considerations above and from the fact that we can guess and check in nondeterministic polynomial time a matchπforQP inM(P_T ∪ A ∪QC). ut

5 Conclusions

In this paper, we introduce a query rewriting approach to query answering inEL.

In our approach, the process of computing the certain answers to a CQqover an ELKBK=hT,Aiis divided into two distinct steps. A first preprocessing step in which the TBox T is transformed into a Datalog programP_T, whose size is linear inT. Then, at query time, the queryq is independently rewritten into a DatalogqueryhQ_P, Q_Ci, whose size is polynomial inq. Finally, computing cert(q,K) amounts to evaluating theDatalogqueryhQ_P, Q_CioverP_T ∪ A.

In future, we plan to extend our approach to deal withELH_⊥^dr. Lutz et al.

have already proposed a combined approach for query answering in this DL [13].

However, differently from their solution, we would like the Datalogrewriting hQ_P, Q_Ci to be independent from the role inclusions contained in the TBox.

Additionally, we plan to extend our work to cover nominals, which raises the question on how to efficiently handle equality inDatalog[2].

(11)

References

1. Calvanese, D., De Giacomo, G., Lembo, D., Lenzerini, M., Poggi, A., Rodr´ıguez- Muro, M., Rosati, R.: Ontologies and databases: the DL-Lite approach. In Tessaris, S., Franconi, E., eds.: Semantic Technologies for Informations Systems – 5th Int.

Reasoning Web Summer School (RW 2009). Volume 5689. (2009) 255–356 2. P´erez-Urbina, H., Motik, B., Horrocks, I.: Tractable query answering and rewriting

under description logic constraints. J. Applied Logic8(2) (2010) 186–209 3. Baader, F., Brandt, S., Lutz, C.: Pushing the ELenvelope. In Kaelbling, L.P.,

Saffiotti, A., eds.: IJCAI, Professional Book Center (2005) 364–369

4. Baader, F., Brandt, S., Lutz, C.: Pushing theELenvelope further. In Clark, K., Patel-Schneider, P.F., eds.: In Proceedings of the OWLED 2008 DC Workshop on OWL: Experiences and Directions. (2008)

5. Rosati, R.: On conjunctive query answering inEL. In Calvanese, D., Franconi, E., Haarslev, V., Lembo, D., Motik, B., Turhan, A.Y., Tessaris, S., eds.: Description Logics. Volume 250 of CEUR Workshop Proceedings., CEUR-WS.org (2007) 6. Lutz, C., Toman, D., Wolter, F.: Conjunctive query answering in EL using a

database system. In: Proceedings of the OWLED 2008 Workshop on OWL: Expe- riences and Directions. (2008)

7. Kontchakov, R., Lutz, C., Toman, D., Wolter, F., Zakharyaschev, M.: The combined approach to ontology-based data access. In: IJCAI, AAAI Press (2011) 8. Abiteboul, S., Hull, R., Vianu, V.: Foundations of databases. Addison-Wesley

(1995)

9. Chen, W., Warren, D.S.: Query evaluation under the well-founded semantics.

In: Proceedings of the twelfth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems. PODS ’93, New York, NY, USA, ACM (1993) 168–179

10. Stefanoni, G., Motik, B., Horrocks, I.: Small datalog query rewriting for EL. Technical report, University of Oxford (2012) Available at http://www.cs.ox.ac.uk/people/giorgio.stefanoni/pubs/2012/tr/dl2012.pdf.

11. Ullman, J.D.: Principles of database and knowledge-base systems, Vol. I. Computer Science Press, Inc., New York, NY, USA (1988)

12. Ceri, S., Gottlob, G., Tanca, L.: What you always wanted to know about datalog (and never dared to ask). IEEE Trans. Knowl. Data Eng.1(1) (1989) 146–166 13. Lutz, C., Toman, D., Wolter, F.: Conjunctive query answering in the description

logicELusing a relational database system. In Boutilier, C., ed.: IJCAI. (2009) 2070–2075

14. Lloyd, J., Topor, R.: Making prolog more expressive. The Journal of Logic Pro- gramming1(3) (1984) 225 – 240

15. Decker, S.: Semantic web methods for knowledge managmement. PhD thesis, University of Karlsruhe (2002)