Query Rewriting in DL-Lite_^(HN)_horn

(1)

Query Rewriting in DL-Lite

⁽^HN⁾

horn

⋆

Elena Botoeva, Alessandro Artale, and Diego Calvanese KRDB Research Centre

Free University of Bozen-Bolzano I-39100 Bolzano, Italy [email protected]

Abstract. In this paper we present practical algorithms for query answering and knowledge base satisfiability checking inDL-Lite^(HN)horn , a logic from the extended DL-Litefamily that contains horn concept inclusions and number restriction. This logic is the most expressive DL that is shown to be FOL-rewritable. The algorithms we present are based on the rewriting technique so that reasoning over the TBox and over the ABox can be done independently of each other, and the inference problems are reduced to first order query evaluation. This allows for employing relational database technology for the final query evaluation and gives optimal data complexity.

1 Introduction

Query answering is the main reasoning task in the setting of ontology based data access [1,2] and data integration [3], where large amounts of data are stored in external databases, and accessed through a conceptual layers provided by an ontology (expressed in terms of a description logic knowledge base). Query answering in this case requires reasoning, and to perform it efficiently in practice the underlying description logic has to be ‘lite’ enough, to be more precise it should enjoy first-order rewritability: it should be possible to rewrite a queryqposed over the ontology in terms of a new query that can be directly evaluated over the data, and that provides the same answers as those provided byq.

TheDL-Litefamily [4] is a family of description logics that enjoy such nice computational properties, i.e., data complexity of query answering is in AC⁰. ForDL-Litecore

andDL-Lite^Fcore1, the basic logics of theDL-Litefamily, a polynomial query answering algorithm was established in [4], and later extended toDL-LiteA[1]. The idea of the algorithm is to first rewrite the query by taking into account the assertions in the TBox and then to evaluate the rewritten query over the ABox. Based on this approach, several systems were implemented, notably QUONTO[6,7].

DL-Lite^(HN)_horn is a more expressive logic than DL-LiteA, which contains number restrictions (as opposed to global functionality assertions), allows for conjunction of basic concepts on the left-hand side of concept inclusions, and for role inclusions that

⋆This work has been partially supported by the EU project Ontorule (ICT-231875).

1Notice that we adopt here the naming convention for logics introduced in [5].

267

(2)

however cannot interact with maximum number restriction. As shown in [5], alsoDL- Lite^(HN_horn⁾is FOL-rewritable, i.e., a similar approach to the one discussed above can be used for query answering over ontologies. However, the algorithm presented in [5] is not immediately implementable, since it would generate queries that would be extremely difficult to process and optimize by a DBMS. Indeed as demonstrated by recent ex- periments with ontology based data access systems [2], commercial relational DBMSs are not designed and optimized to process complex queries (where, e.g., joins are performed over unions), and for such kinds of queries performance degrades dramatically when the size of the data increases.

Therefore, in this paper we address the problem of devising an algorithm for answering unions of conjunctive queries inDL-Lite^(HN_horn⁾that is based on rewriting, and where the rewriting step generates again a union of conjunctive queries. Such an algorithm can be directly implemented in a system like QUONTOby extending the current algorithm forDL-LiteA. Moreover, since current DBMSs are optimized for the evaluation of conjunctive queries, they can process the queries generated by our rewriting algorithm more efficiently than queries generated by an algorithm that tries to delegate complex operations to the DBMS.

Summing up, our contributions are the following:

– We present an algorithm for answering unions of conjunctive queries posed toDL- Lite^(HN_horn⁾knowledge bases that is in AC⁰for data complexity. We employ the query rewriting approach, that is, query answering is performed in two steps. First, the initial query is rewritten using the TBox. Then, the rewritten query is evaluated over the ABox. The main advantage of this approach is that the part of the process requiring TBox reasoning is independent of the ABox, and the part of the process requiring access to the ABox can be carried out by an SQL engine.

– We provide an algorithm that checks satisfiability ofDL-Lite^(HN_horn⁾knowledge bases by evaluating a first-order query over the ABox and that is in AC⁰for data complexity. Thus, the knowledge base satisfiability problem is reduced to query evaluation and again the TBox and the ABox are processed independently of each other.

2 The Description Logic DL-Lite

⁽_horn^HN⁾

In this section we present the logicDL-Lite^(HN_horn⁾and give other preliminary definitions.

The language ofDL-Lite^(HN_horn⁾contains atomic concept and role names, respectively denoted byAandP, possibly with subscripts.Basic roles andconcepts, denoted respectively by R andB, possibly with subscripts, are defined asR ::= P |P⁻ and B::=⊥ |A| ≥k R, wherekis a positive integer.≥k Ris called anumber restriction.

ADL-Lite^(HN_horn⁾TBox,T, is a finite set ofconceptandrole inclusion axiomsof the formB1⊓ · · · ⊓Bn⊑BandR1⊑R2, respectively, androle constraintsDis(R1, R2), Asym(P),Sym(P),Irr(P)andRef(P). The TBoxT may also contain occurrences of qualified number restrictions≥k R.Bon the right-hand side of concept inclusions. The TBox assertions must satisfy the following conditions:

(3)

(inter) ifRhas a proper sub-role inT (i.e.,R^′ ⊑ R,R 6⊑R^′for someR^′), thenT does not contain occurrences of number restrictions≥k Ror≥k R⁻ withk≥2 on the left-hand side of concept inclusions.

(exists) if ≥k R.B occurs in T, then T does not contain occurrences of≥k^′R or

≥k^′R⁻, fork^′ ≥2, on the left-hand side of concept inclusions.

Note that disjointness between conceptsB1 andB2 is expressed asB1⊓B2 ⊑ ⊥.

Also,DL-Lite^(HN_horn⁾allows for expressing local cardinality constraints, e.g., the assertion A⊓ ≥3R⊑ ⊥is equivalent toA⊑ ≤2R.

An ABox A is a finite set of membership assertions of the form A(a),¬A(a), P(a, b), and¬P(a, b).T andAconstitute theknowledge baseK=hT,Ai.

In the following, we useR⁻ to denoteP⁻ ifR =P andP ifR = P⁻,R(x, y) to denoteP(x, y)ifR = P, andP(y, x)ifR =P⁻. For a TBoxT, let⊑^± denote the closure under inverses of the subrole relation:R ⊑^± R^′ ∈ T iffR ⊑R^′ ∈ T or R⁻⊑R^′−∈ T.

The formal semantics relies on the standard notion of interpretation [8]. Here we adopt theunique name assumption(UNA). In the following we assume that TBoxes do not contain role constraintsAsym(P),Sym(P),Irr(P),Ref(P)and qualified number restrictions: we can get rid of them as described in [5].

In this work we concentrate on two reasoning tasks forDL-Lite^(HN)_horn , to which other reasoning tasks can be reduced: knowledge base satisfiability and query answering. The KB satisfiability problem is to check, given a KBK, whether K admits at least one model. To define the query answering problem, we first provide some definitions.

A conjunctive query (CQ) q over a KB K is a first-order formula of the form:

q(x) =∃y.conj(x,y), whereconj(x,y)is a conjunction of atoms of the formA(t) and P(t1, t2), and t, t1, t2 are either constants in K or variables in x andy, x are the free variables of q, also called distinguished variables. A union of conjunctive queries (UCQ)q is a formula of the formq(x) = W

i=1,...,n∃yi.conji(x,yi), with conji(x,yi)as before. Given a queryq(either a CQ or a UCQ) and an interpretation I, we denote byq^Ithe set of tuples of elements of∆^I obtained by evaluatingqinI.

The answer toqover a KBKis the setans(q,K)of tuplesa of constants appearing inKsuch thata^I ∈q^I, for every modelIofK. Each such tuple is calledcertain answer. Observe that, ifKis unsatisfiable, thenans(q,K)is trivially the set of all possible tuples of constants inKwhose arity is the one of the query. We denote such a set by AllTup(q,K). Thequery answering problemis defined as follows: given a KBKand a queryq(either a CQ or a UCQ) overK, compute the setans(q,K).

3 Knowledge Base Satisfiability

To check KB satisfiability, we exploit the notions of canonical interpretation and closure of negative inclusions. First, we define negative and positive inclusion assertions, and the database interpretation.

We callnegative inclusions (NI)assertions of the formB1⊓ · · · ⊓Bn ⊑ ⊥and Dis(R1, R2). Other assertions are calledpositive inclusions (PI).TN denotes the set of all negative inclusions inT, andTP the set of positive inclusions inT. Obviously,

(4)

T =TN∪ TP. Note that NIs include also those assertions that express functionality of roles, e.g.≥2R⊑ ⊥, and more in general maximum number restrictions.

Thedatabase interpretationdb(A) = h∆^db(A),·^db(A)iof an ABoxA is defined as follows: ∆^db^(A) is the nonempty set consisting of all constants occurring in A;

a^db^(A) = a, for each constanta;A^db^(A) = {a| A(a) ∈ A}, for each atomic con- ceptA;P^db^(A)={(a1, a2)|P(a1, a2)∈ A}, for each atomic roleP.

The canonical interpretation is an interpretation constructed according to the notion of chase [9]. Following [4] we can construct the chase of a KB starting from the ABox, and applying positive inclusions to sets of membership assertions. In the definition of chase, we concentrate here on the differences with respect to the definition of chase given in [4]. We remark, however, that for the application of the chase rules we assume to have a total (lexicographic) ordering on the assertions and on all the constants (in- cluding the newly introduced ones). ThechaseofKis the set of membership assertions chase(K) =S

j∈NSj, whereS0 =A,Sj+1=Sj∪S_j^new, andS_j^new is the set of new membership assertions obtained fromSj according to the chase rules cr1, cr2, cr3:

cr1 ifI =B1⊓ · · · ⊓Bn ⊑A,S^′ =SB₁(a)∪ · · · ∪SBn(a), andA(a)∈ S/ j, then S_j^new={A(a)},

cr2 ifI=B1⊓· · ·⊓Bn ⊑ ≥k RandS^′=SB₁(a)∪· · ·∪SBn(a),k1=♯{b|R(a, b)∈ S}< k, thenS_j^new ={R(a, bk₁+1), . . . , R(a, bk)}, wherebk₁+1, . . . , bkarek−k1

new constants inΓN that follow lexicographically the constants introduced in the previous steps,

cr3 ifI=R1⊑^±R2,S^′={R1(a, b)}, andR1(a, b)∈ S/ jthenS_j^new={R2(a, b)}, whereS^′is the first (in lexicographic order) set of membership assertions inSjs.t. there exists a PI applicable²to it,Iis the first such PI, and we useSB(a)to denote{A(a)}

ifB =Aand{R(a, b1), . . . , R(a, bk)}ifB=≥k R, withb1, . . . , bksome constants.

We denote bychasei(K)the portion of the chase obtained afteriapplications of the chase rules.

The canonical interpretation is the interpretation can(K) = h∆^can^(K),·^can^(K)i where∆^can^(K)is the set of constants occurring in chase(K),a^can^(K) = a, for each constant a, A^can^(K) = {a | A(a) ∈ chase(K)}, for each atomic concept A, and P^can^(K)={(a1, a2)|P(a1, a2)∈chase(K)}, for each atomic roleP. The canonical interpretation is constructed in such a way that all the positive inclusions of the TBox are satisfied, so a knowledge base where the TBox contains only positive inclusions is always satisfiable. The following lemma establishes a notable property ofcan(K).

Lemma 1. LetK = hT,Aibe a DL-Lite^(HN_horn⁾KB, and letTP be the set of positive inclusion assertions inT. Thencan(K)is a model ofhTP,Ai.

In order to check satisfiability ofDL-Lite^(HN_horn⁾KBs, negative inclusions must be considered. Thus, if a negative inclusion in the TBox is violated by membership assertions of the ABox, then the knowledge base is inconsistent and, therefore, unsatisfiable.

Besides, an interaction of positive and negative inclusions may cause inconsistency. So we need to consider all NIs implied by the TBox.

2The notion ofapplicabilityof a PI suitably extends the one in [4].

(5)

Definition 2. LetT be a TBox. We callNI-closure ofT, denoted bycln(T), the following set of assertions defined inductively:

1. TN ⊆ cln(T).

2. ifB1⊓ · · · ⊓Bn⊓A⊑ ⊥ ∈cln(T),n≥0andB₁^′ ⊓ · · · ⊓B^′_m⊑Ais inT, then alsoB1⊓ · · · ⊓Bn⊓B₁^′ ⊓ · · · ⊓B_m^′ ⊑ ⊥ ∈cln(T).

3. ifB1⊓ · · · ⊓Bn⊓ ≥k R⊑ ⊥ ∈cln(T),n≥0andB^′₁⊓ · · · ⊓B_m^′ ⊑ ≥k^′Ris in T fork^′≥k, then alsoB1⊓ · · · ⊓Bn⊓B^′₁⊓ · · · ⊓B_m^′ ⊑ ⊥ ∈cln(T).

4. ifB1⊓ · · · ⊓Bn⊓ ≥1R⊑ ⊥ ∈cln(T),n≥0andR^′ ⊑^± Ris inT, then also B1⊓ · · · ⊓Bn⊓ ≥1R^′⊑ ⊥ ∈cln(T).

5. if Dis(R, R1) or Dis(R1, R) ∈ cln(T) and R^′ ⊑^± R is in T, then also Dis(R^′, R1)∈cln(T).

6. if either≥1R ⊑ ⊥, or ≥1R⁻ ⊑ ⊥, orDis(R, R)is incln(T), then all three assertions are incln(T).

Note, that in rule 4 it is enough to considerk = 1 because the condition (inter) ensures that concepts of the form≥k R, fork ≥2, do not occur in the left-hand side of concept inclusions whenR appears in the right-hand side of a role inclusion. Also we don’t need to add both inclusionsDis(R, R1)andDis(R1, R)tocln(T), since to trigger rule 5 one of them is sufficient.

The canonical interpretation can also be exploited for checking satisfiability of a KB containing negative inclusions. To establish that they are satisfied bycan(K), it suffices to verify that the interpretationdb(A)satisfiescln(T).

Lemma 3. LetK=hT,Aibe a DL-Lite^(HN_horn⁾KB. Thencan(K)is a model ofKif and only ifdb(A)is a model ofhcln(T),Ai.

The proof follows the line of that of Lemma 12 in [4], but we need to take into account the modified definition of chase, and hence of can(K). Now we can show that to check satisfiability of a KB it is is sufficient (and necessary) to look atdb(A) (provided we have computedcln(T)). More precisely, the next theorem shows that a contradiction on aDL-Lite^(HN_horn⁾KB may hold only if a membership assertion in the ABox contradicts a negative inclusion in the closurecln(T).

Theorem 4. LetK=hT,Aibe a DL-Lite^(HN)_horn KB. ThenKis satisfiable if and only if db(A)is a model ofhcln(T),Ai.

Having these results, we can formulate the satisfiability problem in terms of evaluation of a first order query over the database (interpretation). In order to do so we define a translationδfrom assertions incln(T)to FOL formulas encoding their violation:

δ(B1⊓ · · · ⊓Bn⊑ ⊥) =∃x(γB₁(x)∧ · · · ∧γBn(x)) δ(Dis(R1, R2)) =∃x, y(ρR₁(x, y)∧ρR₂(x, y)) where:

γBi(x) =A(x), ifBi=A;

γBi(x) =∃y1, . . . , yk(P(x, y1)∧ · · · ∧P(x, yk)∧V

j<lyj 6=yl), ifBi=≥k P; γBi(x) =∃y1, . . . , yk(P(y1, x)∧ · · · ∧P(yk, x)∧V

j<lyj 6=yl), ifBi=≥k P⁻; ρRi(x, y) =P(x, y), ifRi=P; ρRi(x, y) =P(y, x), ifRi =P⁻.

(6)

AlgorithmConsistent(K)

Input:DL-Lite^(HN)horn KBK=hT,Ai

Output:trueifKis satisfiable,falseotherwise q^unsat =⊥;

for eachI∈cln(T)doq^unsat =q^unsat∨δ(I);

ifqunsat

db(A)=∅then returntrue;else returnfalse;

Fig. 1.The algorithmConsistent

The algorithmConsistent, depicted in Fig. 1, takes as input aDL-Lite^(HN_horn⁾KB, computes db(A)andcln(T), and evaluates overdb(A)the Boolean FOL query obtained by taking the union of all FOL formulas returned by the application of the above functionδto every assertion incln(T). In the algorithm, the symbol⊥indicates a predicate whose evaluation is false in every interpretation. Therefore, whenKcontains no negative inclusions,qunsat

db(A)=⊥^db^(A), andConsistent(K)returns true.

One can show that the algorithmConsistentterminates and a KBKis satisfiable if and only ifConsistent(K) = true. As a direct consequence, we get the following theorem, which provides an alternative proof of an analogous result shown in [5].

Theorem 5. In DL-Lite^(HN)_horn , knowledge base satisfiability is FOL-rewritable.

Now we can characterize the computational complexity of the algorithmConsistent.

Theorem 6. The algorithmConsistentisAC⁰in the size of the ABox and runs in exponential time in the size of the TBox.

The following example demonstrates the worst case behaviour of the algorithm, where the size ofcln(T)is exponential in the size ofT.

Example 7. Let us consider the TBoxT consisting of the assertions:A^′₁ ⊑ A1, . . . , A^′_n ⊑An,A1⊓ · · · ⊓An⊑ ⊥. Thencln(T)contains2ⁿnegative inclusion assertions of the formB1⊓· · ·⊓Bn ⊑ ⊥, where eachBiis eitherAiorA^′_i. Moreover, none of the assertions can be omitted: for every negative inclusionI =B1⊓ · · · ⊓Bn ⊑ ⊥there is an ABoxAsuch thatδ(I)^db^(A)=∅. It is enough to takeA={B1(a), . . . , Bn(a)}

witha a fresh object name. Therefore, in order to ensure that the algorithm detects unsatisfiability ofhT,Ai, all the possible negative inclusions must be present incln(T) and its size is exponential in the size ofT.

Note that the algorithmConsistentcan be turned into a non-deterministic PTIME

algorithm for checking unsatisfiability ofDL-Lite^(HN)_horn KBs.

4 Query Answering

The aim of this section is to devise an algorithm for answering unions of CQs inDL- Lite^(HN_horn⁾that is based on query reformulation, and hence can be easily implemented:

in the rewriting step the assertions of the TBox are compiled into the query, then the resulting query is evaluated over the ABox without considering the TBox.

(7)

LetQ=S

iqibe a UCQ. We say that an argument of an atom in a query isboundif it corresponds either to a constant, or to a distinguished variable, or to a shared variable, that is, a variable occurring at least twice in the query body. Instead, an argument of an atom in a query isunboundif it corresponds to a nondistinguished nonshared variable.

The symbol ‘ ’ is used to represent unbound variables.

LetQ^R_T denote the set of natural numbers containing 1 and all the numerical param- eterskfor which the concept≥k Roccurs inT. Theextensionext(T)ofT contains

– ≥k^′R ⊑ ≥k R, for allk, k^′ ∈ Q^R_T such thatk^′ > kandk^′ > k^′′ > k for no k^′′∈Q^R_T andRis either a direct or inverse role inT, and

– ≥k R⊑ ≥k R^′, for allk∈Q^R_T andR⊑^±R^′∈ T.

A positive concept inclusionI isapplicable to an atomB(x)ifI hasB on the right-hand side, whereB(x) =A(x)ifB =A,B(x) =EkP(x)ifB =≥k P, and B(x) =EkP⁻(x)ifB=≥k P⁻. A positive role inclusionIisapplicable to an atom P(x1, x2)ifIhas eitherPorP⁻on the right-hand side. We indicate withgr(g, I)the atom obtained from the atomgby applying the applicable inclusionI. Formally:

Definition 8. LetI∈ext(T)be a positive inclusion assertion that is applicable to the atomg. Then,gr(g, I)is the formula defined as follows:

1. ifg=A(x)andI=B1⊓ · · · ⊓Bn⊑A, thengr(g, I) =B1(x)∧ · · · ∧Bn(x), 2. if g = EkR(x) and I = B1⊓ · · · ⊓Bn ⊑ ≥k^′R, k^′ ≥ k, then gr(g, I) =

B1(x)∧ · · · ∧Bn(x),

3. ifg =E1R( )andI=B1⊓ · · · ⊓Bn ⊑ ≥k R, thengr(g, I) =B1(x)∧ · · · ∧ Bn(x), for a fresh variablex,

4. ifg=EkR(x)andI=P1⊑^±R, thengr(g, I) =EkP1(x), 5. ifg=EkR(x)andI=P1⊑^±R⁻, thengr(g, I) =EkP₁⁻(x), 6. ifg=P(x1, x2)andI=P1⊑^±P, thengr(g, I) =P1(x1, x2), 7. ifg=P(x1, x2)andI=P1⊑^±P⁻, thengr(g, I) =P1(x2, x1).

We also define themost general unifier,mgu, between two atomsg1, g2of a query qthat unify. In the case whereg1 andg2are respectively of the formA(x)andA(z), orP(x, y)andP(z, w), themguis defined as usual (see [4]), taking also into account the possible presence of inequalitiesx 6=z(andy 6= w). However, we allow also an atomP(x, y)to unify withE1P(z)andE1P⁻(w). Moreover, when one of the atoms is of the formEkR(x), then the mgu is defined as follows: let g1 = Ek₁R(x)and g2=Ek₂R(z),k=max(k1, k2), andx6=zdoes not occur inq; org1=EkR(x)and g2=E1R⁻( ), thenmgu=EkR(x).

In Fig. 2 we provide the algorithmPerfectRef, which reformulates a UCQ taking into account the PIs of a TBoxT. In the algorithm,q[g/g^′]denotes the CQ obtained from a CQqby replacing the atomgwith a new atomg^′. The functionremdupremoves from the body of a CQ duplicated atoms, or if a query contains two atoms of the form Ek₁R(x)andEk₂R(x), then it removes the atom with the smallerki. Furthermore,τis a function that takes as input a CQqand returns a new CQ obtained by replacing each occurrence of an unbound variable inqwith the symbol , wherebyP(x, )becomes E1P(x),P( , x)becomesE1P⁻(x)(andP( , )becomesE1P( )). Finally,reduce is a function that takes as input a CQqand two atomsg1 andg2that unify and occur

(8)

AlgorithmPerfectRef(Q,T)

Input:union of conjunctive queriesQ,DL-Lite^(HN)horn TBoxT Output:union of conjunctive queriesPR

PR:={remdup(τ(q))|qis a CQ inQ};

repeat (1)

PR^′:=PR;

for eachq∈PR^′ (a)for eachginq

for eachPIIinext(T) ifIis applicable tog

thenPR:=PR∪ {remdup(q[g/gr(g, I)])};

(b)for eachg1,g2inq ifg1andg2unify

thenPR:=PR∪ {remdup(τ(reduce(q, g1, g2)))};

untilPR^′=PR;

for eachq∈PR (2) for eachginq

ifgis of the formE^kR(x),k≥2,thenreplacegwithγ≥k R(x);

ifgis of the formE1R(x)thenreplacegwithR(x, );

returnPR.

Fig. 2.The algorithmPerfectRef

in the body ofq, and returns a CQq^′obtained by applying toqthe most general unifier betweeng1andg2.

Informally, part (1) of the algorithm reformulates the query by replacing and unify- ing atoms, and accumulates the new queries. In this part all possible applications of the PIs, according to Definition 8, are exhausted. Part (2) performs unfolding of the atoms EkR(x). Notice that it is sufficient to perform unfolding of the atomsEkR(x)in the very end for the following reasons: first, ifk ≥ 2, then only role inclusions could be applied to the atoms of the formR(x, y); however condition (inter) ensures that in this case there are no such inclusions. Ifk= 1, then such role inclusions have already been aplied in step (1). Second, one could reduce some of the atomsR(x, y), but it would not produce new answers, since all variables in the newR-atoms are bound.

This algorithm is an extension of thePerfectRefalgorithm devised for the logicDL- LiteA[4]. The extended version has to deal with inequality atoms and with atoms of the formEkR(x)that later need to be unfolded. In contrast with the algorithm forDL-LiteA

the query may grow, so one also needs to take care of redundant atoms. Notice also that the number of CQs inPRmay be exponential in the size of the TBox (and not only in the length ofq), due to the fact that horn inclusions may cause a single CQ to become of length linear in the size of the TBox. Here, we assume that numbers in the TBox are coded in unary.

Now, to compute the answers toQover the KBK=hT,Ai, we need to evaluate the set of conjunctive queriesPRproduced by the algorithmPerfectRefover the ABox Aconsidered as a relational database. The algorithm computingans(Q,K)is exactly the same as in [4], so we do not present it here. Correctness of the above described query-answering technique is established in the following. We start by observing that,

(9)

as in [4], query answering can in principle be done by evaluating the query over the modelcan(K).

Theorem 9. LetKbe a satisfiable DL-Lite^(HN_horn⁾KB, and letQbe a union of conjunctive queries overK. Then,ans(Q,K) =Q^can^(K).

Sincecan(K)is in general infinite, we cannot compute it and evaluateQover it.

Instead, we compile the TBox into the query, thus simulating the evaluation of the query overcan(K)by evaluating a finite reformulation of the query over the ABox considered as a database. The proof of the following lemma is inspired by the one forDL-LiteRin [4], but needs to take into account number restrictions and horn inclusion assertions.

Lemma 10. LetT be a DL-Lite^(HN_horn⁾TBox,Qa UCQ overT, andPRthe UCQ returned byPerfectRef(Q,T). For every DL-Lite^(HN)_horn ABoxAsuch thathT,Aiis satisfiable,ans(Q,hT,Ai) =PR^db^(A).

Proof. We first introduce the preliminary notion ofwitness of a tuple of constantswith respect to a CQ q inQ. Given a DL-Lite^(HN)_horn knowledge base K = hT,Ai, a CQ q(x)← conj(x,y)overK, and a tupletof constants occurring inK, a set of membership assertionsG is awitnessof tw.r.t.qif there exists a substitution σfrom the variablesyinconj(t,y)to constants inGsuch that the set of atoms inσ(conj(t,y)) is equal to G. Then t ∈ q^can^(K) iff there exists a witness G of t w.r.t. q such that G ⊆chase(K). The cardinality of a witnessG, denoted by|G|, is the number of membership assertions inG.

Let us prove first the statement for a modified version ofPerfectRef, where at the end of the algorithm we perform an additional step of unifications as in step (b) of part (1). Let us callPerfectRefuthis extended version ofPerfectRef.

We have that ans(Q,hT,Ai) = Q^can^(K) = S

q∈Qq^can^(K), and PR^db^(A) = S

q∈ˆ PRqˆ^db^(A), wherePRis the UCQ returned byPerfectRefu(Q,T). Hence, we need to show thatS

ˆ

q∈PRqˆ^db^(A) =Q^can^(K). For simplicity, we consider the case whereQ consists of a single CQq.

“⇐” We have to prove thatqˆ^db^(A) ⊆ q^can^(K), for eachqˆ∈ PR. Letqi+1be obtained fromqiby some step of the algorithmPerfectRefu. We can show thatq_i+1^can^(K)⊆ q^can_i ^(K)at any step. Since each query ofPRis eitherqor a query obtained fromqby repeatedly applying steps (a) and (b) of the algorithmPerfectRefu, then by the rewriting in part (2), and the unification step in the end, it follows that for eachqˆ ∈ PR, ˆ

q^can^(K) ⊆ q^can^(K), by repeatedly applying the property q^can_i+1^(K) ⊆ q_i^can^(K). Since db(A)⊆can(K)and CQs are monotonic queries, we getqˆ^db^(A)⊆qˆ^can^(K)⊆q^can^(K) for each CQqˆ∈PR.

“⇒” We have to show that for each tuplet ∈ q^can^(K), there existsqˆ∈ PRsuch thatt ∈qˆ^db^(A). First, sincet ∈q^can^(K), it follows that there exists a finite numberh such that there is a witnessGhoftw.r.t.qcontained inchaseh(K). Moreover, w.l.o.g.

we can assume that every rule cr1, cr2 and cr3 used in the construction ofchase(K)is necessary in order to generate such a witnessGh. In the following, we say that a setSof membership assertions is anancestorof a setS^′of membership assertions in a setSof

(10)

membership assertions, if there existS1, . . . , SninS, whereS1=SandSn =S^′, such that, for eachj∈ {2, . . . , n},Sj can be generated by applying a chase rule to a subset ofSj−1. We also say thatS^′is a successor ofS. Furthermore, for eachi∈ {0, . . . , h}, we denote withGithepre-witnessoftw.r.t.qinchaseh(K), defined as follows:

Gi= [

S^′⊆Gh

{S⊆chasei(K) | Sis an ancestor ofS^′inchaseh(K)and there exists no successor ofSinchasei(K) that is an ancestor ofS^′inchaseh(K)}.

Now we prove by induction onithat, starting fromGh, we can “go back” through the rule applications and find a queryqˆinPRsuch that the pre-witnessGh−ioftw.r.t.

qinchaseh−i(K)is also a witness oftw.r.t.q. To this aim, we prove that there existsˆ ˆ

q ∈PRsuch thatGh−iis a witness oftw.r.t.qˆand|ˆq|=|Gh−i|, where|ˆq|indicates the number of atoms in the CQqˆexcept inequality atoms. The claim then follows for i=h, sincechase0(K) =A.

Base step: There existsqˆ∈PRsuch thatGhis a witness oftw.r.t.qˆand|ˆq|=|Gh|.

This is an immediate consequence of the fact that (1)q∈PRand (2)PRis closed with respect to step (b) of the algorithmPerfectRefu.

Inductive step: Suppose there existsqˆ∈PRsuch thatGh−i+1is a witness oftw.r.t.

ˆ

qinchaseh−i+1(K)and|q|ˆ =|Gh−i+1|. Let us assume thatchaseh−i+1(K)is obtained by applying chase rule cr2 tochaseh−i(K)(the proof for rules cr1 and cr3 is analogous). Hence, a PI of the formB1⊓ · · · ⊓Bn⊑ ≥k R, whereBj,1≤j≤n, are basic concepts andRis a basic role, is applied inchaseh−i(K)to a set of membership asser- tionsS^′=SB₁(a)∪ · · · ∪SBn(a), such thatk1=♯{b|R(a, b)∈chaseh−i(K)}< k.

Therefore, chaseh−i+1(K) = chaseh−i(K)∪ {R(a, bk₁+1), . . . , R(a, bk)}, where bk₁+1, . . . , bk ∈ΓN follow lexicographically all constants occurring inchaseh−i(K).

Since every rule used in the construction of chase(K) is necessary for generating Gh−i+1 and the set of membership assertions {R(a, bk₁+1), . . . , R(a, bk)} does not have successors inchaseh−i+1(K), so{R(a, bk₁+1), . . . , R(a, bk)} ⊆ Gh−i+1.

Let{R(a, b1), . . . , R(a, bk₂)},0 ≤ k2 ≤ k1, be the minimal set of membership assertions that has successors in chaseh−i+1(K). Then, according to the definition of Gi, the set{R(a, bk₂+1), . . . , R(a, bk₁)} ⊆ Gh−i+1. By the inductive assumption,

|ˆq| = |Gh−i+1|, and{R(a, bk₂+1), . . . , R(a, bk)} ⊆ Gh−i+1, hence,qˆhas to contain the atoms R(x, yk₂+1), . . . , R(x, yk), where yk₂+1, . . . , yk appear only in the men- tioned predicate atoms and possibly inequalities. We show that inqˆthere must be all the possible inequalitiesyj₁ 6=yj₂, fork2+ 1≤j1< j2≤k.

By contradiction, assume that not all variables are related to each other by inequalities, i.e. there arem≥2sets of variablesyk₂+1, . . . , yksuch that the variables within one set are mutually unequal and variables from different sets are not constrained to be different. It means that the part ofqˆwithyk₂+1, . . . , yklooks as follows:

R(x, y11), . . . , R(x, y1l₁), y116=y12, . . . , y1l₁−16=y1l₁, R(x, y21), . . . , R(x, y2l₂), y216=y22, . . . , y2l₂−16=y2l₂,

. . .

R(x, ym1), . . . , R(x, ymlm), ym16=ym2, . . . , ymlm−16=ymlm, wherelj ≥ 1,j ∈ {1, . . . , m},P

lj = k−k2. Therefore,qˆhas to be the result of unfolding by part (2) of the algorithmPerfectRefuof a queryq1with the corresponding

(11)

partEl₁R(x), . . . , ElmR(x). However, the algorithm cannot produce such a query: the functionremdupwould keep inq1only one such atomElmaxR(x)withlmax< k−k2, lmax= maxj{lj}, which unfolded would contain less thank−k2atoms of the form R(x, yj). The contradiction rises from the assumptionm≥2. So,mhas to be equal to 1 and all variablesyk₂+1, . . . , ykappear in pairwise inequalities inq.ˆ

Thus, there exists a queryq1inPR1that contains the atomEk−k₂R(x),k−k2≤k, andqˆis obtained fromq1by part (2) of the algorithm. Then, by step (a) it follows that there exists a queryq2=remdup(ˆq1[Ek−k₂R(x)/B1(x)∧ · · · ∧Bn(x)])inPR1such thatGh−i is a witness oftw.r.t.q2. Letqˆ1be the result of unfoldingq2 by part (2) of the algorithm, thenGh−iis a witness oftw.r.t.qˆ1as well and|ˆq1| ≥ |Gh−i|.

If|ˆq1| >|Gh−i|it implies that there exists at least one membership assertionf in Gh−isuch that there exist at least two atomsg1, g2inqˆ1that both unify withf. Hence g1 andg2 unify, and by step (b) of the algorithm it follows that in PR there exists q3=remdup(τ(reduce(ˆq1, g1, g2))),|q3|<|ˆq1|andGh−iis a witness oftw.r.t.q3. If

|q3|>|Gh−i|then by applying the argument consecutively there is a queryqˆ2such that

|ˆq2|=|Gh−i|andGh−iis a witness oftw.r.t.qˆ2, which proves the claim.

Finally, we observe that unification of atoms of the query at the end ofPerfectRefu

does not add new answers to the set ans(Q,hT,Ai) since all variables in the new R atoms introduced in part (2) are bound. Therefore, the theorem holds also for the

algorithmPerfectRef. ⊓⊔

Based on the above property, we are finally able to establish correctness of the algorithmAnswerand its computational complexity.

Theorem 11. LetK=hT,Aibe a DL-Lite^(HN)_horn KB,Qa UCQ overT, andta tuple of constants inK. Then,t∈ans(Q,K)if and only ift∈Answer(Q,K).

Theorem 12. The algorithmAnsweris exponential in the size of the TBox, andAC⁰in the size of the ABox (data complexity).

Note that we can modifyPerfectRef to get an algorithm for the decision problem associated with the query answering problem that runs in nondeterministic polynomial time in combined complexity: it nondeterministically returns one of the CQs from the reformulation of the input query (polynomially many rewriting steps) and in polynomial time checks whether a tuple is in the answer to this CQ. The corresponding version of Answeralso runs in nondeterministic polynomial time.

5 Conclusions

This paper presents practical algorithms for query answering and knowledge base satisfiability inDL-Lite^(HN_horn⁾. They are based on the same idea as those devised for the originalDL-Litelogics. The distinguishing feature of these algorithms is a separation between TBox and ABox reasoning, which enables an interesting modularization of query answering: the part of the process requiring TBox reasoning is independent of the ABox, and the part of the process requiring access to the ABox can be carried out by an SQL engine. We showed that the algorithms are sound and complete.

The complexity of the developed algorithms is AC⁰ w.r.t. data complexity, NP in the size of the TBox and NP w.r.t. combined complexity.

(12)

References

1. Poggi, A., Lembo, D., Calvanese, D., De Giacomo, G., Lenzerini, M., Rosati, R.: Linking data to ontologies. J. on Data SemanticsX(2008) 133–173

2. Rodr´ıguez-Muro, M.: Tools and Techniques for Ontology Based Data Access in Lightweight Description Logics. PhD thesis, KRDB Research Centre for Knowledge and Data, Free Uni- versity of Bozen-Bolzano (2010)

3. Lenzerini, M.: Data integration: A theoretical perspective. In: Proc. of the 21st ACM SIGACT SIGMOD SIGART Symp. on Principles of Database Systems (PODS 2002). (2002) 233–246 4. Calvanese, D., De Giacomo, G., Lembo, D., Lenzerini, M., Rosati, R.: Tractable reasoning and efficient query answering in description logics: TheDL-Litefamily. J. of Automated Reasoning39(3) (2007) 385–429

5. Artale, A., Calvanese, D., Kontchakov, R., Zakharyaschev, M.: TheDL-Litefamily and rela- tions. J. of Artificial Intelligence Research36(2009) 1–69

6. Acciarri, A., Calvanese, D., De Giacomo, G., Lembo, D., Lenzerini, M., Palmieri, M., Rosati, R.: QUONTO: QUeryingONTOlogies. In: Proc. of the 20th Nat. Conf. on Artificial Intelligence (AAAI 2005). (2005) 1670–1671

7. Poggi, A., Rodr´ıguez-Muro, M., Ruzzi, M.: Ontology-based database access with DIG-Mastro and the OBDA Plugin for Prot´eg´e. In Clark, K., Patel-Schneider, P.F., eds.: Proc. of the 4th Int. Workshop on OWL: Experiences and Directions (OWLED 2008 DC). (2008)

8. Baader, F., Calvanese, D., McGuinness, D., Nardi, D., Patel-Schneider, P.F., eds.: The De- scription Logic Handbook: Theory, Implementation and Applications. Cambridge University Press (2003)

9. Abiteboul, S., Hull, R., Vianu, V.: Foundations of Databases. Addison Wesley Publ. Co.

(1995)