
Query Rewriting in DL-Lite_horn^(HN)


Academic year: 2022


Elena Botoeva, Alessandro Artale, and Diego Calvanese KRDB Research Centre

Free University of Bozen-Bolzano I-39100 Bolzano, Italy [email protected]

Abstract. In this paper we present practical algorithms for query answering and knowledge base satisfiability checking in DL-Lite_horn^(HN), a logic from the extended DL-Lite family that contains Horn concept inclusions and number restrictions. This logic is the most expressive DL that has been shown to be FOL-rewritable. The algorithms we present are based on the rewriting technique, so that reasoning over the TBox and over the ABox can be done independently of each other, and the inference problems are reduced to first-order query evaluation. This allows relational database technology to be employed for the final query evaluation and gives optimal data complexity.

1 Introduction

Query answering is the main reasoning task in the setting of ontology-based data access [1,2] and data integration [3], where large amounts of data are stored in external databases and accessed through a conceptual layer provided by an ontology (expressed in terms of a description logic knowledge base). Query answering in this case requires reasoning, and to perform it efficiently in practice the underlying description logic has to be 'lite' enough. More precisely, it should enjoy first-order rewritability: it should be possible to rewrite a query q posed over the ontology into a new query that can be directly evaluated over the data and that provides the same answers as q.

The DL-Lite family [4] is a family of description logics that enjoy such nice computational properties, i.e., the data complexity of query answering is in AC^0. For DL-Lite_core and DL-Lite_core^F ¹, the basic logics of the DL-Lite family, a polynomial query answering algorithm was established in [4], and later extended to DL-Lite_A [1]. The idea of the algorithm is to first rewrite the query by taking into account the assertions in the TBox and then to evaluate the rewritten query over the ABox. Based on this approach, several systems were implemented, notably QUONTO [6,7].

DL-Lite_horn^(HN) is a more expressive logic than DL-Lite_A: it contains number restrictions (as opposed to global functionality assertions), and it allows for conjunctions of basic concepts on the left-hand side of concept inclusions and for role inclusions, which, however, cannot interact with maximum number restrictions. As shown in [5], DL-Lite_horn^(HN) is also FOL-rewritable, i.e., an approach similar to the one discussed above can be used for query answering over ontologies. However, the algorithm presented in [5] is not immediately implementable, since it would generate queries that would be extremely difficult for a DBMS to process and optimize. Indeed, as demonstrated by recent experiments with ontology-based data access systems [2], commercial relational DBMSs are not designed and optimized to process complex queries (where, e.g., joins are performed over unions), and for such queries performance degrades dramatically when the size of the data increases.

This work has been partially supported by the EU project Ontorule (ICT-231875).

¹ Notice that we adopt here the naming convention for logics introduced in [5].

Therefore, in this paper we address the problem of devising an algorithm for answering unions of conjunctive queries in DL-Lite_horn^(HN) that is based on rewriting, and where the rewriting step again generates a union of conjunctive queries. Such an algorithm can be directly implemented in a system like QUONTO by extending the current algorithm for DL-Lite_A. Moreover, since current DBMSs are optimized for the evaluation of conjunctive queries, they can process the queries generated by our rewriting algorithm more efficiently than queries generated by an algorithm that tries to delegate complex operations to the DBMS.

Summing up, our contributions are the following:

– We present an algorithm for answering unions of conjunctive queries posed to DL-Lite_horn^(HN) knowledge bases that is in AC^0 for data complexity. We employ the query rewriting approach, that is, query answering is performed in two steps. First, the initial query is rewritten using the TBox. Then, the rewritten query is evaluated over the ABox. The main advantage of this approach is that the part of the process requiring TBox reasoning is independent of the ABox, and the part of the process requiring access to the ABox can be carried out by an SQL engine.

– We provide an algorithm that checks satisfiability of DL-Lite_horn^(HN) knowledge bases by evaluating a first-order query over the ABox and that is in AC^0 for data complexity. Thus, the knowledge base satisfiability problem is reduced to query evaluation, and again the TBox and the ABox are processed independently of each other.

2 The Description Logic DL-Lite_horn^(HN)

In this section we present the logic DL-Lite_horn^(HN) and give other preliminary definitions.

The language of DL-Lite_horn^(HN) contains atomic concept and role names, denoted by A and P respectively, possibly with subscripts. Basic roles and basic concepts, denoted by R and B respectively, possibly with subscripts, are defined as R ::= P | P⁻ and B ::= ⊥ | A | ≥k R, where k is a positive integer. A concept of the form ≥k R is called a number restriction.

A DL-Lite_horn^(HN) TBox, T, is a finite set of concept and role inclusion axioms of the form B1 ⊓ ··· ⊓ Bn ⊑ B and R1 ⊑ R2, respectively, and role constraints Dis(R1, R2), Asym(P), Sym(P), Irr(P) and Ref(P). The TBox T may also contain occurrences of qualified number restrictions ≥k R.B on the right-hand side of concept inclusions. The TBox assertions must satisfy the following conditions:

(inter) if R has a proper sub-role in T (i.e., R′ ⊑ R and R ⋢ R′ for some R′), then T does not contain occurrences of number restrictions ≥k R or ≥k R⁻ with k ≥ 2 on the left-hand side of concept inclusions.

(exists) if ≥k R.B occurs in T, then T does not contain occurrences of ≥k′ R or ≥k′ R⁻, for k′ ≥ 2, on the left-hand side of concept inclusions.

Note that disjointness between concepts B1 and B2 is expressed as B1 ⊓ B2 ⊑ ⊥. Also, DL-Lite_horn^(HN) allows for expressing local cardinality constraints, e.g., the assertion A ⊓ ≥3 R ⊑ ⊥ is equivalent to A ⊑ ≤2 R.

An ABox A is a finite set of membership assertions of the form A(a), ¬A(a), P(a, b), and ¬P(a, b). T and A constitute the knowledge base K = ⟨T, A⟩.

In the following, we use R⁻ to denote P⁻ if R = P and P if R = P⁻, and R(x, y) to denote P(x, y) if R = P and P(y, x) if R = P⁻. For a TBox T, let ⊑± denote the closure under inverses of the subrole relation: R ⊑± R′ ∈ T iff R ⊑ R′ ∈ T or R⁻ ⊑ R′⁻ ∈ T.

The formal semantics relies on the standard notion of interpretation [8]. Here we adopt the unique name assumption (UNA). In the following we assume that TBoxes do not contain role constraints Asym(P), Sym(P), Irr(P), Ref(P) and qualified number restrictions: we can get rid of them as described in [5].

In this work we concentrate on two reasoning tasks for DL-Lite_horn^(HN), to which other reasoning tasks can be reduced: knowledge base satisfiability and query answering. The KB satisfiability problem is to check, given a KB K, whether K admits at least one model. To define the query answering problem, we first provide some definitions.

A conjunctive query (CQ) q over a KB K is a first-order formula of the form q(x) = ∃y.conj(x, y), where conj(x, y) is a conjunction of atoms of the form A(t) and P(t1, t2), and t, t1, t2 are either constants in K or variables in x and y. The variables x are the free variables of q, also called distinguished variables. A union of conjunctive queries (UCQ) q is a formula of the form q(x) = ⋁_{i=1,...,n} ∃y_i.conj_i(x, y_i), with conj_i(x, y_i) as before. Given a query q (either a CQ or a UCQ) and an interpretation I, we denote by q^I the set of tuples of elements of ∆^I obtained by evaluating q in I.

The answer to q over a KB K is the set ans(q, K) of tuples a of constants appearing in K such that a^I ∈ q^I for every model I of K. Each such tuple is called a certain answer. Observe that, if K is unsatisfiable, then ans(q, K) is trivially the set of all possible tuples of constants in K whose arity is the one of the query. We denote such a set by AllTup(q, K). The query answering problem is defined as follows: given a KB K and a query q (either a CQ or a UCQ) over K, compute the set ans(q, K).
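As an illustration of the notation q^I, the following minimal Python sketch evaluates a CQ over one finite interpretation given as a set of facts. It computes q^I for a single I only, not the certain answers ans(q, K), which quantify over all models. The function name `eval_cq`, the `?`-prefix convention for variables, and the fact encoding are assumptions made for this example and do not come from the paper.

```python
from itertools import product

def eval_cq(answer_vars, atoms, facts):
    """Evaluate a conjunctive query over a finite set of facts.

    atoms: list of (predicate, args); an argument starting with '?' is a variable,
           anything else is a constant.
    facts: set of (predicate, tuple_of_constants).
    Returns the set of tuples of constants for the answer variables.
    """
    constants = sorted({c for _, args in facts for c in args})
    variables = sorted({a for _, args in atoms for a in args if a.startswith("?")})
    answers = set()
    # Brute force: try every assignment of constants to variables.
    for values in product(constants, repeat=len(variables)):
        sub = dict(zip(variables, values))
        if all((p, tuple(sub.get(a, a) for a in args)) in facts for p, args in atoms):
            answers.add(tuple(sub[v] for v in answer_vars))
    return answers

# q(x) = exists y. A(x) and P(x, y)
facts = {("A", ("a",)), ("A", ("c",)), ("P", ("a", "b"))}
result = eval_cq(["?x"], [("A", ["?x"]), ("P", ["?x", "?y"])], facts)
```

The exhaustive enumeration is chosen for clarity only; a relational engine would evaluate the same query with joins.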

3 Knowledge Base Satisfiability

To check KB satisfiability, we exploit the notions of canonical interpretation and closure of negative inclusions. First, we define negative and positive inclusion assertions, and the database interpretation.

We call negative inclusions (NIs) assertions of the form B1 ⊓ ··· ⊓ Bn ⊑ ⊥ and Dis(R1, R2). Other assertions are called positive inclusions (PIs). T_N denotes the set of all negative inclusions in T, and T_P the set of positive inclusions in T. Obviously, T = T_N ∪ T_P. Note that NIs also include those assertions that express functionality of roles, e.g., ≥2 R ⊑ ⊥, and more generally maximum number restrictions.

The database interpretation db(A) = ⟨∆^db(A), ·^db(A)⟩ of an ABox A is defined as follows: ∆^db(A) is the nonempty set consisting of all constants occurring in A; a^db(A) = a, for each constant a; A^db(A) = {a | A(a) ∈ A}, for each atomic concept A; P^db(A) = {(a1, a2) | P(a1, a2) ∈ A}, for each atomic role P.

The canonical interpretation is an interpretation constructed according to the notion of chase [9]. Following [4], we can construct the chase of a KB starting from the ABox and applying positive inclusions to sets of membership assertions. In the definition of chase, we concentrate here on the differences with respect to the definition given in [4]. We remark, however, that for the application of the chase rules we assume a total (lexicographic) ordering on the assertions and on all the constants (including the newly introduced ones). The chase of K is the set of membership assertions chase(K) = ⋃_{j∈N} S_j, where S_0 = A, S_{j+1} = S_j ∪ S_j^new, and S_j^new is the set of new membership assertions obtained from S_j according to the chase rules cr1, cr2, cr3:

cr1 if I = B1 ⊓ ··· ⊓ Bn ⊑ A, S = S_B1(a) ∪ ··· ∪ S_Bn(a), and A(a) ∉ S_j, then S_j^new = {A(a)},

cr2 if I = B1 ⊓ ··· ⊓ Bn ⊑ ≥k R, S = S_B1(a) ∪ ··· ∪ S_Bn(a), and k1 = ♯{b | R(a, b) ∈ S_j} < k, then S_j^new = {R(a, b_{k1+1}), ..., R(a, b_k)}, where b_{k1+1}, ..., b_k are k − k1 new constants in Γ_N that follow lexicographically the constants introduced in the previous steps,

cr3 if I = R1 ⊑± R2, S = {R1(a, b)}, and R2(a, b) ∉ S_j, then S_j^new = {R2(a, b)},

where S is the first (in lexicographic order) set of membership assertions in S_j such that there exists a PI applicable² to it, I is the first such PI, and we use S_B(a) to denote {A(a)} if B = A and {R(a, b1), ..., R(a, bk)} if B = ≥k R, with b1, ..., bk some constants.

We denote by chase_i(K) the portion of the chase obtained after i applications of the chase rules.
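To make the three chase rules more concrete, here is a hedged Python sketch of a bounded chase. It is a simplification and not the construction used in the paper: it applies the rules in rounds rather than following the lexicographic ordering, and it stops after `max_rounds` rounds because the real chase may be infinite. All names (`bounded_chase`, `role_fact`, the `("geq", k, role)` encoding, the trailing `-` for inverse roles, the `_n<i>` naming of fresh constants) are assumptions made for this example.

```python
from itertools import count

def successors(role, a, roles):
    """Constants b with R(a, b); a trailing '-' marks the inverse of an atomic role."""
    if role.endswith("-"):
        return {x for (p, x, y) in roles if p == role[:-1] and y == a}
    return {y for (p, x, y) in roles if p == role and x == a}

def role_fact(role, a, b):
    """The stored fact over an atomic role that encodes R(a, b)."""
    return (role[:-1], b, a) if role.endswith("-") else (role, a, b)

def holds(basic, a, concepts, roles):
    """Membership of a in a basic concept: an atomic name or ('geq', k, role)."""
    if isinstance(basic, str):
        return (basic, a) in concepts
    _, k, role = basic
    return len(successors(role, a, roles)) >= k

def bounded_chase(concepts, roles, concept_pis, role_pis, max_rounds):
    """concept_pis: list of (lhs list of basic concepts, rhs basic concept);
    role_pis: list of (R1, R2) standing for R1 ⊑ R2."""
    concepts, roles = set(concepts), set(roles)
    fresh = count(1)
    for _round in range(max_rounds):
        before = (len(concepts), len(roles))
        domain = sorted({a for (_, a) in concepts}
                        | {c for (_, a, b) in roles for c in (a, b)})
        for a in domain:
            for lhs, rhs in concept_pis:
                if not all(holds(b, a, concepts, roles) for b in lhs):
                    continue
                if isinstance(rhs, str):                  # cr1: add A(a)
                    concepts.add((rhs, a))
                else:                                     # cr2: add the missing successors
                    _, k, role = rhs
                    missing = k - len(successors(role, a, roles))
                    for _i in range(missing):
                        roles.add(role_fact(role, a, f"_n{next(fresh)}"))
        for r1, r2 in role_pis:                           # cr3: propagate R1(a, b) to R2(a, b)
            for (p, x, y) in list(roles):
                if p == r1.rstrip("-"):
                    a, b = (y, x) if r1.endswith("-") else (x, y)
                    roles.add(role_fact(r2, a, b))
        if before == (len(concepts), len(roles)):
            break                                         # fixpoint reached
    return concepts, roles

# TBox: A ⊑ ≥2 P, ≥1 P⁻ ⊑ B, P ⊑ S⁻; ABox: A(a)
chased = bounded_chase({("A", "a")}, set(),
                       [(["A"], ("geq", 2, "P")), ([("geq", 1, "P-")], "B")],
                       [("P", "S-")], max_rounds=5)
```

Because role facts are stored over atomic roles only, applying an inclusion R1 ⊑ R2 to every stored fact also covers its inverse reading, which is how this sketch approximates ⊑±.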

The canonical interpretation is the interpretation can(K) = ⟨∆^can(K), ·^can(K)⟩, where ∆^can(K) is the set of constants occurring in chase(K), a^can(K) = a, for each constant a, A^can(K) = {a | A(a) ∈ chase(K)}, for each atomic concept A, and P^can(K) = {(a1, a2) | P(a1, a2) ∈ chase(K)}, for each atomic role P. The canonical interpretation is constructed in such a way that all the positive inclusions of the TBox are satisfied, so a knowledge base where the TBox contains only positive inclusions is always satisfiable. The following lemma establishes a notable property of can(K).

Lemma 1. Let K = ⟨T, A⟩ be a DL-Lite_horn^(HN) KB, and let T_P be the set of positive inclusion assertions in T. Then can(K) is a model of ⟨T_P, A⟩.

In order to check satisfiability of DL-Lite_horn^(HN) KBs, negative inclusions must be considered. Thus, if a negative inclusion in the TBox is violated by membership assertions of the ABox, then the knowledge base is inconsistent and, therefore, unsatisfiable.

Besides, an interaction of positive and negative inclusions may cause inconsistency. So we need to consider all NIs implied by the TBox.

² The notion of applicability of a PI suitably extends the one in [4].


Definition 2. Let T be a TBox. We call NI-closure of T, denoted by cln(T), the following set of assertions defined inductively:

1. T_N ⊆ cln(T).

2. if B1 ⊓ ··· ⊓ Bn ⊓ A ⊑ ⊥ ∈ cln(T), n ≥ 0, and B′1 ⊓ ··· ⊓ B′m ⊑ A is in T, then also B1 ⊓ ··· ⊓ Bn ⊓ B′1 ⊓ ··· ⊓ B′m ⊑ ⊥ ∈ cln(T).

3. if B1 ⊓ ··· ⊓ Bn ⊓ ≥k R ⊑ ⊥ ∈ cln(T), n ≥ 0, and B′1 ⊓ ··· ⊓ B′m ⊑ ≥k′ R is in T for k′ ≥ k, then also B1 ⊓ ··· ⊓ Bn ⊓ B′1 ⊓ ··· ⊓ B′m ⊑ ⊥ ∈ cln(T).

4. if B1 ⊓ ··· ⊓ Bn ⊓ ≥1 R ⊑ ⊥ ∈ cln(T), n ≥ 0, and R′ ⊑± R is in T, then also B1 ⊓ ··· ⊓ Bn ⊓ ≥1 R′ ⊑ ⊥ ∈ cln(T).

5. if Dis(R, R1) or Dis(R1, R) ∈ cln(T) and R′ ⊑± R is in T, then also Dis(R′, R1) ∈ cln(T).

6. if either ≥1 R ⊑ ⊥, or ≥1 R⁻ ⊑ ⊥, or Dis(R, R) is in cln(T), then all three assertions are in cln(T).

Note that in rule 4 it is enough to consider k = 1, because condition (inter) ensures that concepts of the form ≥k R, for k ≥ 2, do not occur on the left-hand side of concept inclusions when R appears on the right-hand side of a role inclusion. Also, we do not need to add both inclusions Dis(R′, R1) and Dis(R1, R′) to cln(T), since one of them is sufficient to trigger rule 5.
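The inductive definition above can be read as a fixpoint computation. The Python sketch below covers rules 1 to 3 only (concept inclusions); rules 4 to 6, which involve role inclusions and disjointness, are deliberately left out. The encoding (a negative inclusion as a set of conjuncts, a number restriction as `("geq", k, role)`) and the names `ni_closure` and `entails` are assumptions made for this illustration.

```python
def entails(rhs, conjunct):
    """True if a PI with right-hand side rhs may replace the conjunct.

    Rule 2: both are the same atomic concept.
    Rule 3: both are number restrictions on the same role and k' >= k.
    """
    if isinstance(rhs, str) or isinstance(conjunct, str):
        return rhs == conjunct
    return rhs[2] == conjunct[2] and rhs[1] >= conjunct[1]

def ni_closure(negative, positive):
    """negative: iterable of conjunct collections, each read as B1 ⊓ ... ⊓ Bn ⊑ ⊥.
    positive: list of (lhs collection, rhs basic concept)."""
    closure = {frozenset(ni) for ni in negative}          # rule 1
    frontier = set(closure)
    while frontier:
        ni = frontier.pop()
        for lhs, rhs in positive:
            for conjunct in ni:
                if entails(rhs, conjunct):
                    new = (ni - {conjunct}) | frozenset(lhs)
                    if new not in closure:
                        closure.add(new)
                        frontier.add(new)
    return closure

# A ⊑ ≥3 R together with the functionality assertion ≥2 R ⊑ ⊥ yields A ⊑ ⊥.
closure = ni_closure([[("geq", 2, "R")]], [(["A"], ("geq", 3, "R"))])
```

The loop terminates because only finitely many sets of basic concepts can be built from the concepts occurring in the TBox, but their number can be exponential.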

The canonical interpretation can also be exploited for checking satisfiability of a KB containing negative inclusions. To establish that they are satisfied by can(K), it suffices to verify that the interpretation db(A) satisfies cln(T).

Lemma 3. Let K = ⟨T, A⟩ be a DL-Lite_horn^(HN) KB. Then can(K) is a model of K if and only if db(A) is a model of ⟨cln(T), A⟩.

The proof follows the line of that of Lemma 12 in [4], but we need to take into account the modified definition of chase, and hence of can(K). Now we can show that to check satisfiability of a KB it is sufficient (and necessary) to look at db(A), provided we have computed cln(T). More precisely, the next theorem shows that a contradiction in a DL-Lite_horn^(HN) KB may arise only if a membership assertion in the ABox contradicts a negative inclusion in the closure cln(T).

Theorem 4. Let K = ⟨T, A⟩ be a DL-Lite_horn^(HN) KB. Then K is satisfiable if and only if db(A) is a model of ⟨cln(T), A⟩.

Having these results, we can formulate the satisfiability problem in terms of the evaluation of a first-order query over the database (interpretation). In order to do so, we define a translation δ from assertions in cln(T) to FOL formulas encoding their violation:

δ(B1 ⊓ ··· ⊓ Bn ⊑ ⊥) = ∃x (γ_B1(x) ∧ ··· ∧ γ_Bn(x))
δ(Dis(R1, R2)) = ∃x, y (ρ_R1(x, y) ∧ ρ_R2(x, y))

where:

γ_Bi(x) = A(x), if Bi = A;
γ_Bi(x) = ∃y1, ..., yk (P(x, y1) ∧ ··· ∧ P(x, yk) ∧ ⋀_{j<l} yj ≠ yl), if Bi = ≥k P;
γ_Bi(x) = ∃y1, ..., yk (P(y1, x) ∧ ··· ∧ P(yk, x) ∧ ⋀_{j<l} yj ≠ yl), if Bi = ≥k P⁻;
ρ_Ri(x, y) = P(x, y), if Ri = P;
ρ_Ri(x, y) = P(y, x), if Ri = P⁻.


Algorithm Consistent(K)
Input: DL-Lite_horn^(HN) KB K = ⟨T, A⟩
Output: true if K is satisfiable, false otherwise
  q_unsat = ⊥;
  for each I ∈ cln(T) do q_unsat = q_unsat ∨ δ(I);
  if q_unsat^db(A) = ∅ then return true; else return false;

Fig. 1. The algorithm Consistent

The algorithm Consistent, depicted in Fig. 1, takes as input a DL-Lite_horn^(HN) KB, computes db(A) and cln(T), and evaluates over db(A) the Boolean FOL query obtained by taking the union of all FOL formulas returned by the application of the above function δ to every assertion in cln(T). In the algorithm, the symbol ⊥ indicates a predicate whose evaluation is false in every interpretation. Therefore, when K contains no negative inclusions, q_unsat^db(A) = ⊥^db(A), and Consistent(K) returns true.
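The evaluation step of Consistent can be sketched directly in Python, assuming that cln(T) has already been computed and that the ABox contains only positive membership assertions. Each function mirrors one piece of the translation δ: `holds` plays the role of γ_B, `role_pairs` the role of ρ_R, and distinct constants count as distinct individuals under the UNA. The tuple encodings and function names are assumptions made for this example.

```python
def role_pairs(role, roles):
    """rho_R: pairs (x, y) with R(x, y) in db(A); 'P-' denotes the inverse of P."""
    if role.endswith("-"):
        return {(b, a) for (p, a, b) in roles if p == role[:-1]}
    return {(a, b) for (p, a, b) in roles if p == role}

def holds(basic, a, concepts, roles):
    """gamma_B evaluated at constant a."""
    if isinstance(basic, str):
        return (basic, a) in concepts
    _, k, role = basic                      # ('geq', k, role)
    return len({y for (x, y) in role_pairs(role, roles) if x == a}) >= k

def violated(ni, concepts, roles):
    """delta(I) evaluated over db(A) for a negative inclusion I."""
    if ni[0] == "dis":                      # ('dis', R1, R2)
        _, r1, r2 = ni
        return bool(role_pairs(r1, roles) & role_pairs(r2, roles))
    _, conjuncts = ni                       # ('bot', [B1, ..., Bn])
    domain = {a for (_, a) in concepts} | {c for (_, a, b) in roles for c in (a, b)}
    return any(all(holds(b, a, concepts, roles) for b in conjuncts) for a in domain)

def consistent(closure, concepts, roles):
    """True iff no assertion of the (precomputed) closure is violated by db(A)."""
    return not any(violated(ni, concepts, roles) for ni in closure)

closure = [("bot", [("geq", 2, "P")]), ("dis", "P", "S")]
ok = consistent(closure, set(), {("P", "a", "b")})
clash = consistent(closure, set(), {("P", "a", "b"), ("P", "a", "c")})
```

In a deployed system the same check would be issued as a single Boolean SQL query; the in-memory version only illustrates what that query tests.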

One can show that the algorithm Consistent terminates and that a KB K is satisfiable if and only if Consistent(K) = true. As a direct consequence, we get the following theorem, which provides an alternative proof of an analogous result shown in [5].

Theorem 5. In DL-Lite_horn^(HN), knowledge base satisfiability is FOL-rewritable.

Now we can characterize the computational complexity of the algorithm Consistent.

Theorem 6. The algorithm Consistent is in AC^0 in the size of the ABox and runs in exponential time in the size of the TBox.

The following example demonstrates the worst-case behaviour of the algorithm, where the size of cln(T) is exponential in the size of T.

Example 7. Let us consider the TBox T consisting of the assertions A′1 ⊑ A1, ..., A′n ⊑ An, A1 ⊓ ··· ⊓ An ⊑ ⊥. Then cln(T) contains 2^n negative inclusion assertions of the form B1 ⊓ ··· ⊓ Bn ⊑ ⊥, where each Bi is either Ai or A′i. Moreover, none of the assertions can be omitted: for every negative inclusion I = B1 ⊓ ··· ⊓ Bn ⊑ ⊥ there is an ABox A such that δ(I)^db(A) ≠ ∅. It is enough to take A = {B1(a), ..., Bn(a)}, with a a fresh object name. Therefore, in order to ensure that the algorithm detects unsatisfiability of ⟨T, A⟩, all the possible negative inclusions must be present in cln(T), and its size is exponential in the size of T.

Note that the algorithm Consistent can be turned into a non-deterministic PTIME algorithm for checking unsatisfiability of DL-Lite_horn^(HN) KBs.

4 Query Answering

The aim of this section is to devise an algorithm for answering unions of CQs in DL-Lite_horn^(HN) that is based on query reformulation, and hence can be easily implemented: in the rewriting step the assertions of the TBox are compiled into the query, then the resulting query is evaluated over the ABox without considering the TBox.

Let Q = ⋃_i q_i be a UCQ. We say that an argument of an atom in a query is bound if it corresponds either to a constant, or to a distinguished variable, or to a shared variable, that is, a variable occurring at least twice in the query body. Instead, an argument of an atom in a query is unbound if it corresponds to a nondistinguished nonshared variable. The symbol '_' is used to represent unbound variables.

Let Q_T^R denote the set of natural numbers containing 1 and all the numerical parameters k for which the concept ≥k R occurs in T. The extension ext(T) of T contains

– ≥k′ R ⊑ ≥k R, for all k, k′ ∈ Q_T^R such that k′ > k and k′ > k″ > k for no k″ ∈ Q_T^R, where R is either a direct or inverse role in T, and

– ≥k R ⊑ ≥k R′, for all k ∈ Q_T^R and R ⊑± R′ ∈ T.
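The first kind of inclusion in ext(T) links each numerical parameter only to its immediate predecessor in Q_T^R, so the inclusions form a chain. A small Python sketch, in which the function name `number_chain` and the `("geq", k, role)` encoding are assumptions made for this example:

```python
def number_chain(role, parameters):
    """Inclusions ≥k' R ⊑ ≥k R between consecutive elements k < k' of Q_T^R.

    parameters: the numbers k for which ≥k R occurs in the TBox; 1 is always added.
    Returns a list of (lhs, rhs) pairs of number restrictions.
    """
    q = sorted(set(parameters) | {1})
    return [(("geq", hi, role), ("geq", lo, role)) for lo, hi in zip(q, q[1:])]

# With Q_T^R = {1, 3, 5} we obtain ≥3 R ⊑ ≥1 R and ≥5 R ⊑ ≥3 R.
chain = number_chain("R", [3, 5])
```

The inclusion ≥5 R ⊑ ≥1 R is not listed explicitly: it follows by chaining the two consecutive ones during rewriting.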

A positive concept inclusion I is applicable to an atom B(x) if I has B on the right-hand side, where B(x) = A(x) if B = A, B(x) = E_kP(x) if B = ≥k P, and B(x) = E_kP⁻(x) if B = ≥k P⁻. A positive role inclusion I is applicable to an atom P(x1, x2) if I has either P or P⁻ on the right-hand side. We indicate with gr(g, I) the atom obtained from the atom g by applying the applicable inclusion I. Formally:

Definition 8. Let I ∈ ext(T) be a positive inclusion assertion that is applicable to the atom g. Then, gr(g, I) is the formula defined as follows:

1. if g = A(x) and I = B1 ⊓ ··· ⊓ Bn ⊑ A, then gr(g, I) = B1(x) ∧ ··· ∧ Bn(x),
2. if g = E_kR(x) and I = B1 ⊓ ··· ⊓ Bn ⊑ ≥k′ R, k′ ≥ k, then gr(g, I) = B1(x) ∧ ··· ∧ Bn(x),
3. if g = E_1R(_) and I = B1 ⊓ ··· ⊓ Bn ⊑ ≥k R, then gr(g, I) = B1(x) ∧ ··· ∧ Bn(x), for a fresh variable x,
4. if g = E_kR(x) and I = P1 ⊑± R, then gr(g, I) = E_kP1(x),
5. if g = E_kR(x) and I = P1⁻ ⊑± R, then gr(g, I) = E_kP1⁻(x),
6. if g = P(x1, x2) and I = P1 ⊑± P, then gr(g, I) = P1(x1, x2),
7. if g = P(x1, x2) and I = P1⁻ ⊑± P, then gr(g, I) = P1(x2, x1).
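The case analysis of Definition 8 can be sketched as a single Python function. This is an illustrative reading and not the authors' implementation: atoms are tagged tuples (`("c", A, x)` for A(x), `("e", k, R, x)` for E_kR(x), `("r", P, x1, x2)` for P(x1, x2)), inverse roles carry a trailing `-`, the closure ⊑± is handled by also matching the inverse of the right-hand side, and an atom E_kR(_) with k ≥ 2 is treated like case 3. All of these conventions and names are assumptions made for this example.

```python
def inv(role):
    """Inverse of a basic role."""
    return role[:-1] if role.endswith("-") else role + "-"

def basic_atom(basic, x):
    """B(x) for a basic concept B: atomic name or ('geq', k, role)."""
    if isinstance(basic, str):
        return ("c", basic, x)
    _, k, role = basic
    return ("e", k, role, x)

def role_atom(role, x, y):
    """R(x, y) written over an atomic role."""
    return ("r", role[:-1], y, x) if role.endswith("-") else ("r", role, x, y)

def gr(g, inclusion, fresh="?fresh"):
    """Atoms of gr(g, I), or None if I is not applicable to g.

    inclusion: ('concept', [B1, ..., Bn], B) or ('role', R1, R2).
    """
    kind, lhs, rhs = inclusion
    if kind == "concept":
        if g[0] == "c" and rhs == g[1]:                        # case 1
            return [basic_atom(b, g[2]) for b in lhs]
        if g[0] == "e" and not isinstance(rhs, str):
            _, k, role, x = g
            if rhs[2] == role and rhs[1] >= k:                 # cases 2 and 3
                target = fresh if x == "_" else x
                return [basic_atom(b, target) for b in lhs]
        return None
    if g[0] == "e":                                            # cases 4 and 5
        _, k, role, x = g
        if rhs == role:
            return [("e", k, lhs, x)]
        if rhs == inv(role):
            return [("e", k, inv(lhs), x)]
    if g[0] == "r":                                            # cases 6 and 7
        _, p, x1, x2 = g
        if rhs == p:
            return [role_atom(lhs, x1, x2)]
        if rhs == p + "-":
            return [role_atom(inv(lhs), x1, x2)]
    return None

# Case 7: g = P(x, y) and I = P1⁻ ⊑ P give P1(y, x).
swapped = gr(("r", "P", "?x", "?y"), ("role", "P1-", "P"))
```

Returning a list of atoms rather than a formula makes it easy to splice the result into the body of a CQ.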

We also define the most general unifier, mgu, between two atoms g1, g2 of a query q that unify. In the case where g1 and g2 are respectively of the form A(x) and A(z), or P(x, y) and P(z, w), the mgu is defined as usual (see [4]), taking also into account the possible presence of inequalities x ≠ z (and y ≠ w). However, we also allow an atom P(x, y) to unify with E_1P(z) and E_1P⁻(w). Moreover, when one of the atoms is of the form E_kR(x), the mgu is defined as follows: if g1 = E_k1R(x) and g2 = E_k2R(z), k = max(k1, k2), and x ≠ z does not occur in q, or if g1 = E_kR(x) and g2 = E_1R(_), then mgu = E_kR(x).

In Fig. 2 we provide the algorithm PerfectRef, which reformulates a UCQ taking into account the PIs of a TBox T. In the algorithm, q[g/g′] denotes the CQ obtained from a CQ q by replacing the atom g with a new atom g′. The function remdup removes duplicated atoms from the body of a CQ and, if a query contains two atoms of the form E_k1R(x) and E_k2R(x), removes the atom with the smaller k_i. Furthermore, τ is a function that takes as input a CQ q and returns a new CQ obtained by replacing each occurrence of an unbound variable in q with the symbol _, whereby P(x, _) becomes E_1P(x), P(_, x) becomes E_1P⁻(x) (and P(_, _) becomes E_1P(_)). Finally, reduce is a function that takes as input a CQ q and two atoms g1 and g2 that unify and occur in the body of q, and returns a CQ q′ obtained by applying to q the most general unifier between g1 and g2.

Algorithm PerfectRef(Q, T)
Input: union of conjunctive queries Q, DL-Lite_horn^(HN) TBox T
Output: union of conjunctive queries PR
  PR := {remdup(τ(q)) | q is a CQ in Q};
  repeat                                                        (1)
    PR′ := PR;
    for each q ∈ PR′
      (a) for each g in q
            for each PI I in ext(T)
              if I is applicable to g
              then PR := PR ∪ {remdup(q[g/gr(g, I)])};
      (b) for each g1, g2 in q
            if g1 and g2 unify
            then PR := PR ∪ {remdup(τ(reduce(q, g1, g2)))};
  until PR′ = PR;
  for each q ∈ PR                                               (2)
    for each g in q
      if g is of the form E_kR(x), k ≥ 2, then replace g with γ_{≥k R}(x);
      if g is of the form E_1R(x) then replace g with R(x, _);
  return PR.

Fig. 2. The algorithm PerfectRef
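The two helper functions τ and remdup are simple enough to sketch in Python. The sketch below uses tagged tuples for atoms (`("c", A, t)`, `("e", k, R, t)`, `("r", P, t1, t2)`), marks variables with a leading `?`, and ignores inequality atoms; these conventions, like the function names, are assumptions made for illustration.

```python
from collections import Counter

def terms(atom):
    """Arguments of an atom: E_kR(t) stores them after k and R, others after the name."""
    return atom[3:] if atom[0] == "e" else atom[2:]

def tau(atoms, distinguished):
    """Replace unbound variables by '_' and fold P(x, _) / P(_, x) into E_1 atoms."""
    occurrences = Counter(t for atom in atoms for t in terms(atom))

    def bound(t):
        # Constants, distinguished variables and shared variables are bound.
        return not t.startswith("?") or t in distinguished or occurrences[t] >= 2

    result = []
    for atom in atoms:
        if atom[0] == "r":
            _, p, s, o = atom
            s, o = (s if bound(s) else "_"), (o if bound(o) else "_")
            if o == "_":
                result.append(("e", 1, p, s))          # P(x, _) or P(_, _)
            elif s == "_":
                result.append(("e", 1, p + "-", o))    # P(_, x)
            else:
                result.append(("r", p, s, o))
        else:
            t = atom[-1]
            result.append(atom[:-1] + ((t if bound(t) else "_"),))
    return result

def remdup(atoms):
    """Drop duplicate atoms; among E_k1R(x) and E_k2R(x) keep the larger k."""
    best, rest = {}, []
    for atom in atoms:
        if atom[0] == "e":
            _, k, role, x = atom
            best[(role, x)] = max(k, best.get((role, x), 0))
        elif atom not in rest:
            rest.append(atom)
    return rest + [("e", k, role, x) for (role, x), k in best.items()]

# q(x) <- P(x, y), A(x): y is unbound, so P(x, y) becomes E_1P(x).
rewritten = tau([("r", "P", "?x", "?y"), ("c", "A", "?x")], {"?x"})
```

Keeping only the largest k is sound because E_kR(x) implies E_k′R(x) for every k′ ≤ k.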

Informally, part (1) of the algorithm reformulates the query by replacing and unifying atoms, and accumulates the new queries. In this part all possible applications of the PIs, according to Definition 8, are exhausted. Part (2) performs unfolding of the atoms E_kR(x). Notice that it is sufficient to unfold the atoms E_kR(x) at the very end, for the following reasons. First, if k ≥ 2, then only role inclusions could be applied to the atoms of the form R(x, y); however, condition (inter) ensures that in this case there are no such inclusions. If k = 1, then such role inclusions have already been applied in part (1). Second, one could reduce some of the atoms R(x, y), but this would not produce new answers, since all variables in the new R-atoms are bound.

This algorithm is an extension of the PerfectRef algorithm devised for the logic DL-Lite_A [4]. The extended version has to deal with inequality atoms and with atoms of the form E_kR(x) that later need to be unfolded. In contrast with the algorithm for DL-Lite_A, the query may grow, so one also needs to take care of redundant atoms. Notice also that the number of CQs in PR may be exponential in the size of the TBox (and not only in the length of q), due to the fact that Horn inclusions may cause a single CQ to become of length linear in the size of the TBox. Here, we assume that numbers in the TBox are coded in unary.
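The fixpoint loop of part (1) and the growth of queries under Horn inclusions can be observed on a drastically simplified fragment. The Python sketch below handles only step (a) with case 1 of Definition 8, i.e. atomic concept atoms and inclusions B1 ⊓ ··· ⊓ Bn ⊑ A over atomic concepts; unification, number restrictions and roles are omitted. A CQ is a frozenset of atoms, so duplicate removal is implicit. The name `perfect_ref_atomic` and the encoding are assumptions made for this example.

```python
def perfect_ref_atomic(query, pis):
    """Rewrite a CQ made of concept atoms using Horn inclusions over atomic concepts.

    query: iterable of (concept, variable) atoms.
    pis:   list of (lhs tuple of atomic concepts, rhs atomic concept).
    Returns the set of CQs (frozensets of atoms) reachable by step (a).
    """
    pr = {frozenset(query)}
    while True:
        new = set()
        for q in pr:
            for g in q:
                concept, var = g
                for lhs, rhs in pis:
                    if rhs == concept:                 # I is applicable to g
                        new.add((q - {g}) | {(b, var) for b in lhs})
        if new <= pr:                                  # nothing new: fixpoint
            return pr
        pr |= new

# TBox: B ⊓ C ⊑ A, D ⊑ B; query q(x) <- A(x)
rewritings = perfect_ref_atomic([("A", "?x")], [(("B", "C"), "A"), (("D",), "B")])
```

Here the single atom A(x) is replaced by two atoms, which illustrates why, unlike in DL-Lite_A, a rewritten CQ can become longer than the input query.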

Now, to compute the answers to Q over the KB K = ⟨T, A⟩, we need to evaluate the set of conjunctive queries PR produced by the algorithm PerfectRef over the ABox A considered as a relational database. The algorithm computing ans(Q, K) is exactly the same as in [4], so we do not present it here. Correctness of the above described query-answering technique is established in the following. We start by observing that, as in [4], query answering can in principle be done by evaluating the query over the model can(K).

Theorem 9. Let K be a satisfiable DL-Lite_horn^(HN) KB, and let Q be a union of conjunctive queries over K. Then, ans(Q, K) = Q^can(K).

Since can(K) is in general infinite, we cannot compute it and evaluate Q over it. Instead, we compile the TBox into the query, thus simulating the evaluation of the query over can(K) by evaluating a finite reformulation of the query over the ABox considered as a database. The proof of the following lemma is inspired by the one for DL-Lite_R in [4], but needs to take into account number restrictions and Horn inclusion assertions.

Lemma 10. Let T be a DL-Lite_horn^(HN) TBox, Q a UCQ over T, and PR the UCQ returned by PerfectRef(Q, T). For every DL-Lite_horn^(HN) ABox A such that ⟨T, A⟩ is satisfiable, ans(Q, ⟨T, A⟩) = PR^db(A).

Proof. We first introduce the preliminary notion of witness of a tuple of constants with respect to a CQ q in Q. Given a DL-Lite_horn^(HN) knowledge base K = ⟨T, A⟩, a CQ q(x) ← conj(x, y) over K, and a tuple t of constants occurring in K, a set of membership assertions G is a witness of t w.r.t. q if there exists a substitution σ from the variables y in conj(t, y) to constants in G such that the set of atoms in σ(conj(t, y)) is equal to G. Then t ∈ q^can(K) iff there exists a witness G of t w.r.t. q such that G ⊆ chase(K). The cardinality of a witness G, denoted by |G|, is the number of membership assertions in G.

Let us first prove the statement for a modified version of PerfectRef, where at the end of the algorithm we perform an additional step of unifications as in step (b) of part (1). Let us call PerfectRef_u this extended version of PerfectRef.

We have that ans(Q, ⟨T, A⟩) = Q^can(K) = ⋃_{q∈Q} q^can(K), and PR^db(A) = ⋃_{q̂∈PR} q̂^db(A), where PR is the UCQ returned by PerfectRef_u(Q, T). Hence, we need to show that ⋃_{q̂∈PR} q̂^db(A) = Q^can(K). For simplicity, we consider the case where Q consists of a single CQ q.

"⇐" We have to prove that q̂^db(A) ⊆ q^can(K), for each q̂ ∈ PR. Let q_{i+1} be obtained from q_i by some step of the algorithm PerfectRef_u. We can show that q_{i+1}^can(K) ⊆ q_i^can(K) at any step. Since each query of PR is either q or a query obtained from q by repeatedly applying steps (a) and (b) of the algorithm PerfectRef_u, followed by the rewriting in part (2) and the unification step at the end, it follows that for each q̂ ∈ PR, q̂^can(K) ⊆ q^can(K), by repeatedly applying the property q_{i+1}^can(K) ⊆ q_i^can(K). Since db(A) ⊆ can(K) and CQs are monotonic queries, we get q̂^db(A) ⊆ q̂^can(K) ⊆ q^can(K) for each CQ q̂ ∈ PR.

"⇒" We have to show that for each tuple t ∈ q^can(K), there exists q̂ ∈ PR such that t ∈ q̂^db(A). First, since t ∈ q^can(K), it follows that there exists a finite number h such that there is a witness G_h of t w.r.t. q contained in chase_h(K). Moreover, w.l.o.g. we can assume that every rule cr1, cr2 and cr3 used in the construction of chase(K) is necessary in order to generate such a witness G_h. In the following, we say that a set S of membership assertions is an ancestor of a set S′ of membership assertions in a set 𝒮 of membership assertions, if there exist S_1, ..., S_n in 𝒮, where S_1 = S and S_n = S′, such that, for each j ∈ {2, ..., n}, S_j can be generated by applying a chase rule to a subset of S_{j−1}. We also say that S′ is a successor of S. Furthermore, for each i ∈ {0, ..., h}, we denote with G_i the pre-witness of t w.r.t. q in chase_h(K), defined as follows:

G_i = ⋃_{S′ ⊆ G_h} { S ⊆ chase_i(K) | S is an ancestor of S′ in chase_h(K) and there exists no successor of S in chase_i(K) that is an ancestor of S′ in chase_h(K) }.

Now we prove by induction on i that, starting from G_h, we can "go back" through the rule applications and find a query q̂ in PR such that the pre-witness G_{h−i} of t w.r.t. q in chase_{h−i}(K) is also a witness of t w.r.t. q̂. To this aim, we prove that there exists q̂ ∈ PR such that G_{h−i} is a witness of t w.r.t. q̂ and |q̂| = |G_{h−i}|, where |q̂| indicates the number of atoms in the CQ q̂ except inequality atoms. The claim then follows for i = h, since chase_0(K) = A.

Base step: There exists q̂ ∈ PR such that G_h is a witness of t w.r.t. q̂ and |q̂| = |G_h|. This is an immediate consequence of the fact that (1) q ∈ PR and (2) PR is closed with respect to step (b) of the algorithm PerfectRef_u.

Inductive step: Suppose there exists q̂ ∈ PR such that G_{h−i+1} is a witness of t w.r.t. q̂ in chase_{h−i+1}(K) and |q̂| = |G_{h−i+1}|. Let us assume that chase_{h−i+1}(K) is obtained by applying chase rule cr2 to chase_{h−i}(K) (the proof for rules cr1 and cr3 is analogous). Hence, a PI of the form B1 ⊓ ··· ⊓ Bn ⊑ ≥k R, where the Bj, 1 ≤ j ≤ n, are basic concepts and R is a basic role, is applied in chase_{h−i}(K) to a set of membership assertions S = S_B1(a) ∪ ··· ∪ S_Bn(a), such that k1 = ♯{b | R(a, b) ∈ chase_{h−i}(K)} < k. Therefore, chase_{h−i+1}(K) = chase_{h−i}(K) ∪ {R(a, b_{k1+1}), ..., R(a, b_k)}, where b_{k1+1}, ..., b_k ∈ Γ_N follow lexicographically all constants occurring in chase_{h−i}(K).

Since every rule used in the construction of chase(K) is necessary for generating G_{h−i+1}, and the set of membership assertions {R(a, b_{k1+1}), ..., R(a, b_k)} does not have successors in chase_{h−i+1}(K), we have {R(a, b_{k1+1}), ..., R(a, b_k)} ⊆ G_{h−i+1}.

Let {R(a, b_1), ..., R(a, b_{k2})}, 0 ≤ k2 ≤ k1, be the minimal set of membership assertions that has successors in chase_{h−i+1}(K). Then, according to the definition of G_i, the set {R(a, b_{k2+1}), ..., R(a, b_{k1})} ⊆ G_{h−i+1}. By the inductive assumption, |q̂| = |G_{h−i+1}|, and {R(a, b_{k2+1}), ..., R(a, b_k)} ⊆ G_{h−i+1}; hence, q̂ has to contain the atoms R(x, y_{k2+1}), ..., R(x, y_k), where y_{k2+1}, ..., y_k appear only in the mentioned predicate atoms and possibly in inequalities. We show that q̂ must contain all the possible inequalities y_{j1} ≠ y_{j2}, for k2 + 1 ≤ j1 < j2 ≤ k.

By contradiction, assume that not all variables are related to each other by inequalities, i.e., there are m ≥ 2 sets of variables among y_{k2+1}, ..., y_k such that the variables within one set are mutually unequal and variables from different sets are not constrained to be different. This means that the part of q̂ with y_{k2+1}, ..., y_k looks as follows:

R(x, y_{1,1}), ..., R(x, y_{1,l_1}), y_{1,1} ≠ y_{1,2}, ..., y_{1,l_1−1} ≠ y_{1,l_1},
R(x, y_{2,1}), ..., R(x, y_{2,l_2}), y_{2,1} ≠ y_{2,2}, ..., y_{2,l_2−1} ≠ y_{2,l_2},
. . .
R(x, y_{m,1}), ..., R(x, y_{m,l_m}), y_{m,1} ≠ y_{m,2}, ..., y_{m,l_m−1} ≠ y_{m,l_m},

where l_j ≥ 1, j ∈ {1, ..., m}, and Σ_j l_j = k − k2. Therefore, q̂ has to be the result of unfolding, by part (2) of the algorithm PerfectRef_u, a query q1 with the corresponding part E_{l_1}R(x), ..., E_{l_m}R(x). However, the algorithm cannot produce such a query: the function remdup would keep in q1 only one such atom E_{l_max}R(x) with l_max < k − k2, l_max = max_j {l_j}, which unfolded would contain fewer than k − k2 atoms of the form R(x, y_j). The contradiction arises from the assumption m ≥ 2. So, m has to be equal to 1 and all variables y_{k2+1}, ..., y_k appear in pairwise inequalities in q̂.

Thus, there exists a query q1 in PR1 that contains the atom E_{k−k2}R(x), k − k2 ≤ k, and q̂ is obtained from q1 by part (2) of the algorithm. Then, by step (a) it follows that there exists a query q2 = remdup(q1[E_{k−k2}R(x)/B1(x) ∧ ··· ∧ Bn(x)]) in PR1 such that G_{h−i} is a witness of t w.r.t. q2. Let q̂1 be the result of unfolding q2 by part (2) of the algorithm; then G_{h−i} is a witness of t w.r.t. q̂1 as well and |q̂1| ≥ |G_{h−i}|.

If |q̂1| > |G_{h−i}|, there exists at least one membership assertion f in G_{h−i} such that at least two atoms g1, g2 in q̂1 both unify with f. Hence g1 and g2 unify, and by step (b) of the algorithm it follows that in PR there exists q3 = remdup(τ(reduce(q̂1, g1, g2))) with |q3| < |q̂1| such that G_{h−i} is a witness of t w.r.t. q3. If |q3| > |G_{h−i}|, then by applying the argument repeatedly we obtain a query q̂2 such that |q̂2| = |G_{h−i}| and G_{h−i} is a witness of t w.r.t. q̂2, which proves the claim.

Finally, we observe that unification of atoms of the query at the end of PerfectRef_u does not add new answers to the set ans(Q, ⟨T, A⟩), since all variables in the new R-atoms introduced in part (2) are bound. Therefore, the lemma holds also for the algorithm PerfectRef. □

Based on the above property, we are finally able to establish correctness of the algorithm Answer and its computational complexity.

Theorem 11. Let K = ⟨T, A⟩ be a DL-Lite_horn^(HN) KB, Q a UCQ over T, and t a tuple of constants in K. Then, t ∈ ans(Q, K) if and only if t ∈ Answer(Q, K).

Theorem 12. The algorithm Answer is exponential in the size of the TBox, and in AC^0 in the size of the ABox (data complexity).

Note that we can modify PerfectRef to get an algorithm for the decision problem associated with the query answering problem that runs in nondeterministic polynomial time in combined complexity: it nondeterministically returns one of the CQs from the reformulation of the input query (polynomially many rewriting steps) and checks in polynomial time whether a tuple is in the answer to this CQ. The corresponding version of Answer also runs in nondeterministic polynomial time.

5 Conclusions

This paper presents practical algorithms for query answering and knowledge base satisfiability in DL-Lite_horn^(HN). They are based on the same idea as those devised for the original DL-Lite logics. The distinguishing feature of these algorithms is a separation between TBox and ABox reasoning, which enables an interesting modularization of query answering: the part of the process requiring TBox reasoning is independent of the ABox, and the part of the process requiring access to the ABox can be carried out by an SQL engine. We showed that the algorithms are sound and complete.

The complexity of the developed algorithms is AC^0 w.r.t. data complexity, NP in the size of the TBox and NP w.r.t. combined complexity.


References

1. Poggi, A., Lembo, D., Calvanese, D., De Giacomo, G., Lenzerini, M., Rosati, R.: Linking data to ontologies. J. on Data Semantics X (2008) 133–173

2. Rodríguez-Muro, M.: Tools and Techniques for Ontology Based Data Access in Lightweight Description Logics. PhD thesis, KRDB Research Centre for Knowledge and Data, Free University of Bozen-Bolzano (2010)

3. Lenzerini, M.: Data integration: A theoretical perspective. In: Proc. of the 21st ACM SIGACT SIGMOD SIGART Symp. on Principles of Database Systems (PODS 2002). (2002) 233–246

4. Calvanese, D., De Giacomo, G., Lembo, D., Lenzerini, M., Rosati, R.: Tractable reasoning and efficient query answering in description logics: The DL-Lite family. J. of Automated Reasoning 39(3) (2007) 385–429

5. Artale, A., Calvanese, D., Kontchakov, R., Zakharyaschev, M.: The DL-Lite family and relations. J. of Artificial Intelligence Research 36 (2009) 1–69

6. Acciarri, A., Calvanese, D., De Giacomo, G., Lembo, D., Lenzerini, M., Palmieri, M., Rosati, R.: QUONTO: QUerying ONTOlogies. In: Proc. of the 20th Nat. Conf. on Artificial Intelligence (AAAI 2005). (2005) 1670–1671

7. Poggi, A., Rodríguez-Muro, M., Ruzzi, M.: Ontology-based database access with DIG-Mastro and the OBDA Plugin for Protégé. In Clark, K., Patel-Schneider, P.F., eds.: Proc. of the 4th Int. Workshop on OWL: Experiences and Directions (OWLED 2008 DC). (2008)

8. Baader, F., Calvanese, D., McGuinness, D., Nardi, D., Patel-Schneider, P.F., eds.: The Description Logic Handbook: Theory, Implementation and Applications. Cambridge University Press (2003)

9. Abiteboul, S., Hull, R., Vianu, V.: Foundations of Databases. Addison Wesley Publ. Co. (1995)
