Query Rewriting over Shallow Ontologies


S. Kikot¹, R. Kontchakov¹, V. Podolskii², and M. Zakharyaschev¹

1 Department of Computer Science and Information Systems Birkbeck, University of London, U.K.

{kikot,roman,michael}@dcs.bbk.ac.uk

2 Steklov Mathematical Institute, Moscow, Russia podolskii@mi.ras.ru

Abstract. We investigate the size of rewritings of conjunctive queries over OWL 2 QL ontologies of depth 1 and 2 by means of a new hypergraph formalism for computing Boolean functions. Both positive and negative results are obtained. All conjunctive queries over ontologies of depth 1 have polynomial-size nonrecursive datalog rewritings; tree-shaped queries have polynomial-size positive existential rewritings; however, for some queries and ontologies of depth 1, positive existential rewritings can only be of superpolynomial size. Both positive existential and nonrecursive datalog rewritings of conjunctive queries and ontologies of depth 2 suffer an exponential blowup in the worst case, while first-order rewritings can grow superpolynomially unless NP ⊆ P/poly.

1 Introduction

This paper is a continuation of the series [11, 13, 14], where we investigated the following problems. Let q(x) be a conjunctive query (CQ) with answer variables x and let T be an OWL 2 QL ontology. It is known (see, e.g., [7, 3]) that there exists a first-order formula q′(x), called an FO-rewriting of q and T, such that (T, A) |= q(a) iff A |= q′(a), for any ABox A and any vector a of individuals in A (of the same length as x). Thus, to find certain answers to q(x) over T and A, it suffices to find answers to q′(x) over the data A, which can (hopefully) be done by conventional relational database management systems (RDBMSs).

Various experiments showed, however, that rewritings q′ can be too large for the RDBMSs to cope with. This put forward the following problems:

– What is the overhead of answering CQs via ontologies compared to standard database query answering in the worst case?

– What is the size of FO-rewritings of CQs and OWL 2 QL ontologies in the worst case?

– Can rewritings of one type (say, nonrecursive datalog) be substantially shorter than rewritings of another type (say, positive existential)?

– Are there interesting and useful sufficient conditions on CQs and ontologies under which rewritings are short?


We showed [11, 13, 14] that, for a certain sequence of (tree-shaped) CQs q_n and OWL 2 QL TBoxes T_n, the problem ‘A |= q_n?’ is in P for combined complexity, while the problem ‘(T_n, A) |= q_n?’ is NP-complete. Moreover, any positive existential (PE) or nonrecursive datalog (NDL) rewriting of q_n and T_n is of exponential size, while any FO-rewriting is of superpolynomial size unless NP ⊆ P/poly.

We also showed that NDL-rewritings are in general exponentially more succinct than PE-rewritings, and FO-rewritings can be superpolynomially more succinct than PE-rewritings. On the other hand, Gottlob and Schwentick [8] demonstrated that one can always find a polynomial-size rewriting for the price of polynomially-many additional existential quantifiers over a domain with at least two constants (thus confirming once again that formalisms with nondeterminism are exponentially more succinct; cf. [4]). Finally, Kikot et al. [12] give a practically useful sufficient condition on CQs and ontologies under which PE-rewritings are of polynomial size.

The problem we address in this paper is whether the depth of TBoxes (that is, the maximal depth of the canonical models with single-individual ABoxes) has any impact on the size of rewritings. (The TBoxes T_n mentioned above are of depth n.) In particular, what happens if we restrict the depth of TBoxes to 1 or 2? (PE-rewritings over TBoxes of depth 0 are trivially polynomial.) The obtained results are summarised below:

(1) For any CQ and TBox of depth 1, there is a polynomial-size NDL-rewriting.

(2) PE-rewritings of some CQs and TBoxes of depth 1 are of superpolynomial size.

(3) All tree-shaped CQs and TBoxes of depth 1 have polynomial-size PE-rewritings.

(4) For TBoxes of depth 2, both NDL- and PE-rewritings can suffer an exponential blowup, while FO-rewritings can suffer a superpolynomial blowup (unless NP ⊆ P/poly).

We begin by observing that the tree-witness PE-rewritings, representing possible homomorphisms of subqueries of a given CQ to the canonical models with one ABox individual, give rise to a class of monotone Boolean functions associated with hypergraphs and called hypergraph functions. In particular, hypergraphs H of degree ≤ 2 (every vertex in which belongs to at most 2 hyperedges) correspond to Boolean CQs q_H and TBoxes T_H of depth 1 such that answering q_H over T_H and single-individual ABoxes amounts to computing the hypergraph function for H. We show then that representing Boolean functions as hypergraphs of degree ≤ 2 is polynomially equivalent to representing them by nondeterministic branching programs (NBP) [10]. This correspondence and known results about NBPs [17, 9] give (1) and (2) above. We show (3) using the tree form of CQs and the fact that, over TBoxes of depth 1, CQs q can only have ≤ |q| tree witnesses.

To obtain (4), we observe that hypergraphs of degree > 2 are computationally as powerful as nondeterministic polynomial Boolean circuits (NP/poly) and encode computing the function Clique_{n,k}(e) (a graph with n vertices has a k-clique) as answering some CQs over TBoxes of depth 2. Although hypergraph representations of Boolean functions are introduced as a technical means to investigate the size of rewritings, they may be of independent interest to the complexity theory of Boolean functions. All the omitted proofs can be found in the full version of the paper available at www.dcs.bbk.ac.uk/~roman.

2 OWL 2 QL and the Tree-Witness Rewriting

In this paper, we use the following (simplified) syntax of OWL 2 QL. It contains individual names a_i, concept names A_i, and role names P_i (i ≥ 1). Roles R and basic concepts B are defined by the grammar:

R ::= P_i | P_i⁻,    B ::= ⊥ | A_i | ∃R.

A TBox, T, is a finite set of inclusions of the form

B₁ ⊑ B₂,    B₁ ⊓ B₂ ⊑ ⊥,    R₁ ⊑ R₂,    R₁ ⊓ R₂ ⊑ ⊥.

An ABox, A, is a finite set of atoms of the form A_k(a_i) or P_k(a_i, a_j). The set of individual names in A is denoted by ind(A). T and A together form the knowledge base (KB) K = (T, A). The semantics for OWL 2 QL is defined in the usual way based on interpretations I = (∆^I, ·^I) [5]. We write B₁ ≡ B₂ as a shortcut for B₁ ⊑ B₂ and B₂ ⊑ B₁.

For every role name R in T, we take two fresh concept names A_R, A_{R⁻} and add to T the axioms A_R ≡ ∃R and A_{R⁻} ≡ ∃R⁻. We say that the resulting TBox is in normal form and assume, without loss of generality, that every TBox in this paper is in normal form. We denote by ⊑_T the subsumption relation induced by T and write S₁ ⊑_T S₂ if T |= S₁ ⊑ S₂, where S₁, S₂ are both concepts or roles.

We say that an ABox A is H-complete with respect to T in case

R₂(a, b) ∈ A if R₁(a, b) ∈ A and R₁ ⊑_T R₂,

A₂(a) ∈ A if A₁(a) ∈ A and A₁ ⊑_T A₂,

for all concept names A_i (including the A_R) and roles R_i. We write R(a, b) ∈ A for P(a, b) ∈ A if R = P and for P(b, a) ∈ A if R = P⁻; also, we write A_R(a) ∈ A if R(a, b) ∈ A, for some b.
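The two closure conditions can be checked mechanically. The following is a minimal sketch only: it assumes that the relation ⊑_T on concept names and on roles has already been computed and is passed in as sets of pairs, that atoms are encoded as tuples, and that a trailing "-" marks an inverse role. The function names, the encoding and the sample data are illustrative assumptions; atoms for the concept names A_R are treated like any other concept atoms.

```python
def role_pairs(abox, role):
    """All pairs (a, b) with R(a, b) in the ABox; 'P-' stands for the inverse of P."""
    inverse = role.endswith("-")
    name = role[:-1] if inverse else role
    pairs = set()
    for atom in abox:
        if len(atom) == 3 and atom[0] == name:
            _, a, b = atom
            # For R = P- we read P(a, b) as R(b, a).
            pairs.add((b, a) if inverse else (a, b))
    return pairs


def is_h_complete(abox, concept_sub, role_sub):
    """Check both closure conditions against the given subsumption pairs."""
    for r1, r2 in role_sub:
        if not role_pairs(abox, r1) <= role_pairs(abox, r2):
            return False
    for a1, a2 in concept_sub:
        for atom in abox:
            if len(atom) == 2 and atom[0] == a1 and (a2, atom[1]) not in abox:
                return False
    return True


# Hypothetical data: P is subsumed by the inverse of S, and A by B.
role_sub = {("P", "S-")}
concept_sub = {("A", "B")}
incomplete = {("P", "a", "b"), ("A", "a")}
complete = incomplete | {("S", "b", "a"), ("B", "a")}
```

Here `incomplete` violates both conditions, while `complete` adds exactly the two atoms required by the subsumptions.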

A conjunctive query (CQ) q(x) is a formula ∃y ϕ(x, y), where ϕ is a conjunction of atoms of the form A_k(z₁) or P_k(z₁, z₂) with z_i ∈ x ∪ y (without loss of generality, we assume that CQs do not contain constants). A tuple a ⊆ ind(A) is a certain answer to q(x) over K = (T, A) if I |= q(a) for all I |= K; in this case we write K |= q(a). If x = ∅, the CQ q is called Boolean; a certain answer to such a q over K is ‘yes’ if K |= q and ‘no’ otherwise. Where convenient, we regard a CQ q as the set of its atoms.

Suppose T is a TBox and q(x) a CQ. An FO-formula q′(x) with free variables x and without constants is an FO-rewriting of q and T over H-complete ABoxes if, for any H-complete (with respect to T) ABox A and any a ⊆ ind(A), we have (T, A) |= q(a) iff A |= q′(a). If an FO-rewriting q′ is a positive existential formula (with only ∃, ∧, ∨), we call it a PE-rewriting of q and T. We also consider rewritings in the form of nonrecursive Datalog queries. We remind the reader that a Datalog program, Π, is a finite set of Horn clauses ∀x (γ₁ ∧ · · · ∧ γ_m → γ₀), where each γ_i is an atom of the form P(x₁, . . . , x_l) with x_i ∈ x. The atom γ₀ is called the head of the clause, and γ₁, . . . , γ_m its body. All variables in the head must also occur in the body. A predicate P depends on a predicate Q in Π if Π contains a clause whose head is P and whose body contains Q. Π is called nonrecursive if this dependence relation for Π is acyclic. For a nonrecursive Datalog program Π and an atom q′(x), we say that (Π, q′) is an NDL-rewriting of q(x) and T over H-complete ABoxes in case (T, A) |= q(a) iff Π, A |= q′(a), for any H-complete ABox A and any a ⊆ ind(A). Rewritings over arbitrary ABoxes are defined by dropping the condition that the ABoxes are H-complete. They can be obtained from rewritings over H-complete ABoxes at the price of a polynomial blowup.
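The nonrecursiveness condition is an acyclicity test on the dependence relation between predicates. As a small illustration, assuming a program is abstracted to a list of (head predicate, body predicates) pairs with arguments ignored, the test can be sketched as a depth-first search; the function name and the encoding are our own and not part of the definition.

```python
def is_nonrecursive(program):
    """program: list of (head_predicate, [body_predicates]) pairs.

    Returns True iff the 'depends on' relation between predicates is acyclic.
    """
    depends = {}
    for head, body in program:
        depends.setdefault(head, set()).update(body)

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited / on the current DFS path / finished
    colour = {}

    def visit(p):
        colour[p] = GREY
        for q in depends.get(p, ()):
            state = colour.get(q, WHITE)
            if state == GREY:  # q is on the current path: a dependence cycle
                return False
            if state == WHITE and not visit(q):
                return False
        colour[p] = BLACK
        return True

    return all(colour.get(p, WHITE) != WHITE or visit(p) for p in list(depends))
```

For example, the clauses Q ← P ∧ R and P ← R form a nonrecursive program, whereas P ← Q together with Q ← P does not.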

Recall [7, 12] that, for any TBox T and ABox A, there is a canonical model C_{T,A} of (T, A) such that (T, A) |= q(a) iff C_{T,A} |= q(a), for all CQs q(x) and a ⊆ ind(A). The domain of C_{T,A} consists of the individuals in ind(A) and the witnesses introduced by the existential quantifiers in T. Every individual a ∈ ind(A) with (T, A) |= A_R(a) is the root of a (possibly infinite) subtree C_T^R(a) of C_{T,A}, which may intersect another such tree only on their common root a. Every C_T^R(a) is isomorphic to the canonical model of (T, {A_R(a)}).

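[Figure: a canonical model C_{T,A} with ABox individuals a₁: A_{R₁}, a₂: A_{R₁}, A_{R₂} and a₃, the trees C_T^{R₁}(a₁), C_T^{R₁}(a₂) and C_T^{R₂}(a₂), and ABox edges labelled P₁, P₂⁻ and P₂.]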

We say that T is of depth ω if at least one of the C_T^R(a) is infinite; T is of depth d, 0 ≤ d < ω, if there is a chain of the form a R₀ w₁ . . . w_{d−1} R_{d−1} w_d (for distinct w_i) in the trees C_T^R(a), R a role in T, but there is no such chain of greater length.

By definition, C_{T,A} |= q(a) iff there is a homomorphism h: q(a) → C_{T,A}. Such a homomorphism h splits q into the subquery mapped by h to ind(A) and the subquery mapped to the trees C_T^R(a). We can think of a rewriting of q and T as listing possible splits of q into such subqueries.

Suppose q′ is a subset of the atoms of q and there is a homomorphism h: q′ → C_T^R(a), for some a, such that h maps all answer variables in q′ to a. Let t_r = h⁻¹(a) and let t_i be the remaining set of (existentially quantified) variables in q′. We call the pair t = (t_r, t_i) a tree witness for q and T generated by R if t_i ≠ ∅ and q′ is a minimal subset of q such that, for any y ∈ t_i, every atom in q containing y belongs to q′. In this case, we denote q′ by q_t. By definition,

\[ q_t = \{\, S(z) \in q \mid z \subseteq t_r \cup t_i \text{ and } z \not\subseteq t_r \,\}. \]

Note that the same tree witness t = (t_r, t_i) can be generated by different roles R. We denote the set of all such roles by Ω_t and define the formula

\[ \mathsf{tw}_t = \bigvee_{R \in \Omega_t} \exists z \Bigl( A_R(z) \wedge \bigwedge_{x \in t_r} (x = z) \Bigr). \tag{1} \]


Tree witnesses t and t′ are consistent if q_t ∩ q_{t′} = ∅. Each consistent set Θ of tree witnesses (in which any pair of distinct tree witnesses is consistent) determines a subquery q_Θ of q that comprises all atoms of q_t, for t ∈ Θ. The subquery q_Θ is to be mapped to the C_T^R(a), whereas the remainder, q \ q_Θ, obtained by removing the atoms of q_Θ from q, is mapped to ind(A). The following PE-formula q_tw is called the tree-witness rewriting of q and T over H-complete ABoxes:

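\[ q_{\mathsf{tw}}(x) = \bigvee_{\Theta \text{ consistent}} \exists y \Bigl( (q \setminus q_\Theta) \wedge \bigwedge_{t \in \Theta} \mathsf{tw}_t \Bigr). \tag{2} \]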

Example 1. Consider the KB K = (T, {A(a)}), where

T = {A ⊑ ∃R, A ⊑ ∃R⁻, A_R ≡ ∃R, A_{R⁻} ≡ ∃R⁻},

and the CQ q(x₁, x₄) = {R(x₁, y₂), R(y₃, y₂), R(y₃, x₄)} shown in the picture below alongside the canonical model C_{T,A} (with A_R and A_{R⁻} omitted).

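[Figure: the CQ q with variables x₁, y₂, y₃, x₄, its three R-atoms and the tree witnesses t¹ and t² (left); the canonical model C_{T,A} with the root a: A and its R- and R⁻-successors (right).]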

There are two tree witnesses for q and T: t¹ = (t¹_r, t¹_i) generated by R and t² = (t²_r, t²_i) generated by R⁻, with

t¹_r = {x₁, y₃}, t¹_i = {y₂}, tw_{t¹} = ∃z (A_R(z) ∧ (x₁ = z) ∧ (y₃ = z)),

t²_r = {y₂, x₄}, t²_i = {y₃}, tw_{t²} = ∃z (A_{R⁻}(z) ∧ (x₄ = z) ∧ (y₂ = z)).

We have q_{t¹} = {R(x₁, y₂), R(y₃, y₂)} and q_{t²} = {R(y₃, y₂), R(y₃, x₄)}, so t¹ and t² are inconsistent. Thus, we obtain the following tree-witness rewriting:

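\[ q_{\mathsf{tw}}(x_1, x_4) = \exists y_2, y_3 \bigl[ (R(x_1, y_2) \wedge R(y_3, y_2) \wedge R(y_3, x_4)) \vee (R(y_3, x_4) \wedge \mathsf{tw}_{t^1}) \vee (R(x_1, y_2) \wedge \mathsf{tw}_{t^2}) \bigr]. \]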
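The disjuncts of this rewriting can be recomputed directly from the definitions of q_t and of consistent sets. The sketch below is illustrative only: it takes the two tree witnesses of Example 1 as given (it does not search for homomorphisms into the canonical model), encodes atoms as tuples, and uses names of our own choosing.

```python
from itertools import combinations

# Atoms of the CQ q(x1, x4) from Example 1, encoded as (predicate, arg1, arg2).
q = {("R", "x1", "y2"), ("R", "y3", "y2"), ("R", "y3", "x4")}


def q_t(query, t_r, t_i):
    """Atoms S(z) of the query with z contained in t_r ∪ t_i but not in t_r."""
    both = t_r | t_i
    return {atom for atom in query
            if set(atom[1:]) <= both and not set(atom[1:]) <= t_r}


# The tree witnesses t1 and t2 of Example 1, given by their (t_r, t_i).
witnesses = {
    "t1": q_t(q, {"x1", "y3"}, {"y2"}),
    "t2": q_t(q, {"y2", "x4"}, {"y3"}),
}


def consistent_sets(witnesses):
    """Yield every set of tree witnesses whose atom sets are pairwise disjoint."""
    names = sorted(witnesses)
    for size in range(len(names) + 1):
        for theta in combinations(names, size):
            if all(not (witnesses[s] & witnesses[t])
                   for s, t in combinations(theta, 2)):
                yield theta


# One disjunct per consistent set: (atoms of q \ q_Theta, tree witnesses in Theta).
disjuncts = []
for theta in consistent_sets(witnesses):
    covered = set().union(*(witnesses[t] for t in theta))
    disjuncts.append((sorted(q - covered), list(theta)))
```

The list `disjuncts` then contains one entry for the empty set of tree witnesses, one for {t¹} and one for {t²}; the pair {t¹, t²} is skipped because both witnesses contain the atom R(y₃, y₂).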

Theorem 1. For any ABox A that is H-complete with respect to T and any a ⊆ ind(A), we have C_{T,A} |= q(a) iff A |= q_tw(a).

3 Hypergraph Functions

The tree-witness rewriting q_tw above gives rise to monotone Boolean functions we call hypergraph functions. For the complexity theory of monotone Boolean functions the reader is referred to [2, 10].

Let H = (V, E) be a hypergraph with vertices v ∈ V and hyperedges e ∈ E, E ⊆ 2^V. We call a subset X ⊆ E independent if e ∩ e′ = ∅, for any distinct e, e′ ∈ X. The set of vertices that occur in the hyperedges of X is denoted by V_X. With each vertex v ∈ V and each hyperedge e ∈ E we associate propositional variables p_v and p_e, respectively. The hypergraph function f_H for H is given by the Boolean formula

\[ f_H = \bigvee_{X \text{ independent}} \Bigl( \bigwedge_{v \in V \setminus V_X} p_v \wedge \bigwedge_{e \in X} p_e \Bigr). \tag{3} \]

The rewriting q_tw of q and T defines a hypergraph whose vertices are the atoms of q and hyperedges are the sets q_t, for tree witnesses t for q and T. We denote this hypergraph by H^q_T. The formula (3) defining f_{H^q_T} is basically the same as the rewriting (2) with the atoms S(z) in q and tree-witness formulas tw_t treated as propositional variables. We denote these variables by p_{S(z)} and p_t (rather than p_v and p_e), respectively.

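Example 2. Consider again the CQ q and TBox T from Example 1. The hypergraph H^q_T and its hypergraph function f_{H^q_T} are shown below:

[Figure: the hypergraph H^q_T with the vertices R(x₁, y₂), R(y₃, y₂), R(y₃, x₄) and the hyperedges t¹ and t².]

\[ f_{H^q_T} = (p_{R(x_1,y_2)} \wedge p_{R(y_3,y_2)} \wedge p_{R(y_3,x_4)}) \vee (p_{R(y_3,x_4)} \wedge p_{t^1}) \vee (p_{R(x_1,y_2)} \wedge p_{t^2}). \]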
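As a sanity check of formula (3) on this example, the hypergraph function can be evaluated by brute force over all independent sets of hyperedges and compared with the three-disjunct formula above. This is only a sketch for small hypergraphs (it is exponential in the number of hyperedges); the string encoding of atoms and all function names are our own.

```python
from itertools import combinations, product


def hypergraph_function(vertices, hyperedges, p_v, p_e):
    """Evaluate f_H by brute force over all independent sets X of hyperedges."""
    names = sorted(hyperedges)
    for size in range(len(names) + 1):
        for X in combinations(names, size):
            if any(hyperedges[a] & hyperedges[b] for a, b in combinations(X, 2)):
                continue  # X is not independent
            covered = set().union(*(hyperedges[e] for e in X))
            if all(p_e[e] for e in X) and all(p_v[v] for v in vertices - covered):
                return True
    return False


# Hypergraph of Example 2: vertices are the atoms of q, hyperedges are q_t1 and q_t2.
a1, a2, a3 = "R(x1,y2)", "R(y3,y2)", "R(y3,x4)"
vertices = {a1, a2, a3}
hyperedges = {"t1": {a1, a2}, "t2": {a2, a3}}


def dnf(p_v, p_e):
    """The three-disjunct formula for the hypergraph function given in Example 2."""
    return bool((p_v[a1] and p_v[a2] and p_v[a3])
                or (p_v[a3] and p_e["t1"])
                or (p_v[a1] and p_e["t2"]))


def agree_everywhere():
    """Compare both definitions on all 2^5 truth assignments."""
    for bits in product([False, True], repeat=5):
        p_v = dict(zip([a1, a2, a3], bits[:3]))
        p_e = dict(zip(["t1", "t2"], bits[3:]))
        if hypergraph_function(vertices, hyperedges, p_v, p_e) != dnf(p_v, p_e):
            return False
    return True
```

Calling `agree_everywhere()` compares the two definitions on every assignment of the five propositional variables.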

Suppose the function f_{H^q_T} is computed by some Boolean formula χ_{H^q_T}. Consider the FO-formula χ̂_{H^q_T} obtained by replacing each p_{S(z)} in χ_{H^q_T} with S(z), each p_t with tw_t, and adding the prefix ∃y. By comparing (3) and (2), we see that the resulting FO-formula is a rewriting of q and T over H-complete ABoxes.

Theorem 2. (i) If the function f_{H^q_T} is computed by a Boolean formula χ_{H^q_T}, then χ̂_{H^q_T} is an FO-rewriting of q and T over H-complete ABoxes.

(ii) If f_{H^q_T} is computed by a monotone Boolean circuit C, then there is an NDL-rewriting of q and T over H-complete ABoxes of size O(|C| · (|q| + |T|)).

Thus, the problem of constructing short rewritings is reducible to the problem of finding short Boolean formulas computing the hypergraph functions. Hypergraphs of degree ≤ 2, in which every vertex belongs to at most two hyperedges, are of particular interest to us because (i) TBoxes of depth 1 have hypergraphs of degree ≤ 2, and (ii) their hypergraph functions are the functions computed by nondeterministic branching programs (NBPs), also known as switching-and-rectifier networks [10], of polynomial size. Recall [17] that, in terms of the expressive power, NBPs sit between Boolean formulas and Boolean circuits. On the other hand, hypergraphs of unbounded degree are substantially more powerful than hypergraphs of degree 2; more precisely, polynomial-size hypergraphs of unbounded degree can define NP-hard Boolean functions. Here is an example.

We remind the reader (see, e.g., [2] for details) that the monotone Boolean function Clique_{n,k}(e) of n(n−1)/2 variables e_{jj′}, 1 ≤ j < j′ ≤ n, returns 1 iff the graph with vertices {1, . . . , n} and edges {{j, j′} | e_{jj′} = 1} contains a k-clique. Clearly, Clique_{n,k} is NP-complete. A series of papers, started by Razborov’s [16], gave an exponential lower bound for the size of monotone circuits computing Clique_{n,k}: 2^{Ω(√k)} for k ≤ ¼ (n/log n)^{2/3} [1]. For monotone formulas, an even better lower bound is known: 2^{Ω(k)} for k = 2n/3 [15].

Given n and k as above, let H_{n,k} be a hypergraph with the vertices

{v_i | 1 ≤ i ≤ k} ∪ {u_{jj′}, w_{jj′}, u_{j′j} | 1 ≤ j < j′ ≤ n}

and the hyperedges

– g^{ij} = {v_i} ∪ {u_{jj′} | 1 ≤ j′ ≤ n, j′ ≠ j}, for 1 ≤ i ≤ k, 1 ≤ j ≤ n,

– h^{jj′} = {w_{jj′}, u_{jj′}} and h^{j′j} = {w_{jj′}, u_{j′j}}, for 1 ≤ j < j′ ≤ n.

Informally, the vertices v_i of H_{n,k} represent a k-clique in a given graph with n vertices. The vertices w_{jj′} represent the edges of the complete graph with n vertices; they can be turned ‘on’ or ‘off’ by means of the Boolean variables e_{jj′}. The vertex u_{jj′} together with the hyperedge h^{jj′} represent the ‘half’ of the edge connecting j and j′ that is adjacent to j. The hyperedges g^{ij} correspond to the choice of the jth vertex of the graph as the ith element of the clique. For example, the hypergraph H_{4,2} looks as follows:

[Figure: the hypergraph H_{4,2} for the graph with vertices 1, 2, 3, 4, showing the vertices v₁, v₂, w_{jj′} and u_{jj′} together with the hyperedges g^{ij} (i = 1, 2; j = 1, . . . , 4) and h^{jj′} (1 ≤ j ≠ j′ ≤ 4).]

Denote by v, u, w, g and h the vectors of the v_i, u_{jj′}, w_{jj′}, g^{ij} and h^{jj′}, respectively. Thus, we can think of the hypergraph function for H_{n,k} as having the form f_{H_{n,k}}(v, u, w, g, h).

Theorem 3. Clique_{n,k}(e) = f_{H_{n,k}}(0, 1, e, 1, 1), for all e ∈ {0,1}^{n(n−1)/2}.
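The statement can be checked experimentally for small n and k. The sketch below builds H_{n,k} as defined above and evaluates its hypergraph function under the assignment of Theorem 3 (all p_{v_i} false, all p_u, p_g, p_h true, p_{w_{jj′}} = e_{jj′}). Instead of enumerating all independent sets, it uses the equivalent observation that f_H is true iff some independent set of ‘true’ hyperedges covers every ‘false’ vertex. The tuple encoding and all function names are our own; this is a test harness, not a proof.

```python
from itertools import combinations, product


def build_hnk(n, k):
    """Hyperedges of H_{n,k} as a dict: name -> frozenset of vertices."""
    edges = {}
    for i in range(1, k + 1):
        for j in range(1, n + 1):
            edges[("g", i, j)] = frozenset(
                [("v", i)] + [("u", j, j2) for j2 in range(1, n + 1) if j2 != j])
    for j, j2 in combinations(range(1, n + 1), 2):
        edges[("h", j, j2)] = frozenset([("w", j, j2), ("u", j, j2)])
        edges[("h", j2, j)] = frozenset([("w", j, j2), ("u", j2, j)])
    return edges


def f_h(edges, false_vertices, true_edges):
    """True iff pairwise disjoint 'true' hyperedges can cover all 'false' vertices."""
    def search(uncovered, used):
        if not uncovered:
            return True
        v = next(iter(uncovered))  # some false vertex that still needs covering
        for name in true_edges:
            e = edges[name]
            if v in e and not (e & used) and search(uncovered - e, used | e):
                return True
        return False
    return search(frozenset(false_vertices), frozenset())


def clique_via_hypergraph(n, k, e):
    """f_{H_{n,k}}(0, 1, e, 1, 1); e maps pairs (j, j2) with j < j2 to 0 or 1."""
    edges = build_hnk(n, k)
    false_vertices = [("v", i) for i in range(1, k + 1)]
    false_vertices += [("w", j, j2) for (j, j2), bit in e.items() if not bit]
    return f_h(edges, false_vertices, list(edges))


def has_clique(n, k, e):
    """Direct definition of Clique_{n,k}."""
    return any(all(e[pair] for pair in combinations(c, 2))
               for c in combinations(range(1, n + 1), k))


def check(n, k):
    """Compare both functions on all inputs e."""
    pairs = list(combinations(range(1, n + 1), 2))
    for bits in product([0, 1], repeat=len(pairs)):
        e = dict(zip(pairs, bits))
        if clique_via_hypergraph(n, k, e) != has_clique(n, k, e):
            return False
    return True
```

For instance, `check(4, 2)` and `check(4, 3)` run through all 64 graphs on four vertices.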

4 The Size of Rewritings over TBoxes of Depth 1 and 2

We use these complexity-theoretic results on the hypergraph functions to investigate the size of rewritings over TBoxes of depth 1 and 2.


Theorem 4. For any CQ q and any TBox T of depth 1, there is an NDL-rewriting of q and T of polynomial size.

The proof of this theorem is based on the following facts. First, for any CQ q and TBox T of depth 1, the hypergraph H^q_T is of degree ≤ 2, and the number of distinct tree witnesses for q and T does not exceed the number of variables in q. Second, each hypergraph function given by a hypergraph of degree ≤ 2 is computed by a polynomial-size monotone NBP, and so by a polynomial-size monotone Boolean circuit. Thus, it remains to use Theorem 2.

Theorem 5. There is a sequence of CQs q_n and TBoxes T_n of depth 1 such that any PE-rewriting of q_n and T_n (over H-complete ABoxes) is of size 2^{Ω(log² n)}.

In the proof of this result, we use a sequence of Boolean functions f_n(x) that are computable by polynomial-size monotone NBPs, but any monotone Boolean formulas computing f_n are of size ≥ 2^{Ω(log² n)}. (For example, Grigni and Sipser [9] consider f_n(x) that takes the adjacency matrix of a directed graph of n vertices with a distinguished vertex s as input and returns 1 iff there is a directed path from s to some vertex of outdegree at least two.) Since any NBP corresponds to a hypergraph of degree ≤ 2 of polynomial size, we obtain a sequence H_n of polynomial-size hypergraphs of degree ≤ 2 such that f_n = f_{H_n}.

Now, we call a hypergraph H representable if there are a CQ q_H and a TBox T_H such that H is isomorphic to H^{q_H}_{T_H}. To show that any hypergraph of degree ≤ 2 is representable, take any such H = (V, E) and assume, to simplify notation, that any vertex in H belongs to exactly 2 hyperedges, so H comes with two fixed maps i₁, i₂: V → E such that i₁(v) ≠ i₂(v), v ∈ i₁(v) and v ∈ i₂(v), for any v ∈ V. For every e ∈ E, we take an individual variable z_e and denote by z the vector of all such variables. For every v ∈ V, we take a role name R_v and set:

\[ q_H = \exists z \bigwedge_{v \in V} R_v(z_{i_1(v)}, z_{i_2(v)}). \]

For every hyperedge e ∈ E, let A_e be a concept name and S_e a role name. Consider the TBox T_H with the following inclusions, for e ∈ E:

A_e ≡ ∃S_e,

S_e ⊑ R_v⁻, for v ∈ V with i₁(v) = e,

S_e ⊑ R_v, for v ∈ V with i₂(v) = e.

We claim that H is isomorphic to H^{q_H}_{T_H} and illustrate the proof by an example.

Example 3. Let H = (V, E) with V = {v₁, v₂, v₃, v₄} and E = {e₁, e₂, e₃}, where e₁ = {v₁, v₂, v₃}, e₂ = {v₃, v₄}, e₃ = {v₁, v₂, v₄}. Suppose also that

i₁: v₁ ↦ e₁, v₂ ↦ e₃, v₃ ↦ e₁, v₄ ↦ e₂,

i₂: v₁ ↦ e₃, v₂ ↦ e₁, v₃ ↦ e₂, v₄ ↦ e₃.

The hypergraph H is shown below, where each vertex v_i is represented by an edge: i₁(v_i) is indicated by the circle-shaped end of the edge and i₂(v_i) by the diamond-shaped one; the hyperedges e_j are shown as large grey squares:

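[Figure: the hypergraph H with the vertices v₁, . . . , v₄ and the hyperedges e₁, e₂, e₃ (left); the tree witness t^{e₁} with the variables z_{e₁}, z_{e₂}, z_{e₃}, and the canonical model with root labelled A_{e₁} and an S_{e₁}-successor labelled R_{v₃}⁻, R_{v₂}, R_{v₁}⁻ (right).]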

Here, q_H = ∃z_{e₁} z_{e₂} z_{e₃} (R_{v₁}(z_{e₁}, z_{e₃}) ∧ R_{v₂}(z_{e₃}, z_{e₁}) ∧ R_{v₃}(z_{e₁}, z_{e₂}) ∧ R_{v₄}(z_{e₂}, z_{e₃})) and the TBox T_H contains the following inclusions:

A_{e₁} ≡ ∃S_{e₁}, S_{e₁} ⊑ R_{v₁}⁻, S_{e₁} ⊑ R_{v₂}, S_{e₁} ⊑ R_{v₃}⁻,

A_{e₂} ≡ ∃S_{e₂}, S_{e₂} ⊑ R_{v₃}, S_{e₂} ⊑ R_{v₄}⁻,

A_{e₃} ≡ ∃S_{e₃}, S_{e₃} ⊑ R_{v₁}, S_{e₃} ⊑ R_{v₂}⁻, S_{e₃} ⊑ R_{v₄}.

The canonical model C^{S_{e₁}}_{T_H}(a) is shown on the right-hand side of the picture above.

We observe now that each variable z_e uniquely determines the tree witness t^e with q_{t^e} = {R_v(z_{i₁(v)}, z_{i₂(v)}) | v ∈ e}; q_{t^e} and q_{t^{e′}} are consistent iff e ∩ e′ = ∅. It follows that H is isomorphic to H^{q_H}_{T_H}.
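The construction of q_H and T_H is simple enough to be spelled out in code. The following sketch builds both from the maps i₁ and i₂ of Example 3 and checks that the atoms containing z_e are exactly the atoms R_v with v ∈ e, which is the correspondence used in the isomorphism claim. The string encoding of names (with a trailing "-" for inverse roles) and the function names are illustrative assumptions, and the code does not construct canonical models.

```python
def build_query(i1, i2):
    """Atoms R_v(z_{i1(v)}, z_{i2(v)}) of q_H, encoded as (role, var1, var2)."""
    return {("R_" + v, "z_" + i1[v], "z_" + i2[v]) for v in i1}


def build_tbox(i1, i2):
    """Inclusions of T_H as (lhs, rhs) pairs; an equivalence is stored in both directions."""
    tbox = set()
    for e in set(i1.values()) | set(i2.values()):
        tbox.add(("A_" + e, "∃S_" + e))
        tbox.add(("∃S_" + e, "A_" + e))
    for v in i1:
        tbox.add(("S_" + i1[v], "R_" + v + "-"))  # S_e ⊑ R_v⁻ for i1(v) = e
        tbox.add(("S_" + i2[v], "R_" + v))        # S_e ⊑ R_v  for i2(v) = e
    return tbox


def witness_vertices(query, e):
    """Vertices v whose atom R_v contains the variable z_e."""
    return {atom[0][2:] for atom in query if "z_" + e in atom[1:]}


# Data of Example 3.
hyperedges = {"e1": {"v1", "v2", "v3"}, "e2": {"v3", "v4"}, "e3": {"v1", "v2", "v4"}}
i1 = {"v1": "e1", "v2": "e3", "v3": "e1", "v4": "e2"}
i2 = {"v1": "e3", "v2": "e1", "v3": "e2", "v4": "e3"}

q_H = build_query(i1, i2)
T_H = build_tbox(i1, i2)
recovered = {e: witness_vertices(q_H, e) for e in hyperedges}
```

On this input `recovered` coincides with `hyperedges`, and `T_H` contains, among others, the three role inclusions for S_{e₁} listed in Example 3.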

We take the sequence of CQs q_n and TBoxes T_n associated with the hypergraphs of degree ≤ 2 for the sequence f_n of Boolean functions chosen above. Then we show that a PE-rewriting q′_n of q_n and T_n can be transformed (using quantifier elimination over single-individual ABoxes [13]) into monotone Boolean formulas χ_n computing f_n and having size ≤ |q′_n|. It remains to recall that |q′_n| ≥ |χ_n| ≥ 2^{Ω(log² n)}.

Theorem 6. There exists a sequence of CQs q_n and TBoxes T_n of depth 2 such that any PE- and NDL-rewriting of q_n and T_n is of exponential size, while any FO-rewriting of q_n and T_n is of superpolynomial size (unless NP ⊆ P/poly).

To prove this theorem, we show that the hypergraphs H_{n,k} computing Clique_{n,k} are representable as subgraphs of H^{q_{n,k}}_{T_{n,k}} for suitable q_{n,k} and T_{n,k}, and then use quantifier elimination over single-individual ABoxes as above.

The CQ q_{n,k} and the TBox T_{n,k} of polynomial size (in n) that ‘compute’ Clique_{n,k} are defined as follows. The Boolean CQ q_{n,k} contains the atoms:

T_{ij}(v_i, z_{ij}), for 1 ≤ i ≤ k and 1 ≤ j ≤ n,

P_{jj′}(w_{jj′}, x_{jj′}), P_{j′j}(w_{jj′}, x_{j′j}), for 1 ≤ j < j′ ≤ n,

Q(u_{jj′}, x_{jj′}), U(u_{jj′}, z_{ij}), for 1 ≤ j ≠ j′ ≤ n and 1 ≤ i ≤ k.

The picture below illustrates the fragments of q_{n,k} centred around z_{ij} and x_{jj′}:

[Figure: the fragments of q_{n,k} centred around z_{ij} and x_{jj′}, shown alongside the canonical models C^{R_{ij}}_{T_{n,k}}(a), with elements c_{ij}, c′_{ij} and generating roles R_{ij}, R′_{ij}, and C^{S_{jj′}}_{T_{n,k}}(a), with elements d_{jj′}, d′_{jj′} and generating roles S_{jj′}, S′_{jj′}.]

The TBox T_{n,k} mimics the arrangement of atoms in the layers depicted above and contains the following inclusions, for 1 ≤ i ≤ k and 1 ≤ j ≠ j′ ≤ n:

A_{ij} ≡ ∃R_{ij}, R_{ij} ⊑ T_{ij′}⁻, R_{ij} ⊑ U⁻, R_{ij} ⊑ Q⁻, ∃R_{ij}⁻ ⊑ A′_{ij}, A′_{ij} ≡ ∃R′_{ij}, R′_{ij} ⊑ T_{ij}, R′_{ij} ⊑ U,

B_{jj′} ≡ ∃S_{jj′}, S_{jj′} ⊑ P_{j′j}⁻, S_{jj′} ⊑ U⁻, ∃S_{jj′}⁻ ⊑ B′_{jj′}, B′_{jj′} ≡ ∃S′_{jj′}, S′_{jj′} ⊑ P_{jj′}, S′_{jj′} ⊑ Q.

The picture above also shows the elements and ‘generating roles’ of the models C^{R_{ij}}_{T_{n,k}}(a) and C^{S_{jj′}}_{T_{n,k}}(a). The horizontal dashed lines show possible ways of embedding the fragments of q_{n,k} into the respective canonical models. These embeddings give rise to the following tree witnesses for q_{n,k} and T_{n,k}:

– t^{ij} = (t^{ij}_r, t^{ij}_i) generated by R_{ij}, for 1 ≤ i ≤ k and 1 ≤ j ≤ n, where

t^{ij}_r = {z_{ij′}, x_{jj′} | 1 ≤ j′ ≤ n, j′ ≠ j} ∪ {z_{i′j} | 1 ≤ i′ ≤ k, i ≠ i′},

t^{ij}_i = {v_i, z_{ij}} ∪ {u_{jj′} | 1 ≤ j′ ≤ n, j′ ≠ j};

– s^{jj′} = (s^{jj′}_r, s^{jj′}_i) and s^{j′j} = (s^{j′j}_r, s^{j′j}_i), generated by S_{jj′} and S_{j′j}, where

s^{jj′}_r = {x_{j′j}} ∪ {z_{ij} | 1 ≤ i ≤ k}, s^{j′j}_r = {x_{jj′}} ∪ {z_{ij′} | 1 ≤ i ≤ k},

s^{jj′}_i = {w_{jj′}, u_{jj′}, x_{jj′}}, s^{j′j}_i = {w_{jj′}, u_{j′j}, x_{j′j}}.

The tree witnesses t^{ij}, s^{jj′} and s^{j′j} are uniquely determined by their most remote (from the root) variable, z_{ij}, x_{jj′} and x_{j′j}, respectively, and correspond to the hyperedges g^{ij}, h^{jj′}, h^{j′j} of the hypergraph H_{n,k}; their internal variables of the form v_i, w_{jj′} and u_{jj′} correspond to the vertices in the respective hyperedge.

5 Tree-Shaped Queries and TBoxes of Depth 1

In this section, we consider one particular equivalent transformation of the tree-witness rewriting. Let q be a CQ and z a subset of the set of existentially quantified variables in q. By a z-partition of q we understand any disjoint sets of atoms q₁, . . . , q_k, called z-components, that cover q and are such that if, for i = 1, 2, each of S_i(z_i) ∈ q contains a variable z_i ∈ z and z₁ is connected to z₂ by a path coming through variables in z only, then both S_i(z_i) are in the same component. Note that, for any z-partition of q, every tree witness for ∃z q and any T is contained in some z-component of the partition.
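The definition admits many z-partitions; the finest one puts every atom without variables from z into a component of its own and groups the remaining atoms by the connected components of the variables in z. The sketch below computes this finest partition only, which is one admissible choice among several; the tuple encoding of atoms and the function name are our own.

```python
def finest_partition(query, z):
    """Return the finest z-partition of a set of atoms (predicate, arg, ...)."""
    components = []  # pairs (variables of z seen in the component, atoms of the component)
    for atom in query:
        zvars = set(atom[1:]) & z
        atoms = {atom}
        if zvars:
            rest = []
            for cz, ca in components:
                if cz & zvars:       # shares a variable of z: merge the components
                    zvars |= cz
                    atoms |= ca
                else:
                    rest.append((cz, ca))
            components = rest
        components.append((zvars, atoms))
    return [atoms for _, atoms in components]


# The CQ with atoms R(x1, y2), R(y3, y2), R(y3, x4).
q = {("R", "x1", "y2"), ("R", "y3", "y2"), ("R", "y3", "x4")}
by_y2_y3 = finest_partition(q, {"y2", "y3"})
by_y3 = finest_partition(q, {"y3"})
```

For this query, `by_y2_y3` consists of a single component containing all three atoms, while `by_y3` separates R(x₁, y₂) from the other two atoms.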

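Given a TBox T and a CQ q(x) = ∃y ϕ(x, y), we define a formula q†(x) = ∃y q^y, where q^z is computed recursively as follows: we take a z-partition q₁, . . . , q_k of q, take z_j (set in bold in the display below) to be the set of the variables in z that occur in q_j, for each 1 ≤ j ≤ k (so that z₁, . . . , z_k form a partition of z), and set q^z = q₁^{*z₁} ∧ · · · ∧ q_k^{*z_k}, where

\[
q_j^{*\mathbf{z}_j} =
\begin{cases}
(q_j)^{\mathbf{z}_j \setminus \{z_j\}} \;\vee \displaystyle\bigvee_{\substack{\text{tree witness } t \text{ for } \exists \mathbf{z}_j\, q_j \text{ and } T \\ \text{with } z_j \in t_i}} \bigl( \mathsf{tw}_t \wedge (q_j \setminus q_t)^{\mathbf{z}_j \setminus (t_r \cup t_i)} \bigr), & \text{if there is } z_j \in \mathbf{z}_j, \\[1ex]
q_j, & \text{otherwise.}
\end{cases}
\]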

Note that q^z depends on the choice of q₁, . . . , q_k and of the variable z_j in each set z_j, which can be arbitrary. Intuitively, the first disjunct of q^z reflects the situation where z_j is mapped to an ABox element; so we treat z_j as a free variable when rewriting q_j. The other disjuncts reflect the case when z_j is mapped to the non-ABox part of the canonical model, in which case z_j belongs to the internal part t_i of a tree witness t = (t_r, t_i) for ∃z_j q_j and T. As the variables in t_r must be mapped to ABox elements, this leaves the set q_j \ q_t of atoms with existentially quantified z_j \ (t_r ∪ t_i) for further rewriting (this set of variables does not contain z_j).

Theorem 7. For any ABox A that is H-complete with respect to T and any a ⊆ ind(A), we have C_{T,A} |= q(a) iff A |= q†(a).

Example 4. Take q and T from Example 1. The only {y₂, y₃}-component of q is q. Then we pick, say, y₂, and obtain q^{{y₂,y₃}} = q^{{y₃}} ∨ (tw_{t¹} ∧ {R(y₃, x₄)}^∅). Now, q has two {y₃}-components, {R(x₁, y₂)} and q₁ = {R(y₃, y₂), R(y₃, x₄)}. The former gives R(x₁, y₂), while in the latter we have to pick y₃ and obtain q₁^∅ ∨ tw_{t²}, assuming that the empty set of atoms is ⊤. This gives the rewriting:

\[ q^{\dagger}(x_1, x_4) = \exists y_2, y_3 \Bigl[ \Bigl( R(x_1, y_2) \wedge \bigl( (R(y_3, y_2) \wedge R(y_3, x_4)) \vee \mathsf{tw}_{t^2} \bigr) \Bigr) \vee \bigl( \mathsf{tw}_{t^1} \wedge R(y_3, x_4) \bigr) \Bigr]. \]

A CQ q is said to be tree-shaped if its primal graph (whose vertices are the variables in q and edges are pairs {t, t′} such that R(t, t′) ∈ q) is a tree. In each component q_j of a tree-shaped CQ q, we can choose a variable z_j that splits it in half. More formally, we have the following:

Proposition 1. For any tree T = (V, E), there is a vertex v ∈ V such that each connected component obtained by removing v from T contains ≤ |V|/2 vertices.
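Such a vertex (often called a centroid) can be found in linear time by computing subtree sizes. The sketch below is one standard way of doing so and is included only as an illustration; it assumes the tree is given as an adjacency dictionary, and the function name is our own.

```python
def centroid(adj):
    """adj: dict mapping each vertex of a tree to the set of its neighbours.

    Returns a vertex whose removal leaves components with at most |V|/2 vertices.
    """
    n = len(adj)
    root = next(iter(adj))
    parent = {root: None}
    order = [root]
    for v in order:  # breadth-first traversal; the list grows while we iterate
        for w in adj[v]:
            if w not in parent:
                parent[w] = v
                order.append(w)

    size = {v: 1 for v in adj}
    for v in reversed(order):  # accumulate subtree sizes from the leaves upwards
        if parent[v] is not None:
            size[parent[v]] += size[v]

    for v in order:
        # Components after removing v: the subtrees of its children ...
        parts = [size[w] for w in adj[v] if parent.get(w) == v]
        # ... and everything on the parent's side.
        if parent[v] is not None:
            parts.append(n - size[v])
        if all(2 * p <= n for p in parts):
            return v


# A path a - b - c - d - e and a star with centre "c".
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "e"}, "e": {"d"}}
star = {"c": {"x", "y", "z"}, "x": {"c"}, "y": {"c"}, "z": {"c"}}
```

On `path` the function returns the middle vertex, and on `star` it returns the centre.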

By Proposition 1, any tree-shaped CQ can be split into components, each of which contains at most half of the existentially quantified variables of the CQ. By applying this argument recursively and using the fact that, if a TBox T is of depth 1 then, for any variable z in q, the number of tree witnesses t = (t_r, t_i) for q and T with z ∈ t_i does not exceed 2, we obtain our final result:

Theorem 8. Any tree-shaped CQ over any TBox of depth 1 has a polynomial PE-rewriting.


References

1. N. Alon and R. Boppana. The monotone circuit complexity of Boolean functions. Combinatorica, 7(1):1–22, 1987.

2. S. Arora and B. Barak. Computational Complexity: A Modern Approach. Cambridge University Press, New York, NY, USA, 1st edition, 2009.

3. A. Artale, D. Calvanese, R. Kontchakov, and M. Zakharyaschev. The DL-Lite family and relations. J. Artif. Intell. Res. (JAIR), 36:1–69, 2009.

4. J. Avigad. Eliminating definitions and Skolem functions in first-order logic. In Proc. of 16th Annual IEEE Symposium on Logic in Computer Science (LICS), pages 139–146, 2001.

5. F. Baader, D. Calvanese, D. McGuinness, D. Nardi, and P. Patel-Schneider, editors. The Description Logic Handbook: Theory, Implementation and Applications. Cambridge University Press, 2003.

6. E. Börger, E. Grädel, and Y. Gurevich. The Classical Decision Problem. Perspectives in Mathematical Logic. Springer, 1997.

7. D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, and R. Rosati. Tractable reasoning and efficient query answering in description logics: The DL-Lite family. J. of Automated Reasoning, 39(3):385–429, 2007.

8. G. Gottlob and T. Schwentick. Rewriting ontological queries into small nonrecursive datalog programs. In Proc. of the 24th Int. Workshop on Description Logics (DL 2011), vol. 745 of CEUR Workshop Proceedings. CEUR-WS.org, 2011.

9. M. Grigni and M. Sipser. Monotone separation of logarithmic space from logarithmic depth. J. Comput. Syst. Sci., 50(3):433–437, 1995.

10. S. Jukna. Boolean Function Complexity — Advances and Frontiers, volume 27 of Algorithms and Combinatorics. Springer, 2012.

11. S. Kikot, R. Kontchakov, and M. Zakharyaschev. On (In)Tractability of OBDA with OWL 2 QL. In Proc. of the 24th Int. Workshop on Description Logics (DL 2011), vol. 745 of CEUR Workshop Proceedings. CEUR-WS.org, 2011.

12. S. Kikot, R. Kontchakov, and M. Zakharyaschev. Conjunctive query answering with OWL 2 QL. In Proc. of the 13th Int. Conf. on Principles of Knowledge Representation and Reasoning (KR 2012). AAAI Press, 2012.

13. S. Kikot, R. Kontchakov, V. Podolskii, and M. Zakharyaschev. Exponential lower bounds and separation for query rewriting. In Proc. of the 39th Int. Colloquium on Automata, Languages, and Programming (ICALP 2012), vol. 7392 of LNCS, pages 263–274. Springer, 2012.

14. S. Kikot, R. Kontchakov, V. Podolskii, and M. Zakharyaschev. Long rewritings, short rewritings. In Proc. of the 2012 Int. Workshop on Description Logics (DL 2012), vol. 846 of CEUR Workshop Proceedings. CEUR-WS.org, 2012.

15. R. Raz and A. Wigderson. Monotone circuits for matching require linear depth. J. ACM, 39(3):736–744, 1992.

16. A. Razborov. Lower bounds for the monotone complexity of some Boolean functions. Dokl. Akad. Nauk SSSR, 281(4):798–801, 1985.

17. A. Razborov. Lower bounds for deterministic and nondeterministic branching programs. In Proc. of the 8th Int. Symposium on Fundamentals of Computation Theory (FCT’91), vol. 529 of LNCS, pages 47–60. Springer, 1991.