Succinctness of Query Rewriting in OWL 2 QL: The Case of Tree-like Queries

(1)

Succinctness of Query Rewriting in OWL 2 QL:

The Case of Tree-like Queries

Meghyn Bienvenu¹, Stanislav Kikot², and Vladimir Podolskii³

1 LRI – CNRS & Universit´e Paris Sud, Orsay, France

2 Institute for Information Transmission Problems & MIPT, Moscow, Russia

3 Steklov Mathematical Institute, Moscow, Russia

Abstract. This paper further investigates the succinctness landscape of query rewriting in OWL 2 QL. We clarify the worst-case size of positive existential (PE), non-recursive Datalog (NDL), and first-order (FO) rewritings for various classes of tree-like conjunctive queries, ranging from linear queries up to bounded treewidth queries. More specifically, we establish a superpolynomial lower bound on the size of PE-rewritings that holds already for linear queries and TBoxes of depth 2. For NDL-rewritings, we show that polynomial-size rewritings always exist for tree-shaped queries with a bounded number of leaves (and arbitrary TBoxes), and for bounded treewidth queries and bounded depth TBoxes. Finally, we show that the succinctness problems concerning FO-rewritings are equivalent to well-known problems in Boolean circuit complexity. Along with known results, this yields a complete picture of the succinctness landscape for the considered classes of queries and TBoxes.

1 Introduction

For several years now, conjunctive query (CQ) answering has been a major focus of description logic (DL) research (cf. survey [19]), due to the growing interest in using description logic ontologies to query data. Formally, the problem is to compute the certain answersto a CQq(x)over a knowledge base(T,A), that is, the tuples of in- dividualsathat satisfyT,A |=q(a). Much of the work on CQ answering focuses on lightweight DLs of the DL-Lite family [5], and the corresponding OWL 2 QL profile [18]. The popularity of these languages is due to fact that they enjoyfirst-order (FO) rewritability, which means that for every CQq(x)and every TBoxT, there exists a computable FO-queryq⁰(x)(called arewriting) such that the certain answers toq(x) over(T,A)coincide with the answers of the FO-queryq⁰(x)over the ABoxA(viewed as a database). First-order rewritability provides a means of reducing CQ answering to the evaluation of FO (∼ SQL) queries in relational databases. A great many different query rewriting algorithms have been proposed for OWL 2 QL and its extensions, cf. [5, 20, 25, 6, 10, 24, 21, 7, 17, 23]. Most of these algorithms produce rewritings ex- pressed as unions of conjunctive queries (UCQs), and the size of such rewritings can be huge, making it difficult, or even impossible, to evaluate them using standard relational database management systems.

It is not difficult to see that exponential-size rewritings are unavoidable if rewritings are given as UCQs (consider for instance the CQq(x) =B1(x)∧. . .∧Bn(x)and TBox {AivBi|1≤i≤n}). A natural (and non-trivial) question is whether an exponential

(2)

blowup can be avoided by moving to other standard query languages, like positive existential (PE) queries, non-recursive datalog (NDL) queries, or first-order (FO-) queries⁴. More generally, under what conditions can we ensure polynomial-size rewritings? A first (negative) answer was given in [14], which proved exponential lower bounds for the worst-case size of PE- and NDL-rewritings, as well as a superpolynomial lower bound for FO-rewritings (under the widely-held assumption that NP6⊆P/poly). Interestingly, all three results hold already fortree-shapedCQs, which are a well-studied class of CQs that often enjoy better computational properties, cf. [28, 4]. While the queries used in the proofs had a simple structure, the TBoxes induced full binary trees of depthn. This raised the question of whether better results could be obtained by considering restricted classes of TBoxes. A recent study [15] explored this question for TBoxes ofdepth1 and 2, that is, TBoxes that generate canonical models whose elements are at most 1 or 2 ‘steps away’ from the ABox (see Section 2 for a formal definition). It was shown that for depth 1 TBoxes, polysize PE-rewritings do not exist, polysize NDL-rewritings do exist, and polysize FO-rewritings exist iff NL/poly ⊆ NC¹. For depth 2 TBoxes, neither polysize PE- nor NDL-rewritings exist, and polysize FO-rewritings do not exist unless NP6⊆P/poly. These results used simpler TBoxes, but the considered CQs were no longer tree-shaped. For depth 1 TBoxes, this distinction is crucial, as it was further shown in [15] that polysize PE-rewritingsdoexist for tree-shaped CQs.

While existing results go a fair ways towards understanding the succinctness landscape of query rewriting in OWL 2 QL, a number of questions remain open:

– What happens if we consider tree-shaped queries and bounded depth TBoxes?

– What happens if we consider generalizations or restrictions of tree-shaped CQs?

In this paper, we address these questions by providing a complete picture of the succinctness of rewritings for tree-shaped queries, their restriction tolinear and bounded branching queries(i.e. tree-shaped CQs with a bounded number of leaves), and their generalization to bounded treewidth queries. More specifically, we establish a superpolynomial lower bound on the size of PE-rewritings that holds already for linear queries and TBoxes of depth 2. For NDL-rewritings, we show that polynomial-size rewritings always exist for bounded branching queries (and arbitrary TBoxes), and for bounded treewidth queries and bounded depth TBoxes. Finally, we show that the succinctness problems concerning FO-rewritings are equivalent to well-known problems in Boolean circuit complexity: NL/poly ⊆ NC¹ in the case of linear and bounded branching queries, and SAC¹⊆NC¹in the case of tree-shaped and bounded treewidth queries and bounded depth TBoxes. Along with known results, this yields a complete picture of the succinctness landscape for the considered classes of queries and TBoxes.

To prove our results, we establish tight connections between Boolean functions induced by queries and TBoxes and the non-uniform complexity classes NL/poly and SAC¹, reusing and further extending the machinery developed in [14, 15].

Many proofs have been omitted for lack of space. We invite the interested reader to consult the long version [3] for full proofs and additional material.

4We focus on so-calledpureFO-rewritings, cf. for [8, 11] discussion and related results.

(3)

2 Preliminaries

OWL 2 QLIn this paper, we use the simplified DL syntax of the OWL 2 QL profile [18]. As usual, we assume countably infinite, mutually disjoint setsNC,NR, andNIof concept names,role names, andindividual names.RolesR andbasic conceptsB are defined by the grammar:

R ::= r | r⁻ B ::= A | ∃R

whereA∈NCandr∈NR. We useN^±_R to refer to the set of all roles.

ATBox(typically denotedT) is a finite set ofinclusionsof the forms B1vB2 B1v ¬B2 R1vR2 R1v ¬R2

Thesignature of a TBoxT, writtensig(T), is the set of concept and role names that appear inT. AnABox(typically denotedA) is a finite set ofassertionsthe formA(a) orr(a, b), whereA∈NC,r∈NR, anda, b∈NI. The set of individual names inAis denotedInds(A).

A TBoxT and ABoxAtogether form aknowledge base(KB)K = (T,A). The semantics of KBs is defined in the usual way based on interpretationsI= (∆^I,·^I)[2].

We usev_T to denote the subsumption relation induced byT and writeP1 v_T P2 if T |=P1vP2, whereP1,P2are both concepts or roles.

Query answering and rewritingAconjunctive query (CQ)q(x)is an FO-formula

∃yϕ(x,y), whereϕis a conjunction of atoms of the form A(z1)or r(z1, z2)with zi ∈ x∪y. The free variablesxare called answer variables. Note that we assume w.l.o.g. that CQs do not contain individual names, and where convenient, we regard a CQ as the set of its atoms. We usevars(q)(resp.avars(q)) to denote the set of variables (resp. answer variables) ofq. Thesignatureofq, denotedsig(q), is the set of concept and role names inq. To every CQq, we associate the undirected graphGq whose vertices are the variables ofq, and which contains an edge{u, v}wheneverqcontains an atomr(u, v)orr(v, u). We call a CQqtree-shapedif the graphGqis a tree⁵.

A tuplea ⊆ Inds(A)is acertain answertoq(x)overK = (T,A)ifI |=q(a) for allI |= K; in this case we writeK |= q(a). By first-order semantics,I |= q(a) iff there is a mappingh:vars(q)→∆Î such that (i)h(z)∈AÎwheneverA(z)∈q, (ii)(h(z), h(z⁰))∈ rÎwheneverr(z, z⁰)∈ q, and (iii)hmapsavars(q)toaÎ. If the first two conditions are satisified, thenhis ahomomorphismfromqtoI, and we write h: q→ I. If (iii) also holds, then we writeh: q(a)→ I.

To every ABoxA, we associate the interpretationI_Awhose domain isInds(A)and whose interpretation function makes true precisely the assertions fromA. We say that an FO-formulaq⁰(x)with free variablesxand without constants is anFO-rewriting of CQq(x)and TBoxT if, for any ABoxAand anya⊆Inds(A), we haveT,A |=q(a) iffI_A |=q⁰(a). Ifq⁰ is a positive existential formula (i.e. it only uses∃,∧,∨), then it is called a PE-rewriting of q and T. We also consider rewritings in the form of nonrecursive Datalog queries. We remind the reader that aDatalog program(typically denotedΠ) is a finite set of rules∀x(γ1∧ · · · ∧γm→γ0), where eachγiis an atom

5Tree-shaped conjunctive queries also go by the name ofacyclic queries, cf. [28, 4]

(4)

of the formP(x1, . . . , xl)withxi∈x. The atomγ0is called theheadof the rule, and γ1, . . . , γmitsbody. All variables in the head must also occur in the body. A predicate P dependson a predicateQin programΠ ifΠ contains a rule whose head predicate isPand whose body containsQ. The programΠis callednonrecursiveif there are no cycles in the dependence relation forΠ. For a nonrecursive Datalog programΠ and an atomgoal(x), we say that(Π,goal)is anNDL-rewriting ofq(x)and T in case T,A |=q(a)iffΠ,A |=goal(a), for any ABoxAand anya⊆Inds(A).

ForR ∈ {PE, NDL, FO}, we say that queries fromQand TBoxes fromThave polysizeR-rewritingsif there exists a polynomialpsuch that everyq∈QandT ∈T has aR-rewritingq⁰with|q⁰| ≤p(|q|+|T |).

Canonical modelWe recall that every consistent OWL 2 QL KB(T,A)possesses a canonical modelC_T,Awith the property thatT,A |=q(a)iffC_T,A|=q(a), for every CQqand tuplea ⊆ Inds(A). The domain ofC_T,A consists of all individual names fromAand all sequencesaR1R2. . . Rn(n≥1) such that

– T,A |=∃R1(a);

– for every1≤i < n:T |=∃R⁻_i v ∃Ri+1andT 6|=R⁻_i vRi+1. Concept and role names are interpreted as follows:

A^C^T^,A ={a∈Inds(A)| T,A |=A(a)} ∪ {wR∈∆^C^T^,A | T |=∃R⁻vA}

r^C^T^,A ={(a, b)|r(a, b)∈ A} ∪ {(w, wS)∈∆^C^T^,A ×∆^C^T^,A | T |=Svr} ∪ {(wS, w)∈∆^C^T^,A×∆^C^T^,A | T |=Svr⁻}

Every individual namea∈Inds(A)is interpreted as itself:a^C^T^,A =a.

We say that a TBoxT isof depthωif there is an ABoxAsuch thatC_T_,_Ahas an infinite domain;T isof depthd,0≤d < ω, ifdis the greatest number such that some C_T,Acontains an element of the formaR1. . . Rd. The depth ofT can be computed in polynomial time, and ifT is of finite depth, then its depth cannot exceed2|T |.

3 Boolean Functions as a Tool for Studying Rewritings

In this section, we introduce different representations of Boolean functions that will play an important role in our results. We assume that the reader is familiar with Boolean circuits [1, 12], built usingAND,OR, andNOTgates. Thesize of a circuitC, denoted

|C|, is defined as its number of gates. We will be particularly interested inmonotone circuits (that is, circuits with no NOTgates). (Monotone) formulas are (monotone) circuits whose underlying graph is a tree.

Non-deterministic branching programs(NBP) are another well-known model for the representation of Boolean functions [22, 12]. An NBP is defined as a tuple P = (V, E, s, t,l), where(V, E)is an directed graph,s, t∈V, andlis a function that labels every edgee∈Ewith a conjunction of propositional literals. The NBPP induces the functionfP defined as follows: for every valuationαof the propositional variables in P,fP(α) = 1if and only if there is a path fromstotin the graph(V, E)such that all labels along the path evaluate to 1 underα.

(5)

3.1 Hypergraph functions and hypergraph programs

We recall hypergraph functions and programs from [15]. LetH = (V, E)be a hypergraph with verticesv ∈ V andhyperedges e ∈ E,E ⊆ 2^V. A subsetE⁰ ⊆ E is independentif e∩e⁰ = ∅, for any distincte, e⁰ ∈ E⁰. With each vertexv ∈ V and each hyperedgee ∈ E, we associate propositional variablespv andpe, respectively.

Thehypergraph functionfHforHis given by the Boolean formula

fH = _

ind.E⁰⊆E

^

v∈V\∪E⁰

pv ∧ ^

e∈E⁰

pe

. (1)

Ahypergraph programHGPP consists of a hypergraphHP = (V, E)and a function lP that labels every vertex with0,1,pior¬pi (here thepiare propositional variables, distinct from thepv, peabove). An input forPis a valuation of the propositional variables inP’s labels. We say that the hypergraph programPcomputesa Boolean function f in case, for any inputα, we havef(α) = 1if and only if there is an independent subset ofEthatcovers all zeros—that is, contains all the vertices inV labelled with 0 underα. A hypergraph program ismonotoneif there are no negated variables among its vertex labels. Thesize,|H|, of a hypergraph programHis the number of its vertices and hyperedges. Observe that each hypergraph program that is based upon the hypergraph Hcomputes the Boolean function that is obtained from the hypergraph functionfHby substituting vertex labels for vertex variables and 1 for edge variables. Conversely, it is not hard to construct for a given hypergraphH, a hypergraph program that computes fH. Sometimes it is convenient to consider hypergraph programs whose labels arecon- junctionsof variables and their negations, rather than single literals. It is not hard to see that this does not change the power of such programs.

3.2 Upper bounds via tree witness hypergraph functions

The upper bounds in [15] rely on associating a hypergraph function with every query and TBox. As the hypergraph is defined in terms of tree witnesses, we first recall the definition of tree witnesses. Consider a CQqand a TBoxT. For every roleR, we let TR = T ∪ {AR v ∃R} andAR = {AR(a)} (for some fresh concept nameAR).

Suppose that q⁰ ⊆ q (recall that we view queries as sets of atoms) and there is a homomorphismh:q⁰ → C_T_R_,_A_R such that h(x) = afor every x ∈ avars(q). Let t_r={x∈vars(q⁰)|h(x) =a}, and lett_ibe the remaining set of (quantified) variables inq⁰. We call the pairt = (t_r,t_i)atree witness forqandT generated byRift_i 6=∅ andq⁰is aminimalsubset ofqsuch that, for anyy∈ti, every atom inqcontainingy belongs toq⁰. In this case, we denoteq⁰byqt. Note that the same tree witness can be generated by different rolesR. We letΘ^q_T be the set of all tree witnesses ofqandT and useΘ^q_T[R]to denote those generated byR. We usex∈tas a shorthand forx∈tr∪ti. To every CQqand TBoxT, we can naturally associate the hypergraph whose vertices are the atoms ofqand whose hyperedges are the setsqt, for tree witnessestforq andT. We denote this hypergraph byH_T^q and call the corresponding functionfH_T^q the tree witness hypergraph function ofqandT. It is known that the circuit complexity of fH_T^q provides an upper bound on the size of rewritings ofqandT.

Theorem 1 (from [15]).IffH_T^q is computed by a(monotone)Boolean formulaχthen there is a(PE-)FO-rewriting ofqandT of sizeO(|χ| · |q| · |T |).

(6)

Iff_H^q

T is computed by a monotone Boolean circuitCthen there is an NDL-rewriting ofqandT of sizeO(|C| · |q| · |T |).

Observe thatfH^q_T contains a variablept for every tree witnesst. For this reason, it can only be used to show polynomial upper bounds in cases where|Θ^q_T|is bounded polynomially in|q|and|T |. This motivates us to consider a variant offH_T^q:

f_H⁰ ^q

T = _

Θ⊆Θ_T^q independent

^

%∈q\qΘ

p%∧^

t∈Θ

^

z,z⁰∈t

pz=z⁰ ∧ _

R∈N^±_R, t∈Θ_T^q[R]

^

z∈t

p^R_z

(2)

whereqΘ = S

tqt. Intuitively,pz=z⁰ enforces that variables zandz⁰ are mapped to elements ofC_T,Athat begin by the same ABox individual; the variablep^Rz states thatz is mapped to an element that begins by an individualasatisfyingT,A |=∃R(a).

We observe that the number of variables inf_H⁰ ^q

T is polynomially bounded in|q|and

|T |, butf_H⁰ q T

retains the same properties asfH_T^q regarding upper bounds.

Theorem 2. Theorem 1 continues to hold iffH_T^q is replaced byf_H⁰ q T. 3.3 Lower bounds via primitive evaluation functions

In order to obtain lower bounds on the size of rewritings, it will prove convenient to associate to each pair(q,T)a third functionf_q,^P_T that describes the result of evaluating qon single-individual ABoxes. Given Boolean vectorsα:NC∩(sig(T)∪sig(q))→ {0,1}andβ:NR∩(sig(T)∪sig(q))→ {0,1}, we let

A(α,β) ={A(a)|α(A) = 1} ∪ {r(a, a)|β(r) = 1}

and setf_q,^P_T(α,β) = 1iffT,A(α,β)|=q(a), whereais a tuple ofa’s of the required length. We callf_q,^P_T theprimitive evaluation functionforqandT.

Theorem 3 (implicit in [15]).Ifq⁰is a(PE-)FO-rewriting ofqandT, then there is a (monotone)Boolean formulaχof sizeO(|q⁰|)which computesf_q,^P_T.

If(Π, G)is an NDL-rewriting ofqandT, thenf_q,^P_T is computed by a monotone Boolean circuitCof sizeO(|Π|).

4 Bounded Branching Queries

It is known from [14] that tree-shaped CQs do not have polysize PE- or NDL-rewritings (nor polysize FO-rewritings, unless NP ⊆P/poly). In this section, we investigate the robustness of these results by considering a restricted form of tree-shaped queries. We will say that a tree-shaped CQqhaskleavesif the associated graphGq (defined in Section 2) contains exactlykvertices of degree 1. We will be interested inbounded branching queries (that is, tree-shaped CQs with a bounded number of leaves) and linear queries(having exactly 2 leaves).

(7)

4.1 Bounded branching queries, interval hypergraphs and NBPs

It is not hard to see that every linear query induces a tree witness hypergraph that is isomorphic to aninterval hypergraph, i.e. a hypergraphH = (V, E)whereV ={[i, i+

1] | 1 ≤ i < n} for some finitenandE is a set of intervals of the form[i, j] = {[k, k+ 1] | 1 ≤k < j}where1 ≤i < j ≤n. Since the hypergraph functions of interval hypergraphs can be computed by polynomial-size NBPs, the same holds for the tree witness hypergraph functions of linear queries. In fact, the following result shows that we can construct polysize NBPs not only for linear queries, but also for bounded branching queries:

Theorem 4. Fix a constant` >1. Then there exists a polynomialpsuch that for every tree-shaped CQqwith at most`leaves and every OWL 2 QL TBoxT, there is an NBP of size at mostp(|q|+|T |)that computesfH_T^q.

We next give a polynomial translation from NBPs into interval hypergraph programs, thereby establishing the polynomial equivalence of these formalisms:

Theorem 5. Every functionf that is computable by an NBPPis also computable by an interval hypergraph program of size polynomial in|P|.

To complete the chain, we show how to compute hypergraph functions for interval hypergraphs using primitive evaluation functions of linear CQs and TBoxes of depth 2.

To every interval hypergraphH, we associate the linear CQq_Hpictured below:

q_H = ∃y ^

[i,i+1]∈V

(ri(yi, y⁰i)∧r⁰i(yi⁰, yi+1)).

y1 r1 y⁰1 r1⁰ y2 r2 y⁰2 r⁰₂ y3 r3 y3⁰ r⁰₃ y4 r4 y⁰4 r₄⁰ y5

The TBoxTHcontains the following axioms (on the left) for each edge[i, j]∈E:

Bij v ∃sij, ∃s⁻_ij v ∃s⁰_ij, sijvri, s⁻_ijvr⁰_j,

s⁰_ijvr⁰_k, for everyi≤k < j s⁰_ijvr⁻_k, for everyi < k≤j

B13

s13

s⁰₁₃

r1

r₁⁰ r2 r⁰₂ r3

r₃⁰

To the right, we illustrate the canonical model generated by B13. Observe how the axioms ensure that the subquery ofqHlying betweeny1andy4can be mapped onto it.

Theorem 6. For every interval hypergraphH = (V, E)and for allα : V → {0,1}

andβ :E → {0,1}we havefH(α, β) = 1ifff_q^P_H_,_T_H(γ)withγ defined as follows:

γ(Bij) =β([i, j]),γ(ri) =γ(r_i⁰) =α([i, i+ 1]), andγ(sij) =γ(s⁰_ij) = 0.

4.2 Size of Rewritings of Bounded Branching Queries

We now apply the results from Section 4.1 to derive bounds on rewriting size. It is known that there is a sequencefnof monotone Boolean functions that are computable

(8)

by polynomial-size monotone NBPs, but all monotone Boolean formulas computingfn

are of sizen^Ω(logⁿ⁾ [13]. Using this fact, together with Theorems 1,3, 5, and 6, we obtain a strong negative result for PE-rewritings.

Theorem 7. There is a sequence of linear CQsqnand TBoxesTn of depth 2, both of polysize inn, such that any PE-rewriting ofqnandTnis of sizen^Ω(logⁿ⁾.

We obtain a positive result for NDL-rewritings using Theorems 1 and 4 and the fact that NBPs are representable as polynomial-size monotone circuits [22].

Theorem 8. Fix a constant` >1. Then all tree-shaped CQs with at most`leaves and arbitrary TBoxes have polynomial-size NDL-rewritings.

Finally, we use Theorems 1,3, 5, and 6 to show that the existence of polysize FO- rewritings is equivalent to the open problem of whether NL/poly⊆NC¹.

Theorem 9. The following are equivalent:

1. There exist polysize FO-rewritings for all linear CQs and depth2TBoxes;

2. There exist polysize FO-rewritings for all tree-shaped CQs with at most` leaves and arbitrary TBoxes (for any fixed`);

3. There exists a polynomial functionpsuch that every NBP of size at mostsis computable by a formula of sizep(s). Equivalently,NL/poly⊆NC¹.

5 Bounded Treewidth Queries

In Section 4, we gave bounds on the size of rewritings for restricted classes of tree- shaped CQs. In the present section, we consider arbitrary tree-shaped queries and their natural generalization to bounded treewidth queries [9]. As the rewriting size of tree- shaped queries and arbitrary TBoxes has already been studied [14], we will focus on a class of “well-behaved” TBoxes, that includes TBoxes of bounded depth as a special case. We begin by formally introducing the classes of queries and TBoxes we consider.

Bounded treewidth queriesWe recall that atree decompositionof an undirected graph G = (V, E)is a pair (T, λ)such thatT is an (undirected) tree andλassigns a label λ(N)⊆V to every nodeN ofTsuch that the following conditions are satisfied:

1. For everyv∈V, there exists a nodeNwithv∈λ(N).

2. For every edgee∈E, there exists a nodeNsuch thate⊆λ(N).

3. For everyv∈V, the nodes{N |v∈λ(N)}induce a connected subtree ofT. Thewidth of a tree decomposition(T, λ)is equal tomaxN|λ(N)|−1, and thetreewidth of a graphGis the minimum width over all tree decompositions ofG. Thetreewidth of a CQqis defined as the treewidth of the graphGq.

Polynomial image propertyLetT be an OWL 2 QL TBox, and letqbe a CQ. Then thesetWq,T of relevant words forqandT consists of all wordswof length at most

|T |+|q|such that there exists an ABoxAthat is consistent withT and a homomorphism h:q→ C_T,Awhose image contains an element of the formaw. The length bound is motivated by the following well-known fact:

(9)

Lemma 1. IfAis consistent withT andT,A |=q(a), then there is someh:q(a)→ C_T_,_Awhose image is contained in{aw|a∈Inds(A), w∈Wq,T}.

We say that a classTof TBoxes has thepolynomial image propertyif there is a polynomialpsuch that for every TBoxT ∈ Tand every CQq,|Wq,T| ≤ p(|T |+|q|).

Observe that ifd ≥ 0is fixed, then the class of TBoxes of depth at mostdhas the polynomial image property. Another relevant class of TBoxes with this property is the class of TBoxes that do not contain role inclusions.

5.1 Bounded treewidth queries and tree hypergraph programs

As in Section 4, our first step will be to relate Boolean functions induced by the query and TBox with hypergraph programs. The main difference is that in lieu of interval hypergraph programs, we will use tree hypergraph programs.

To formally define tree hypergraph programs, we must first introduce some defini- tions related to trees. Given a treeT with verticesuandv, theintervalhu, viis the set of edges that appear on the simple path connectinguandv. Ifv1, . . . , vkare vertices of T, then thegeneralized intervalhv1, . . . , vkiis defined as the union of intervalshvi, vji over all pairs(i, j). A hypergraphH = (VH, EH)is atree hypergraphif there is a tree T = (VT, ET)such thatVH =ETand every hyperedge inEHis a generalized interval ofT. A hypergraph program is atree hypergraph program(TreeHGP) if it is based on a tree hypergraph.

From f_H⁰ ^q

T toTreeHGP.We show how to construct aTreeHGPthat computesf_H⁰ ^q

T, given a TBox T, a CQ q, and a tree decomposition (T, λ) of Gq of width t. We may suppose w.l.o.g. that T contains at most(2|q| −1)²nodes, cf. [16]. In order to more easily refer to the variables inλ(N), we construct functionsλ1, . . . , λtsuch that λi(N)∈λ(N)andλ(N) =∪iλi(N).

The basic idea underlying the construction is as follows: for each nodeN in the tree decomposition ofq, we select an abstract description of the way the variables in λ(N)are homomorphically mapped into the canonical model, and we check that the selected descriptions respect the subqueries of each node and are consistent with each other. Formally, these abstract descriptions are given by the setΓt(q,T)consisting of allt-tuplesw= (w1, . . . , wt)of words fromWq,T. Intuitively, the words inwspecify, for each variablexinλ(N), the path of roles that lead from the ABox to the image ofx in the canonical model. We say thatw∈Γt(q,T)isconsistent with a nodeNinT if:

– ifA(λi(N))∈q, then eitherw[i] =εorw[i] =w⁰Rand∃R⁻v_T A – ifr(λi(N), λj(N))∈q, then one of the following holds:

• w[i] =w[j] =ε

• w[j] =w[i]·RwithRv_T r

• w[i] =w[j]·RwithRv_T r⁻

We call a pair of tuples(w₁,w₂)compatiblewith the pair of nodes(N1, N2)if:

– λi(N1) =λj(N2)implies thatw₁[i] =w₂[j]

We assume that the elements ofΓt(q,T)are numbered from1toM, and useξito refer to thei-th element. We define a treeT⁰that replaces each edge{Ni, Nj}inT by the

(10)

following sequence of edges:

{Ni, u¹_ij},{u¹_ij, v_ij¹},{v¹_ij, u²_ij},{u²_ij, v²_ij}, . . .{u^M_ij, v_ij^M}{v_ij^M, v_ji^M} {u^Mji, v^Mji}. . .{u²ji, vji²},{vji¹, u²ji},{u¹ji, vji¹},{Nj, u¹ji}

The desiredTreeHGP(Hq,T,l_q,_T)is based uponT⁰and contains the hyperedges:

– E^ki =hu^kij₁, . . . , u^kij_ni, for everyξk ∈Γt(q,T)that is consistent withNi, where Nj1, . . . , Njnare the neighbours ofNi

– E^km_ij =hv_ij^k, v^m_jii, for every pair of tuples(ξk, ξm)that is compatible with(Ni, Nj) Vertices of the hypergraph (i.e. the edges inT⁰) are labeled byl_q,_T as follows:

– every edge of the form{Ni, u¹_ij},{v^`_ij, u^`+1_ij }, or{v_ij^M, v^M_ji}is labelled0 – every edge{u^`_ij, v_ij^`}withξ`=wis labelled by the conjunction of:

• p%, ifvars(%)⊆λ(Ni)andλg(Ni)∈vars(%)impliesw[g] =ε

• p^Rz, ifvars(%) ={z} ⊆λ(Ni),z=λg(Ni), andw[g] =Rw⁰

• p^R_z,p^R_z0, andpz=z⁰, ifvars(%) ={z, z⁰} ⊆λ(Ni),z=λg(Ni),z⁰ =λg⁰(Ni), and eitherw[g] =Rw⁰orw[g⁰] =Rw⁰

Theorem 10. For every TBoxT and CQq, theTreeHGP(Hq,T,lq,T)computesf_H⁰ ^q

T. Ifqhas treewidtht, then|Hq,T| ≤8· |q|²· |Wq,T|^2t.

FromTreeHGP tof_q,^P_T.Consider a tree hypergraphH = (V, E)that is based upon the treeT whose vertices arev1, . . . , vn. LetT^↓be the directed tree obtained fromT by fixingv1as the root and orienting edges away fromv1.

We wish to construct a tree-shaped queryqHand TBoxTHof depth 2 whose primitive evaluation functionf_q^P_H_,_T_Hcan be used to computefH. The construction general- izes the one from the preceding section for linear queries. The queryq_His obtained by doubling the edges inT^↓:

qH =∃y ^

(vi,vj)∈T^↓

(rij(yi, yij)∧r_ij⁰ (yij, yj)).

The TBoxTHis defined as the union ofTeover all hyperedgese∈E. Consider some hyperedgee=hvi₁, . . . , vi_mi ∈E, and suppose w.l.o.g. thatvi₁ is the vertex inethat is highest inT^↓. ThenTecontainsBev ∃se,∃s⁻_e v ∃s⁰_e, and the axioms:

sevri₁,k if{vi₁, vk} ∈e

s⁻e vr⁰j_`,i_` if1< `≤nand(vj_`, vi_`)∈T^↓(so,{vj_`, vi_`} ∈e) s⁰_evr⁻_j,k if{vj, vk} ∈e,(vj, vk)∈T^↓, andvj6=vi₁

s⁰_evr⁰_j,k if{vj, vk} ∈e,(vj, vk)∈T^↓, andvk6=vi_`for any1< `≤m Observe that bothqHandTHare of polynomial size in|H|.

Theorem 11. For every tree hypergraphH = (V, E)and for allα:V → {0,1}and β : E → {0,1},fH(α,β) = 1ifff_q^P_H_,_T_H(γ) = 1whereγ is defined as follows:

γ(Be) =β(e),γ(rij) =γ(r⁰_ij) =α({vi, vj}), andγ(se) =γ(s⁰_e) = 0.

(11)

5.2 Tree hypergraph programs and SAC¹

To characterize the power of tree hypergraph programs, we considersemi-unbounded fan-incircuits in which NOTgates are applied only to the inputs,ANDgates have fan-in 2, and ORgates have unbounded fan-in. The complexity class SAC¹ [27] is defined by considering circuits of this type having polynomial size and logarithmic depth; SAC¹is the non-uniform analog of the class LOGCFL of all languages logspace- reducible to context-free languages [26].

We consider semi-unbounded fan-in circuits of sizeσand depthlogσ, whereσis a parameter, and show that they are polynomially equivalent toTreeHGPby providing reductions in both directions (details can be found in [3]).

Theorem 12. There exist polynomial functionspandp⁰such that:

– Every semi-unbounded fan-in circuit of size at mostsand depth at mostlogσis computable by aTreeHGPof sizep(σ).

– EveryTreeHGPof sizeσis computable by semi-unbounded fan-in circuit of size at mostp⁰(σ)and depth at mostlogp⁰(σ).

5.3 Size of rewritings of bounded treewidth queries

Theorems 10 and 12 together show us how to construct a polysize monotone SAC¹ circuit that computesf_H⁰ ^q

T. Thus, by applying Theorem 2, we obtain:

Theorem 13. Fix a constantt >0, and letTbe a class of OWL 2 QL TBoxes with the polynomial image property. Then all CQs of treewidth at mosttand TBoxes inThave polynomial-size NDL-rewritings.

In the case of FO-rewritings, we can show that the existence of polysize rewritings corresponds to the open question of whether SAC¹⊆NC¹.

Theorem 14. The following are equivalent:

1. There exist polysize FO-rewritings for all tree-shaped CQs and depth 2 TBoxes;

2. There exist polysize FO-rewritings for all CQs of treewidth at mosttand TBoxes from a classTwith the polynomial image property (for any fixedt);

3. There exists a polynomial functionpsuch that every semi-unbounded fan-in circuit of size at mostσand depth at mostlogσis computable by a formula of sizep(σ).

Equivalently,SAC¹⊆NC¹.

6 Conclusion and Future Work

In this paper, we filled some gaps in the succinctness landscape for query rewriting in OWL 2 QL by providing new bounds on the worst-case rewriting size for various forms of tree-like queries. In doing so, we closed one of the open questions from [15].

In future work, we plan to consider additional dimensions of the succinctness landscape. For example, all existing lower bounds rely on sequences(qn,Tn)in which the number of roles inqnandTngrows withn. Moreover,Tnoften contains a large number of role inclusions. Thus, an interesting and practically relevant problem is to explore the impact of restricting the number of roles and/or the use of role inclusions.

Acknowledgments.This work was partially supported by ANR grant 12-JS02-007-01, Russian Foundation for Basic Research and the programme “Leading Scientific Schools”.

(12)

References

1. S. Arora and B. Barak. Computational Complexity: A Modern Approach. Cambridge Uni- versity Press, New York, NY, USA, 1st edition, 2009.

2. F. Baader, D. Calvanese, D. McGuinness, D. Nardi, and P. Patel-Schneider, editors.The De- scription Logic Handbook: Theory, Implementation and Applications. Cambridge University Press, 2003.

3. M. Bienvenu, S. Kikot, and V. Podolskii. Succinctness of query rewriting in OWL 2 QL: The case of tree-like queries. CoRR, abs/1406.3047, 2014.

4. M. Bienvenu, M. Ortiz, M. Simkus, and G. Xiao. Tractable queries for lightweight description logics. InProc. of the 23rd International Joint Conference on Artificial Intelligence (IJCAI 2013). AAAI Press, 2013.

5. D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, and R. Rosati. Tractable reasoning and efficient query answering in description logics: TheDL-Litefamily. Journal of Auto- mated Reasoning, 39(3):385–429, 2007.

6. A. Chortaras, D. Trivela, and G. Stamou. Optimized query rewriting for OWL 2 QL. In Proc. of the 23rd Int. Conf. on Automated Deduction (CADE-23), volume 6803 ofLecture Notes in Computer Science, pages 192–206. Springer, 2011.

7. T. Eiter, M. Ortiz, M. ˇSimkus, T.-K. Tran, and G. Xiao. Query rewriting for Horn-SHIQ plus rules. InProc. of the 26th AAAI Conf. on Artificial Intelligence (AAAI 2012). AAAI Press, 2012.

8. G. Gottlob, S. Kikot, R. Kontchakov, V. Podolskii, T. Schwentick, and M. Zakharyaschev.

The price of query rewriting in ontology-based data access.Artificial Intelligence, to appear, 2014.

9. G. Gottlob, N. Leone, and F. Scarcello. Hypertree decompositions and tractable queries.

In V. Vianu and C. H. Papadimitriou, editors,Proc. of the 18th ACM SIGACT-SIGMOD- SIGART Symposium on Principles of Database Systems (PODS 1999), pages 21–32. ACM Press, 1999.

10. G. Gottlob, G. Orsi, and A. Pieris. Ontological queries: Rewriting and optimization. In Proc. of the 27th Int. Conf. on Data Engineering (ICDE 2011), pages 2–13. IEEE Computer Society, 2011.

11. G. Gottlob and T. Schwentick. Rewriting ontological queries into small nonrecursive datalog programs. InProc. of the 13th Int. Conf. on the Principles of Knowledge Representation and Reasoning (KR 2012), pages 254–263. AAAI Press, 2012.

12. S. Jukna.Boolean Function Complexity: Advances and Frontiers. Springer, 2012.

13. M. Karchmer and A. Wigderson. Monotone circuits for connectivity require super- logarithmic depth. InProc./ of the 20th Annual ACM Symposium on Theory of Computing (STOC 1988), pages 539–550. ACM Press, 1988.

14. S. Kikot, R. Kontchakov, V. V. Podolskii, and M. Zakharyaschev. Exponential lower bounds and separation for query rewriting. InProc. of the 39th Int. Colloquium on Automata, Lan- guages, and Programming (ICALP 2012), Part II, volume 7392 ofLecture Notes in Com- puter Science, pages 263–274. Springer, 2012.

15. S. Kikot, R. Kontchakov, V. V. Podolskii, and M. Zakharyaschev. On the succinctness of query rewriting over OWL 2 QL ontologies with shallow chases. InProc. of the 29th Annual ACM/IEEE Symposium on Logic in Computer Science (LICS 2014). ACM Press, 2014.

16. T. Kloks. Treewidth: Computations and Approximations, volume 842 ofLecture Notes in Computer Science. Springer, 1994.

17. M. K¨onig, M. Lecl`ere, M.-L. Mugnier, and M. Thomazo. A sound and complete backward chaining algorithm for existential rules. InProc. of the 6th Int. Conf. on Web Reasoning and Rule Systems (RR 2012), volume 7497 ofLecture Notes in Computer Science, pages 122–138. Springer, 2012.

(13)

18. B. Motik, B. Cuenca Grau, I. Horrocks, Z. Wu, A. Fokoue, and C. Lutz. OWL 2 Web Ontology Language profiles. W3C Recommendation, 11 December 2012. Available at http://www.w3.org/TR/owl2-profiles/.

19. M. Ortiz and M. Simkus. Reasoning and query answering in description logics. InReasoning Web, volume 7487 ofLecture Notes in Computer Science, pages 1–53. Springer, 2012.

20. H. P´erez-Urbina, B. Motik, and I. Horrocks. A comparison of query rewriting techniques for DL-Lite. InProc. of the 22nd Int. Workshop on Description Logics (DL 2009), volume 477.

CEUR-WS, 2009.

21. H. P´erez-Urbina, E. Rodr´ıguez-D´ıaz, M. Grove, G. Konstantinidis, and E. Sirin. Evaluation of query rewriting approaches for OWL 2. InProc. of SSWS+HPCSW 2012, volume 943.

CEUR-WS, 2012.

22. A. Razborov. Lower bounds for deterministic and nondeterministic branching programs. In Proc. of the 8th Int. Symposium on Fundamentals of Computation Theory (FCT’91), volume 529 ofLecture Notes in Computer Science, pages 47–60. Springer, 1991.

23. M. Rodr´ıguez-Muro, R. Kontchakov, and M. Zakharyaschev. Ontology-based data access:

Ontop of databases. InProc. of the 12th Int. Semantic Web Conf. (ISWC 2013), volume 8218 ofLecture Notes in Computer Science, pages 558–573. Springer, 2013.

24. R. Rosati. Prexto: Query rewriting under extensional constraints in DL-Lite. InProc. of the 9th Extended Semantic Web Conf. (EWSC 2012), volume 7295 ofLecture Notes in Computer Science, pages 360–374. Springer, 2012.

25. R. Rosati and A. Almatelli. Improving query answering over DL-Lite ontologies. InProc. of the 10th Int. Conf. on the Principles of Knowledge Representation and Reasoning (KR 2010), pages 290–300. AAAI Press, 2010.

26. H. Venkateswaran. Properties that characterize LOGCFL.J. Computer and System Sciences, 43(2):380–404, 1991.

27. H. Vollmer. Introduction to circuit complexity - a uniform approach. Texts in theoretical computer science. Springer, 1999.

28. M. Yannakakis. Algorithms for acyclic database schemes. InProc. of the 7th Int. Conf. on Very Large Data Bases (VLDB’81), pages 82–94. IEEE Computer Society, 1981.