Long Rewritings, Short Rewritings

(1)

Long Rewritings, Short Rewritings

S. Kikot¹, R. Kontchakov¹, V. Podolskii², and M. Zakharyaschev¹

1 Department of Computer Science and Information Systems Birkbeck, University of London, U.K.

{kikot,roman,michael}@dcs.bbk.ac.uk

2 Steklov Mathematical Institute, Moscow, Russia podolskii@mi.ras.ru

1 Introduction

An ontology language L is said to enjoy FO-rewritability if any conjunctive query (CQ) qover any ontologyT, given in L, can be transformed into an FO- formulaq⁰ such that, for any data A, all answers toq over the knowledge base (T,A) can be found by queryingq⁰overAusing a standard relational database management system (RDBMS). Ontology languages with this property include the OWL 2 QL profile of OWL 2, which is based on description logics of the DL-Lite family [7, 16, 2], and fragments of Datalog^± such as linear or sticky TGDs [5, 6]. Various rewriting techniques have been implemented in the systems QuOnto [1], REQUIEM [15], Presto [22], Nyaya [8], IQAROS³ and Quest.⁴

The idea of using languages with FO-rewritability for ontology-based data access (OBDA) relies on the empirical fact that RDBMSs are usually very efficient in practice. However, the first rewritings of CQs overOWL 2 QLontologies [7, 15]

turned out to be too lengthy even for modern RDBMSs. The attempts to employ various optimisation techniques still produced rewritings of exponential size in the worst case:O((|T | · |q|)^|q|) [22, 8, 20, 21]. The alternative two-step combined approach [14, 13]—first expand the data by applying the ontology axioms to the data and introducing (some of) the missing individuals, and only then rewrite the query over the expanded data—resulted in a simple polynomial rewriting only for the fragment ofOWL 2 QLwithout role inclusions; for the full language, the rewriting remained exponential. Two seemingly contradictory results, presented at DL 2011, added more spice to the quest for short rewritings: [9] showed that one can construct, in polynomial time, a nonrecursive Datalog (NDL) rewriting for some fragments of Datalog^± containing OWL 2 QL, while [11] argued that no FO-rewriting forOWL 2 QLcan be constructed in polynomial time.

The aim of this paper is twofold. First, we investigate the worst-case size of FO- and NDL-rewritings for CQs over OWL 2 QLontologies. We distinguish between ‘pure’ rewritings, which can use the signature of the original query and ontology as well as =, 6= (cf. [7]), and ‘impure’ rewritings, where other means such as new constants are allowed. Here is a summary of the obtained results:

3 http://code.google.com/p/iqaros/

4 http://obda.inf.unibz.it/protege-plugin/quest/quest.html

(2)

(1) An exponential blow-up is unavoidable for pure positive existential (PE) rewritings and pure NDL-rewritings; pure FO-rewritings can blow-up superpolynomially unlessNP⊆P/poly.

(2) Pure NDL-rewritings are in general exponentially more succinct than pure PE-rewritings.

(3) Pure FO-rewritings can be superpolynomially more succinct than pure PE- rewritings.

(4) Impure PE- and NDL-rewritings can always be made polynomial, and so they are exponentially more succinct than pure PE- and NDL-rewritings, respectively.

(1)–(3) are proved by establishing connections between pure rewritings for CQs over OWL 2 QL ontologies and circuits for monotone Boolean functions. In a nutshell, we show that CQs andOWL 2 QLontologies can encode such problems as the existence of a k-clique in a graph with n vertices whose edges are given by a single-element ABox. The polynomial PE-rewriting in (4) is similar to the NDL-rewriting of [9]: using two extra constants, = and polynomially many new existentially quantified variables, one can guess a relevant part of the canonical model of T in the rewritten query. The difference between the resulting impure PE-rewritings and exponential-size pure PE-rewritings is of the same kind as the difference between deterministic and nondeterministic Boolean circuits.

Our second aim is to analyse the causes behind long rewritings and whether they occur in real-world queries and ontologies. As a result, we suggest some short rewritings that cover most practical cases.

Omitted proofs can be found in [10] and the full version of [12].

2 Queries over OWL 2 QL Ontologies

The language ofOWL 2 QLis defined by the following grammar:⁵ R ::= P_i | P_i⁻,

B ::= ⊥ | Ai | ∃R,

C ::= B | ∃R.B,

where theAiare concept names and thePiare role names. AnOWL 2 QLTBox, T, is a finite set ofinclusions of the form B vC, R1 vR2, B1uB2v ⊥ and R1uR2v ⊥. Note that concepts of the form∃R.B can only occur in the right- hand side of concept inclusions. An ABox, A, is a finite set of assertions of the form Ak(ai) and Pk(ai, aj), where ai, aj are individual names. T and A together form the knowledge base (KB) K = (T,A). The semantics is defined in the usual way, based on interpretations I = (∆Î,·Î) with domain ∆Î and interpretation function·Î. The set of individuals inAis denoted byind(A);IA

is the interpretation with domain ind(A) such that, for any concept or role E and anya ⊆ind(A), we have IA |=E(a) iff E(a)∈ A. We writeE1 vT E2 if T |=E₁vE₂; and we set [E] ={E⁰|EvT E⁰ andE⁰vT E}.

5 We do not consider data properties, attributes and role (ir)reflexivity constraints.

(3)

A conjunctive query (CQ) q(x) is a formula ∃yϕ(x,y), where ϕ is a con- junction of atoms of the form A_k(t₁) and P_k(t₁, t₂), and each t_i is a term (an individual or a variable from x, y). A tuple a⊆ind(A) is a certain answer to q(x) overK= (T,A) ifI |=q(a) for all I |=K; then we write K |=q(a).

Query answering over OWL 2 QL KBs is based on the fact that, for any consistent KB K= (T,A), there is an interpretation C_K such that, for all CQs q(x) and a ⊆ ind(A), we have K |= q(a) iff C_K |= q(a). The interpretation C_K, called the canonical model of K, can be constructed as follows. For each pair [R],[B] with ∃R.B in T (we assume ∃R is just a shorthand for ∃R.>), we introduce a fresh symbol w_[RB] and call it the witness for ∃R.B. We write K |=C(w[RB]) if∃R⁻vT CorBvT C. Define agenerating relation,;, on the set of these witnesses together withind(A) by taking:

– a;w_[RB] ifa∈ind(A), [R] and [B] arev_T-minimal such thatK |=∃R.B(a) and there is nob∈ind(A) withK |=R(a, b)∧B(b);

– w[R⁰B⁰];w[RB] ifu;w[R⁰B⁰], for someu, [R] and [B] arevT-minimal with K |=∃R.B(w[R⁰B⁰]) and it is not the case that R⁰vT R⁻ andK |=B(u).

If a;w_[R₁_B₁_] ;· · ·;w_[R_n_B_n_], n≥0, then we say thata generates the path aw_[R₁_B₁_]· · ·w_[R_n_B_n_]. Denote by path_K(a) the set of paths generated bya, and bytail(π) the last element inπ∈path_K(a).CK is defined by taking:

∆^C^K =[

a∈ind(A)path_K(a), a^C^K = a, fora∈ind(A), A^C^K = {π∈∆^C^K | K |=A(tail(π))},

P^C^K = {(a, b)∈ind(A)×ind(A)| K |=P(a, b)} ∪ {(π, π·w_[RB])|tail(π);w_[RB], Rv_T P} ∪ {(π·w_[RB], π)|tail(π);w_[RB], Rv_T P⁻}.

Theorem 1 ([7, 13]). For every OWL 2 QL KB K = (T,A), every CQ q(x) and every a⊆ind(A),K |=q(a)iffC_K|=q(a).

Given a CQq(x) and a TBoxT, a first-order formula q⁰(x), possibly with

= and6=, is called anFO-rewriting for q(x)andT if, for any ABoxAand any a ⊆ind(A), we have (T,A)|=q(a) iffI_A |=q⁰(a). If q⁰ is an FO-rewriting of the form∃yϕ(x,y), whereϕis built from atoms using only ∧and ∨, then we callq⁰(x) apositive existential rewriting forq(x)andT (or aPE-rewriting, for short). We say thatq⁰ ispure if it does not contain constants that do not occur inq (such constants are interpreted by fresh individuals added toIA). Thesize

|q⁰|ofq⁰ is the number of symbols inq⁰.

We also consider rewritings in the form of nonrecursive Datalog queries. We remind the reader that a Datalog program, Π, is a finite set of Horn clauses

∀x(A₁∧ · · · ∧A_m → A₀), where each A_i is an atom of the form P(t₁, . . . , t_l) and eacht_j is either a variable fromxor a constant.A₀is called thehead of the clause, and A1, . . . , Am its body. All variables occurring in the head must also occur in the body. A predicateP depends on a predicateQinΠ ifΠ contains a clause whose head isP and whose body containsQ.Π is callednonrecursive if

(4)

this dependence relation forΠ is acyclic. Anonrecursive Datalog query consists of a nonrecursive Datalog program Π and a goal G, which is just a predicate.

A tuplea ⊆ind(A) is a certain answer to (Π, G) over AifΠ,A |=G(a). The size |Π|of Π is the number of symbols in it. We distinguish betweenpure and impure Datalog queries [3]. In a pure query (Π, G), the clauses in Π do not contain constant symbols in their heads. One reason for considering only pure queries in OBDA is that impure ones can add new facts to the database that do not follow from the background ontology. Impure queries are known to be more succinct than pure ones.

Given a CQ q(x) and anOWL 2 QL TBox T, a pure nonrecursive Datalog query (Π, G) is called a nonrecursive Datalog rewriting for q(x) and T (or an NDL-rewriting, for short) if, for any ABox A and any a ⊆ ind(A), we have (T,A) |= q(a) iffΠ,A |=G(a). If Π does not contain constants that do not occur inq then we say that the NDL-rewriting (Π, G) ispure.

3 Query Rewritings and Boolean Circuits

To establish results (1)–(3) on the size of rewritings mentioned in the introduction, we show how the problem of constructing circuits that compute monotone Boolean functions can be reduced to the problem of finding pure FO- and NDL- rewritings for CQs overOWL 2 QL ontologies.

Our reduction proceeds in three steps. First, we take any familyf¹, f², . . . of monotone Boolean functions inNP, where fⁿ:{0,1}ⁿ → {0,1}, a polynomial p and a family C¹,C², . . . of polynomial-size circuits such that fⁿ(α) = 1 iff Cⁿ(α,β) = 1, for someβ∈ {0,1}^p(n). Using the Tseitin transformation [23], we construct a polynomial-size CNF θ_fn that computesfⁿ in the following sense:

Lemma 1. If fⁿ is monotone then ϕ^α_fn = V

α_j=0¬xj

∧θfⁿ is satisfiable iff fⁿ(α) = 1, for allα= (α1, . . . , αn)∈ {0,1}ⁿ.

Letϕ_fn be ϕ^α_fn forα = (0, . . . ,0). It should be clear thatϕ^α_fn is obtained fromϕ_fn by removing the clauses ¬x_j for which the jth component ofα is 1.

The second step is to encode theϕfⁿ by means of TBoxesTfⁿ and CQsq_fn. Letp1, . . . , pN be the propositional variables andC1, . . . , Cdthe clauses inϕfⁿ. ThenTfⁿ contains the following concept inclusions, for 1≤i≤N, 1≤j≤d:

A_i−1v ∃P⁻.X_i^`, X_i^`vAi, for`= 0,1, X_i⁰vZi,j if ¬pi∈Cj, Zi,jv ∃P.Z_i−1,j, X_i¹vZi,j if pi ∈Cj, A₀uA_iv ⊥, A₀u ∃P v ⊥, A₀uZ_i,jv ⊥, if (i, j)∈ {(0,/ 1), . . . ,(0, n)}, and the CQ is defined as follows:

q_fn=∃y∃z h

A₀(y₀) ∧

N

^

i=1

P(y_i, y_i−1) ∧

d

^

j=1

P(yN, zN−1,j) ∧

N−1

^

i=1

P(zi,j, z_i−1,j) ∧ Z0,j(z0,j)i ,

(5)

wherey= (y₀, . . . , y_N) andz= (z_0,1, . . . , z_N_−1,1, . . . , z_0,d, . . . , z_N_−1,d). The size ofT_fn andq_fn isO(|Cⁿ|²). Note thatT_fn is acyclic andq_fn is tree-shaped and has no answer variables. The canonical model C_(T_{f n}_,{A₀_(a)}) of (T_fn,{A₀(a)}) and the queryq_fn are illustrated below.

C_(T_{f n}_,{A₀_(a)}) a A0

X1¹, A1

X1⁰, A1

X2¹, A2, Z2,1

X2⁰, A2

X₂¹, A2

X₂⁰, A2

X₃¹, A3

X3⁰, A3

X3¹, A3

X₃⁰, A3

X3¹, A3

X3⁰, A3

X₃¹, A3

X₃⁰, A3, Z3,2

Z1,1

Z0,1

Z2,2

Z1,2

Z0,2

q_fn

y0

A0

y1 y2 y3

z2,1

z2,2

z1,1

z1,2

z0,1

z0,2

Z0,1

Z0,2

For each α= (α1, . . . , αn)∈ {0,1}ⁿ, we define the following ABox:

Aα =

A0(a) ∪

Z0,j(a)|1≤j≤nandαj= 1 . Lemma 2. (Tfⁿ,Aα)|=q_fn iffϕ^α_fn is satisfiable, for α∈ {0,1}ⁿ.

To complete our reduction, we show that rewritings forq_fn andT_fn can be turned into Boolean circuits computingfⁿ.

Lemma 3. (i)Supposeq⁰_fnis a pure FO-(PE-)rewriting forTfⁿandq_fn. Then there is a (monotone)Boolean formulaψ_fn computing fⁿ with|ψ_fn| ≤ |q⁰_fn|.

(ii)Suppose(Π_fn, G)is a pure NDL-rewriting for T_fn andq_fn. Then there is a monotone Boolean circuitC_fn computing fⁿ with |C_fn| ≤ |Π_fn|.

The proof proceeds by eliminating quantifiers in the rewriting and replacing its predicates with propositional variables using the fact that, in ABoxes A_α, these predicates can only be true on the individual a. Lemmas 1 and 2 ensure that the resulting Boolean formula or circuit computes fⁿ. The next lemma shows that circuits computingfⁿ can be turned into pure rewritings forq_fnand Tfⁿ.

Lemma 4. (i) Suppose q_n is an FO-sentence such that (T_fn,A_α) |= q_fn iff I_A_α|=q_n, for allα∈ {0,1}ⁿ. Then there exists a pure FO-rewritingq⁰_n forq_f_n andTf_n with|q⁰_n| ≤ |q_n|+p(n), for a polynomialp.

(ii)Suppose (Πn, G) is a pure NDL-query with a propositional goalG such that (Tfⁿ,Aα) |=q_fn iff Πn,Aα |= G, for α ∈ {0,1}ⁿ. Then there is a pure NDL-rewriting(Π_n⁰, G⁰)forq_fnandTfⁿwith|Π_n⁰| ≤ |Πn|+p(n),pa polynomial.

Now, results (1)–(3) formulated in the introduction can be obtained by applying Lemmas 1–4 to three concrete Boolean functions. For (1), we use the

(6)

function Cliquen,k of n(n−1)/2 variables e_ij, 1≤i < j ≤n, which returns 1 iff the graph with vertices {1, . . . , n} and edges {{i, j} | e_ij = 1} contains a k-clique. A series of papers, started by Razborov’s [19], gave an exponential lower bound for the size of monotone circuits computingCliquen,k: 2^Ω(

√k)for k≤¹₄(n/logn)^2/3. For monotone formulas, an even better lower bound is known:

2^Ω(k)for k= 2n/3 [18]. The question whether Cliquen,k can be computed by a polynomial-size circuit is equivalent to whetherNP⊆P/poly.

To show (2), we use the functionGenn³ ofn³ variablesx_ijk, 1≤i, j, k≤n, defined as follows. We say that 1 generates k ≤n if eitherk = 1 or, for some i and j such that xijk = 1, 1 generates both i and j. Genn³(x111, . . . , xnnn) returns 1 iff 1 generates n. It is clearly a monotone function computable by polynomial-size monotone circuits. On the other hand, any monotone formula computingGenn³ is of size at least 2ⁿ^ε, for some ε >0 [17].

For (3), we use the function Matching²ⁿ of n² variables eij, 1 ≤ i, j ≤ n, which returns 1 iff there is a perfect matching in a bipartite graph G with vertices{v₁¹, . . . , v_n¹, v₁², . . . , v²_n}and edges{{v_i¹, v_j²} |eij = 1}, i.e., a subsetE of edges inGsuch that every node inGoccurs exactly once inE. An exponential lower bound 2^Ω(n) for the size of monotone formulas computing this function was obtained in [18]. However, Matching2n is computable by non-monotone formulas of sizen^O(logⁿ⁾ [4].

The exponential bounds for the size of rewritings can be reduced topolyno- mial by using two extra constants, say 0 and 1, which are included in the domain of every I_A (see [9] and [10] for polynomial-size NDL- and PE-rewritings, respectively). As these constants may not occur in the original query and intended ABoxes, we call such rewritingsimpure(in [9] and [10], 0, 1, = and polynomially- many fresh existentially quantified variables are used to guess some part of the canonical model). The difference between pure and impure rewritings is similar to the difference between deterministic circuits and nondeterministic circuits with additional existentially quantified input variables. For example,Cliquen,k

is computed by a polynomial-size monotone nondeterministic circuit, but not by a monotone deterministic circuit of polynomial size [19].

4 Why are Pure Rewritings so Long?

Let q(x) = ∃yϕ(x,y) be a connected CQ and T a TBox. Let termq be the set of terms in q. Denote by C_T the disjoint union of the canonical models for consistent (T,{R(aRB, bRB), B(bRB)}), in which the generating relation is extended with aRB ; bRB. The RB-subtree of C_T has rootaRB and consists of the full subtree ofC_T with root bRB extended with the edge (aRB, bRB). We also need the formulas

ext_C(x) = _

AvTC

A(x) ∨ _

∃RvTC

∃y R(x, y), ext_P(x, y) = _

RvTP

R(x, y),

for conceptsC and role namesP. SupposeC_K|=^aϕ,K= (T,A), where ais an assignment of elements of ∆^C^K to the variables inϕunder whicha(x)∈ind(A)

(7)

for allx∈x. Consider an atomP(t, t⁰)∈qwith bound variablest,t⁰and assume that a(t₀)∈ind(A), for some t₀ ∈ termq. The assignment a can send t and t⁰ to four different locations in C_K: (A) a(t),a(t⁰) ∈ ind(A); (B) a(t) ∈ ind(A), a(t⁰)∈/ ind(A); (B⁻) a(t)∈/ ind(A),a(t⁰)∈ind(A); (O)a(t),a(t⁰)∈/ ind(A). Let us see how these alternatives can be reflected in a rewriting. In case (A) we have C_K|=^a P(t, t⁰) iffA |=^aextP(t, t⁰). Case (B) is possible only if, for some concept

∃R.B, we have a(t);w_[RB],Rv_T P and the atoms ofq ‘linked to’ t⁰ can be mapped into theRB-subtree ofC_T. To illustrate, consider an example.

Example 1. LetT andqbe as in the picture below. The answer variablexmust be mapped by a to an ABox element. However, y can be mapped either to an ABox element or to the point a(x)·w_[S−B] provided that a(x) is an instance of ∃S⁻.B. In the latter case, we have two ways of mapping z: either to a(x)· w_[S⁻_B]w_[SA⁰_], in which case we must seta(v) =a(x)·w_[S⁻_B]w_[SA⁰_]w_[T_>], or to a(x), provided thata(x) is an instance of∃T. Thus we have two (partial) maps f andg fromq into theS⁻B-subtree ofCT shown below.

T ={ Av ∃S⁻.B, Bv ∃S.A⁰, A⁰v ∃T } q(x) ={S(y, x), S(y, z), T(z, v)}

x y z v

S S T

a_S−B

B A⁰

S⁻ S T

x y z v

S S T

f f f f

g g g

q q

These observations motivate our key definition. Given a pair (t, t⁰) of adjacent terms in q, a tree witness⁶ for (t, t⁰) is a homomorphism f from the query q_f =

E(s) ∈ q | s ⊆ domf, s 6⊆ [t]_f to the RB-subtree of C_T, for some

∃R.B, such thatf(t) =a_RB,domf is the smallest set containingt,t⁰ for which s⁰ ∈domf wheneverS(s, s⁰)∈q with s∈domf \[t]_f, and all s∈domf\[t]_f are bound variables in q. Here ∼f denotes the equivalence relation on domf defined by taking s∼f s⁰ ifff(s) =f(s⁰) and [t]f the equivalence class oft. In Example 1,f andgare two tree witnesses for (x, y). Returning back to case (B), we can say now that there must exist a tree witnessf for (t, t⁰) such thata(t) satisfies the followingtree-witness formula twf forf withf(t) =aRB:

twf = ext_∃R.B(t) ∧ ^

s∈[t]f

(t=s) ∧ ^

E(s)∈q,s⊆[t]f

extE(s).

Case (B⁻) is symmetric, and in case (O) there must existS(s, s⁰)∈qfor which (B) holds, withP(t, t⁰) being ‘covered’ by the tree witness for (s, s⁰).

This analysis suggests the following idea for a rewriting. We guess pairs of adjacent terms (t, t⁰) inq that will be mapped to edges of the tree part ofC_K

6 A different notion of tree witness was used forDL-Lite^N_horn[13], where the structure of the canonical models ensured uniqueness of every tree witness.

(8)

and the tree witnesses that will cover them, as in cases (B), (B⁻) and (O). The part ofq that is not covered by these tree witnesses will be mapped toind(A), as in case (A). The query representing the guesses is then evaluated over I_A. The following example shows, however, that this idea needs a refinement.

Example 2. Letq(x1, x4) ={R(x1, y2), R(y3, y2), R(y3, x4)}, shown below on the left, and T ={Av ∃R, Av ∃R⁻}.

x1

y2

y3

x4

R R R

f

g A

a

aw[R>] aw_[R−>]

R⁻ R

C(T,{A(a)})

Clearly, there is a tree witness f for (x₁, y₂) with domf = {x1, y₂, y₃} and [x₁]_f ={x₁, y₃}, and a tree witnessg for (x₄, y₃) withdomg={x₄, y₃, y₂} and [x₄]_g ={x₄, y₂}. Although these tree witnesses cover the whole query q, they are only ‘realised,’ say, in the canonical modelC_(T_,{A(a)}) (shown above on the right) under conflicting maps: f sends x1, y3 to a and y2 to aw_[R>], while g sendsx4, y2 toaandy3 toaw_[R−>]; in fact, (T,{A(a)})6|=q(a, a).

Tree witnessesf and g, for (t, t⁰) and (s, s⁰), respectively, are compatible if domf∩domg⊆[t]f∩[s]g. Iff andgare incompatible and neitherdomf ⊆domg nordomg⊆domf, then we callf andgconflicting(e.g.,f andgin Example 2).

A set Ξ of tree witnesses is calledconsistent if all pairs of tree witnesses in Ξ are compatible. Let

q_e(x) = detached_q ∨ _

Ξconsistent

∃y ^

f∈Ξ

tw_f ∧ ^

E(s)∈q s6⊆domf,for allf∈Ξ

ext_E(s) ,

where detachedq = ⊥ if x 6= ∅; otherwise it is a disjunction of the sentences

∃xext_∃R.B(x) such that there is a homomorphism fromq to the RB-subtree of C_T. The next theorem shows thatq_eis a pure PE-rewriting forq andT: Theorem 2. For allAanda⊆ind(A), we have(T,A)|=q(a)iffI_A|=q_e(a).

Example 3. Let q(x) ={R_i(x, y_i)|i≤n} and T ={A_i v ∃R_i |i ≤n}. Each pair (x, y_i) gives rise to one tree witnessf_i withtw_f_i =A_i(x)∨ ∃y R_i(x, y), and q_e=W

N⊆[0,n]∃y(V

i∈Ntwf_i∧V

j /∈NRj(x, yj)).

The size of q_e is O((n_T,q + 1)^|q|· |T | · |q|²), where n_T,q is the number of distinct tree witnesses. Now, we observe that if

(conf ) there are no conflicting tree witnesses forq andT thenq_ecan be transformed to the query

q_c(x) = detachedq ∨ ∃y ^

{t,t⁰} t,t⁰adjacent

h ^

E(s)∈q,s⊆{t,t⁰}

extE(s) ∨ _

fis a tree witness t,t⁰∈domf

twf

i

(ifqhas no binary predicates thenq_c=q_e).

(9)

Theorem 3. IfT andq satisfy(conf )then, for any ABoxAanda⊆ind(A), we have (T,A)|=q(a)iff I_A|=q_c(a).

The size ofq_cisO(n_T_,q·|T |·|q|²). So, if(conf )holds andn_T_,q is polynomial then q_c is apolynomial pure PE-rewriting of q andT. For instance, the exponential q_e of Example 3 reduces to polynomial q_c =∃yV

i≤n(R_i(x, y_i)∨tw_f_i).

Note, however, that the CQs and TBoxes used in Section 3 generateexponentially many distinct tree witnesses.

We show now that, for a large class of CQsqand TBoxesT, all tree witnesses f for each (t, t⁰) in q (even if there are exponentially many of them) can be represented by a polynomial formula. Observe that each twf is determined by a concept∃R.B (such that the RB-subtree ofCT contains the range of f) and the equivalence relation∼f ondomf. As the number of concepts∃R.Bis linear in |T |, the rewriting q_c may be regarded polynomial if we show that all tree witnesses for each (t, t⁰) have the same equivalence relation. For example, let q(x) ={S(x, y), R(y, zi)|1≤i≤n} andT ={Av ∃S, ∃S⁻v ∃R.B1, ∃S⁻v

∃R.B2}. There are 2ⁿtree witnesses for (x, y), as eachz_ican be mapped either to aB₁- or a B₂-point inC_T, and yet they all define the same equivalence relation and the same tree-witness formula ext_∃S(x).

We formalise this intuition in the following definition. For a pair (t, t⁰) of adjacent terms inq, we callf = (domf,∼f) auniversal tree witness for(t, t⁰) if domf is a subset oftermq witht, t⁰ ∈domf and∼f is an equivalence relation ondomf such that q_f =

E(s)∈q|s⊆domf,s6⊆[t]f /_∼_f is a tree-shaped query with root [t]f and, for every tree witnessg for (t, t⁰),

– domg=domf and there exists a homomorphismh:q_f→ CT that preserves the distance from the root andg(s) =h([s]_f), for everys∈domf.

A universal tree witnessf for (t, t⁰) is not a tree witness in the sense of our original definition, but rather a convenient structure representing all tree witnesses for (t, t⁰): we can merge the tree-witness formulas for (t, t⁰) into one formula

twf = h _

∃R.B∈Φf

ext_∃R.B(t)i

∧ ^

s∈[t]f

(t=s) ∧ ^

E(s)∈q,s⊆[t]f

extE(s),

whereΦf is the set of concepts∃R.B such there is a homomorphismhfrom q_f to theRB-subtree ofC_T withh([t]f) =aRB.

We now identify a class of CQs and TBoxes for which a universal tree witness is unique (if exists) and can be constructed in time polynomial in |q| and|T |.

Intuitively, for each tree witness f, we disallow situations in whichCT and the quotientq_f/_∼_f ofq_f simultaneously contain fragments of the form

T S

q_f/_∼_f CT

(10)

We say that a role S isforward ifu;v for all (u, v)∈ S^C^T. If neither S nor its inverse S⁻ is forward thenS is said to be atwisty role. A tree witnessf for (t, t⁰) is calledperfect if, for allT(s₁, s₂), S(s₂, s₃)∈q_f/_∼_f such thats₂ 6= [t]_f andS is twisty, we haveC_T 6|=inv(T, S)∧suc(T, S), where

suc(T, S) =∃x, y, z(T(x, y)∧S(y, z)∧(x6=z)), inv(T, S) =∃x, y(T(x, y)∧S(y, x)).

Lemma 5. There is a polynomial-time algorithm which, given q,T and a pair (t, t⁰), checks whether all tree witnesses for (t, t⁰) are perfect, and if this is the case, returns a unique universal tree witness f_t,t0 for(t, t⁰).

Note that even though there may be exponentially many tree witnesses for (t, t⁰), the algorithm checks whether they all are perfect in polynomial time. We are now in a position to define ourpolynomial PE-rewritingq_pforq andT:

q_p(x) = detached_q ∨ ∃y^

{t,t⁰} t,t⁰ adjacent

h ^

E(s)∈q,s⊆{t,t⁰}

ext_E(s) ∨ _

s,s⁰adjacent t,t⁰∈domf_s,s0

tw_f_s,s₀i .

Theorem 4. If all tree witnesses forq andT are perfect and condition(conf ) is satisfied then, for any ABox A and anya ⊆ind(A), we have(T,A)|=q(a) iffI_A|=q_p(a). Moreover,q_p is constructed in time polynomial in |q| and|T |.

IfT does not contain any twisty roles, then all tree witnesses in any CQq overT are perfect. On the other hand, all examples of conflicting tree witnesses above involve twisty roles. The following theorem shows that this is no accident:

Theorem 5. Let T be an OWL 2 QL ontology without twisty roles. Then, for any CQ q, there are no conflicting tree witnesses for q and T. Thus, q_p is a pure PE-rewriting forq andT and it can be constructed in polynomial time.

Note that OWL 2 ELontologies satisfy this condition, and so a polynomial rewriting similar to q_p can also be used for CQ answering over such ontologies provided that the ABoxes are complete with respect to the ontologies [12].

5 Conclusions

As we saw in Section 3, pure PE- and NDL-rewritings of CQs over OWL 2 QL ontologies are of exponential size in the worst case. The analysis in Section 4 showed that the length of a rewriting is related to the number of tree witnesses in the query, which reflect how various parts of the query can be homomorphically mapped to the intensional tree part of the canonical model. Thus, a rewriting can be lengthy if the original query is sufficiently long and the intensional part of the canonical model for the ontology is sufficiently complex. We proved that by restricting the interaction between inverse roles and role inclusions in ontologies and queries, we can guarantee transparent polynomial rewritings. Moreover, as shown by a series of experiments [12], real-world ontologies and CQs contain very few tree witnesses, which are never in conflict, satisfy the above mentioned restrictions, and so enjoy pure, polynomial PE-rewritings.

(11)

References

1. Acciarri, A., Calvanese, D., De Giacomo, G., Lembo, D., Lenzerini, M., Palmieri, M., Rosati, R.:QuOnto: Querying ontologies. In: Proc. of the 20th Nat. Conf. on AI, AAAI. pp. 1670–1671 (2005)

2. Artale, A., Calvanese, D., Kontchakov, R., Zakharyaschev, M.: The DL-Lite family and relations. Journal of Artificial Intelligence Research (JAIR) 36, 1–69 (2009) 3. Benedikt, M., Gottlob, G.: The impact of virtual views on containment. PVLDB

3(1), 297–308 (2010)

4. Borodin, A., von zur Gathen, J., Hopcroft, J.E.: Fast parallel matrix and gcd computations. In: Proc. of FOCS. pp. 65–71 (1982)

5. Cal`ı, A., Gottlob, G., Lukasiewicz, T.: A general Datalog-based framework for tractable query answering over ontologies. In: Proc. of PODS. pp. 77–86 (2009) 6. Cal`ı, A., Gottlob, G., Pieris, A.: Advanced processing for ontological queries.

PVLDB 3(1), 554–565 (2010)

7. Calvanese, D., De Giacomo, G., Lembo, D., Lenzerini, M., Rosati, R.: Tractable reasoning and efficient query answering in description logics: The DL-Lite family.

J. of Automated Reasoning 39(3), 385–429 (2007)

8. Gottlob, G., Orsi, G., Pieris, A.: Ontological queries: Rewriting and optimization.

In: Proc. of the IEEE Int. Conf. on Data Engineering, ICDE (2011)

9. Gottlob, G., Schwentick, T.: Rewriting ontological queries into small nonrecursive Datalog programs. In: Proc. of DL. vol. 745. CEUR-WS.org (2011)

10. Kikot, S., Kontchakov, R., Podolskii, V., Zakharyaschev, M.: Exponential lower bounds and separation for query rewriting. CoRR, arXiv:1202.4193, 2012.

11. Kikot, S., Kontchakov, R., Zakharyaschev, M.: On (in)tractability of OBDA with OWL 2 QL. In: Proc. of DL. vol. 745. CEUR-WS.org (2011)

12. Kikot, S., Kontchakov, R., Zakharyaschev, M.: Conjunctive query answering with OWL 2 QL. In: Proc. of KR. AAAI Press (2012) (seewww.dcs.bbk.ac.uk/~kikot) 13. Kontchakov, R., Lutz, C., Toman, D., Wolter, F., Zakharyaschev, M.: The combined approach to query answering in DL-Lite. In: Proc. of KR. AAAI Press (2010) 14. Lutz, C., Toman, D., Wolter, F.: Conjunctive query answering in the description logic EL using a relational database system. In: Proceedings of the 21st Int. Joint Conf. on Artificial Intelligence, IJCAI 2009. pp. 2070–2075 (2009)

15. P´erez-Urbina, H., Motik, B., Horrocks, I.: A comparison of query rewriting techniques for DL-Lite. In: Proc. of DL. vol. 477. CEUR-WS.org (2009)

16. Poggi, A., Lembo, D., Calvanese, D., De Giacomo, G., Lenzerini, M., Rosati, R.:

Linking data to ontologies. J. on Data Semantics X, 133–173 (2008)

17. Raz, R., McKenzie, P.: Separation of the monotone NC hierarchy. In: Proc. of FOCS. pp. 234–243 (1997)

18. Raz, R., Wigderson, A.: Monotone circuits for matching require linear depth.

J. ACM 39(3), 736–744 (1992)

19. Razborov, A.: Lower bounds for the monotone complexity of some Boolean functions. Dokl. Akad. Nauk SSSR 281(4), 798–801 (1985)

20. Rodr´ıguez-Muro, M., Calvanese, D.: Dependencies to optimize ontology based data access. In: Proc. of DL. vol. 745. CEUR-WS.org (2011)

21. Rodr´ıguez-Muro, M., Calvanese, D.: Semantic index: Scalable query answering without forward chaining or exponential rewritings. In: Proc. of ISWC (2011) 22. Rosati, R., Almatelli, A.: Improving query answering over DL-Lite ontologies. In:

Proc. of the 12th Int. Conf. KR. AAAI Press (2010)

23. Tseitin., G.: On the complexity of derivation in propositional calculus. In: Automa- tion of Reasoning 2: Classical Papers on Computational Logic 1967–1970 (1983)