• Aucun résultat trouvé

Optimal Nonrecursive Datalog Rewritings of Linear TGDs and Bounded (Hyper)Tree-Width Queries

N/A
N/A
Protected

Academic year: 2022

Partager "Optimal Nonrecursive Datalog Rewritings of Linear TGDs and Bounded (Hyper)Tree-Width Queries"

Copied!
12
0
0

Texte intégral

(1)

Optimal Nonrecursive Datalog Rewritings of Linear TGDs and Bounded (Hyper)Tree-Width Queries

M. Bienvenu1, S. Kikot2, R. Kontchakov2, V. Ryzhikov3, and M. Zakharyaschev2

1CNRS & University of Montpellier, France (meghyn@lirmm.fr)

2Birkbeck, University of London, UK ({kikot,roman,michael}@dcs.bbk.ac.uk)

3Free University of Bozen-Bolzano, Italy (ryzhikov@inf.unibz.it)

Abstract. Our concern is answering ontology-mediated queries(O,q), whereO is a set of linear tgds andqa conjunctive query (CQ) of bounded hypertree width.

Assuming that the arity of predicates is bounded, we show that polynomial-size nonrecursive Datalog rewritings can be constructed and executed in (i) LOGCFL for OMQs with ontologies of bounded existential depth; (ii) NL for OMQs with ontologies of bounded depth and CQs whose hypertree decompositions have a bounded number of leaves; (iii) LOGCFL for OMQs with acyclic CQs whose join trees have a bounded number of leaves.

1 Introduction

As shown in [3, 4, 13], the optimal combined complexity (LOGCFL and NL) of an- swering ontology-mediated queries (OMQs) with OWL 2 QL ontologies of bounded depth and conjunctive queries (CQs) of bounded treewidth can be achieved by means of rewriting them into nonrecursive datalog (NDL) queries, though not via positive- existential rewritings. (Note that in these cases the complexity of OMQs matches the complexity of evaluating the underlying CQs.) Our recent experiments have demon- strated that such NDL rewritings, reformulated as Spark SQL queries with views, are efficiently executed by Apache Spark taking advantage of their parallelisable structure.

The aim of this paper is to extend the above mentioned results to ontologies and CQs with predicates of arbitrary fixed arity. We consider ontologies that consist of linear TGDs (linear existential rules or atomic-hypothesis rules) [2, 6, 11, 12], which are instances of finite unification sets [18, 15, 16]. Our interest in this problem is also motivated by the system ETAP [5] designed to answer natural language questions by translating them into SPARQL and executing—along with background knowledge—

over RDF data extracted from texts. To illustrate, suppose that the data contains the atoms Purchased(j, c)and Car(c) representing the sentence ‘John purchased a car’.

To answer the question ‘has a car been sold?’ ETAP utilises the ontology rules (with omitted universal quantifiers)

Purchased(x, y)→ ∃vz Purchase(v)∧hasAgent1(v, x)∧

hasObject(v, y)∧hasAgent2(v, z) , Purchase(v)∧hasAgent1(v, x)∧hasObject(v, y)∧hasAgent2(v, z)→

∃v0 Sale(v0)∧hasAgent1(v0, z)∧hasObject(v0, x)∧hasAgent2(v0, y) ,

(2)

wherevandv0represent the acts of purchase and sale, respectively. The rules are clearly beyond the limitations ofOWL 2 QL; however, the knowledge they represent can also be captured by means of linear TGDs with ternary predicates:

Purchased(x, y)→ ∃zPurchase(x, y, z), Purchase(x, y, z)→Sale(z, y, x), which are enough to answer the query∃xyz(Car(y)∧Sale(x, y, z)).

We classify OMQsQ= (O,q)with linear TGDs and predicates of anyfixed arity n< ωalong three axes: (1) the existential depthdofO, that is, the maximal depth of Skolem terms in the chases ofOover arbitrary data (cf. [8]), (2) the hypertree widtht of q, and (3) the number ` of leaves in the tree underlying a hypertree decomposi- tion ofq. Thus, OMQ(p1, p2, p3)denotes the class of OMQs in which parameter (i) is bounded by pi ∈ N∪ {∞}. We show that, for any fixedd,t,` < ω, answering OMQs in the classesOMQ(d,t,∞)andOMQ(∞,1,`)can be done in LOGCFL (for combined complexity) by means of NDL-rewritings, and even in NL forOMQ(d,t,`).

On the other hand, one can show that answering OMQs inOMQ(∞,t,`), fort,`≥2, is NP-hard by observing that the sequence of tree-shaped CQs from the proof of [3, Theorem 20] is of path width 2. Thus, we obtain a full classification of the classes OMQ(p1, p2, p3), forpi∈N∪ {∞}, with respect to combined complexity.

2 Preliminaries

Ontology-mediated queries. LetΣbe arelational schema with the maximum arity ar(Σ)of its predicates bounded byn. By writingP(x), for a predicate nameP and an n-tuplexof variables (with possible repetitions), we mean thatPisn-ary. By writing γ(x), we mean that the free variables of formulaγarex, wherexcontains no repeti- tions. If the meaning is clear from the context, we use set-theoretic notation for lists.

Adata instance,D, overΣis any finite set of ground atomsP(a)with predicate symbols P from Σ. We denote by ind(D) the set of individual constants inD. An ontologyis any finite set,O, of sentences of the form

∀x(γ0(x)→ ∃yγ1(x0,y)) and ∀x(γ0(x)→γ2(x0)),

whereγ01andγ2are atoms with predicate symbols fromΣandx0⊆x, for disjoint setsxandyof variables. When writing rules, we omit the universal quantifiers.

Anontology-mediated query(OMQ)Q(x)is a pair(O,q(x)), in whichOis an ontology andq(x)aconjunctive query(CQ), that is, a formula of the form∃yϕ(x,y), whereϕis a conjunction of atomsP(z)overΣwithz⊆x∪y. A tuplea∈ind(D)|x|

is acertain answertoQ(x)overDifM|=q(a), for every modelMofO ∪ D; in this case we writeO,D |=q(a). If the listxofanswer variablesis empty, acertain answer toQoverDis ‘yes’ ifM|=q, for every modelMofO ∪D, and ‘no’ otherwise. OMQs and CQs without answer variables are calledBoolean. We often regard CQs assetsof their atoms. We abuse notation and use sets of variables in place of sequences assuming that they are ordered in some (fixed) way. Also, givenc ∈ ind(D)|z| andz ∈ z, we writec(z)to refer to the component ofcthat corresponds toz.

Canonical models.An important property of tgds is the fact [1] that, for anyOandD, there is a (possibly infinite)canonical(oruniversal)modelCO,D such that, for every

(3)

CQq(x)anda∈ind(D)|x|, we haveO,D |=q(a)iffCO,D |=q(a). Such a canonical model can be constructed by the following (oblivious)chase procedurethat, intuitively,

‘repairs’D with respect to O (though not in the most economical way). With each rule%of the formγ0(x)→ ∃yγ1(x0,y), wherex= (x1, . . . , xn),y = (y1, . . . , yk) andk >0, we associate thek-tuples%= (s1%, . . . , sk%)of distinctn-arySkolem function symbols. Anapplication of%toDunder a maph: x→ind(D)such thath(γ0)∈ D addsh01)toD, whereh0 is defined by takingh0(xi) = h(xi), for1 ≤ i ≤n, and h0(yj) =sj%(h(x)), for1≤j ≤k. An application of a ruleγ0(x)→γ2(x0)toDunder such anhaddsh(γ2)toD. Thechase algorithmapplies these two rules exhaustively toOandDin a breadth-first manner. More precisely, we setC0O,D = Dand say that the atoms inC0O,D are of (derivation)level0. Assuming that Cn−1O,D has already been constructed, we define CnO,D as follows. Take some enumeration of all distinct pairs (%i, hi)such that%i∈ Owithγion the left-hand side is applicable toCn−1O,Dunderhi. If none of the atoms in thehii)is of leveln−1, then we setCnO,D =Cn−1O,D. Otherwise, we apply the%i underhi toCn−1O,D one after the other and say that the newly added atoms are of (derivation)leveln; the resulting extension ofCn−1O,D is denoted byCnO,D. Thecanonical modelCO,Dis then the union of allCnO,D, forn < ω.

The domain∆CO,D ofCO,D consists of terms built from the constants inDusing Skolem functionssj%, for%∈ O. Thedepthof such a term is the maximal number of nested occurrences of function symbols in it. We say thatOis ofdepthk≤ω ifkis the minimal ordinal such that∆CO,D contains no terms of depth> k, for any dataD.

For an ontologyOand a ground atomP(a), we settermO(P(a)) =∆CO,{P(a)}\a.

We denote bytermO the union oftermO(P(a)), for all possible (up to renaming the constants) atomsP(a)with predicates inO(assuming that distinctP(a)do not share constants). It should be clear thatOis of finite depth ifftermOis finite. By counting the number of possible linear derivations of Skolem terms, we see that, forOof depthk,

|termO| ≤(ar(Σ)ar(Σ)|O|)k. We assume that any constantaoccurring intermOhas atwin variableeaand denote byeathe result of replacing all constants inawith their twin variables. Given a tupleb ⊆ind(D), we denote bya/b(a)e the substitution that maps eachainato the correspondingb(ea)inb.

NDL-rewritings.Adatalog program,Π, is a finite set of Horn clauses of the form

∀z(γ0 ← γ1∧ · · · ∧γm), where eachγiis an atomQ(y)withy⊆zor an equality (z = z0)withz, z0 ∈ z. (As usual, we omit ∀z from clauses.) The atomγ0 is the head of the clause, andγ1, . . . , γm itsbody. All variables in the head must occur in the body, and=can only occur in the body. The predicates in the heads of clauses in Π areIDB predicates, the rest (including=)EDB predicates. A predicateQdepends onPinΠ ifΠ has a clause withQin the head andPin the body.Π is anonrecursive datalog(NDL)programif the (directed)dependence graphof the dependence relation is acyclic. The size|Π|ofΠ is the number of symbols in it. AnNDL queryis a pair (Π, G(x)), whereΠ is an NDL program andGa predicate. A tuplea ∈ind(D)|x|is ananswer to(Π, G(x))overa data instanceDifG(a)holds in the first-order structure with domainind(D) obtained by closingDunder the clauses in Π; in this case we writeΠ,D |= G(a). The problem of checking whetherais an answer to(Π, G(x)) overDis called thequery evaluation problem. Thedepthof(Π, G(x))is the length, d(Π, G), of the longest directed path in the dependence graph forΠ starting fromG.

(4)

An NDL query(Π, G(x))is anNDL-rewriting of an OMQQ(x) = (O,q(x))in case O,D |= q(a)iffΠ,D |= G(a), for anyDand anya ∈ ind(D)|x|. Every OMQ is known to have an NDL-rewriting [2, 6].

Tree decomposition.Atree decompositionof a CQqwith variablesvar(q)is a pair (T, λ)of an (undirected) treeT = (V, E)andλ:V →2var(q)such that

– for any atomP(z)∈q, there existsv∈V withz⊆λ(v);

– for any variablezinq, the set of vertices{v∈V |z∈λ(v)}is connected inT. We callλ(v)thebag forv. Thewidthof(T, λ)ismaxv∈V |λ(v)| −1. Thetreewidth of q is the minimum width over all tree decompositions ofq. It is known [9] that, for CQs of bounded arityn, the notions of bounded treewidth andbounded hypertree width[10] are interchangeable. Indeed, in this case, every hypertree decomposition of widthtinduces a tree decomposition of widtht·n. A CQqis calledacyclicif it has ajoin tree whose nodes are the atoms ofq and, whenever atomsγ1 andγ2 share a variable, this variable occurs in all atoms along the (unique) path in the tree linkingγ1

andγ2. It is known [10] that a CQqis acyclic iffqis of hypertree width 1.

3 NL and LogCFL Fragments of NDL

In this section we present two classes of NDL queries that enjoy NL- and LOGCFL- complete evaluation. First, observe that if the number of variables in each clause of an NDL query is bounded, then the size of its grounding (obtained by replacing variables by all possible combinations of constants) is polynomial. So, evaluation of such NDL queries is tractable. However, if we bound the number of variables in clauses of NDL rewritings of OMQs, then we will also effectively impose a bound on the number of answer variables in their CQs. To avoid this limitation, we treat answer variables of the CQs (and the predicate positions they occur in) differently from all other variables in the NDL-rewritings. Intuitively, answer variables get their values fixed by a candidate certain answer and thus do not cause an exponential blowup of groundings.

Formally, an NDL query(Π, G(x))is calledorderedif each of its IDB predicatesQ has a fixed list of variablesxQ⊆x, theparameters of Q, such that

– the parameters ofGarexand, in every clause, the parameters of the head include all the parameters of the predicates in the body;

– the parametersxQ of eachQoccupy the last|xQ|positions in every occurrence ofQinΠ; they can, however, occur in other positions too.

Thewidthw(Π, G)of an ordered(Π, G(x))is the maximum number of non-parameter variables in a clause of Π. Observe that Boolean NDL queries are trivially ordered (their IDB predicates have no parameters), and the width of such queries is simply the maximum number of variables in a clause of Π. As all the NDL-rewritings we construct are ordered, with their parameters being the answer variables, in the sequel we will consider only ordered NDL queries. We say that a class of NDL queries is of bounded widthif there isw>0such thatw(Π, G)≤w, for all(Π, G(x))in the class.

As we observed above, evaluation of NDL queries of bounded width is P-complete.

Our first subclass of NDL queries is based on linear rules. An NDL program is linear[1] if the body of its every clause contains at most one IDB predicate.

(5)

Theorem 1. Evaluation of linear NDL queries of bounded width is NL-complete for combined complexity.

Our second subclass was inspired by semi-unbounded fan-in circuits. Recall that the class LOGCFL of problems reducible in logarithmic space to context-free languages can equivalently be defined in terms of L-uniform families of semi-unbounded fan- in circuits (whereOR-gates have arbitrarily many inputs, and AND-gates two inputs) of polynomial size and logarithmic depth. Alternatively, LOGCFL can be defined us- ingnondeterministic auxiliary pushdown automata(NAuxPDAs) [7], which are non- deterministic Turing machines with an additional work tape constrained to operate as a pushdown store. Sudborough [17] proved that LOGCFL coincides with the class of problems that are solved by NAuxPDAs in logarithmic space and polynomial time (the space on the pushdown tape is not subject to the logarithmic bound). Moreover, there is an algorithm that, given a semi-unbounded fan-in circuitCand an input, computes the output using an NAuxPDA in logarithmic space in the size ofC and exponential time in the depth ofC [19, pp. 392–397]. Using these results, it can be shown that any (Π, G(x))withat most two atomsin the body of any clause can be evaluated on a data instanceDby an NAuxPDA in spacelog|Π|+w(Π, G)·log|D|and time2O(d(Π,G)) (thus, in LOGCFL provided the query width is bounded and its depth is logarithmic).

In the rewritings we propose in Sections 5 and 7, however, the number of atoms in the clauses is not bounded. We require the following to generalise the idea. A function νfrom the predicate names inΠ to non-negative integersNis called aweight function for an NDL query(Π, G(x))if, for any clauseQ(z)←P1(z1)∧ · · · ∧Pk(zk)inΠ, we have

ν(Q)>0 and ν(Q) ≥ ν(P1) +· · ·+ν(Pk),

Note thatν(P)can be0for an EDB predicateP. To illustrate, we consider NDL queries with the following dependency graphs:

The one on the left has a weight function bounded by the number of predicates (i.e., linear in the size of the query); intuitively, this function corresponds to the number of directed paths from a vertex to the leaves. In contrast, any NDL query with the dependency graph on the right can only have a weight function whose values (numbers of paths) are exponential. Linear NDL queries have weight functions bounded by1.

LeteΠ be the maximum number of EDB predicates in a clause ofΠ. Theskinny depthsd(Π, G)of(Π, G(x))is the minimum value of

2d(Π, G) + logν(G) + logeΠ

over possible weight functionsν. One can show, using Huffman coding, that any NDL query(Π, G(x))can be transformed into an equivalent skinny NDL query(Π0, G(x)) of depth not exceeding sd(Π, G)and such that |Π0| = O(|Π|2) and w(Π0, G) ≤ w(Π, G). We say that a class of NDL queries haslogarithmic skinny depthif there is c >0such thatsd(Π, G)≤clog|Π|, for all(Π, G(x))in the class. We now obtain:

Theorem 2. Evaluation of NDL queries of logarithmic skinny depth and bounded width isLOGCFL-complete for combined complexity.

(6)

3.1 NDL Rewritings over (Complete) Data

We say that a data instanceDiscomplete for an ontologyOifO,D |=P(a)implies P(a)∈ D, for any ground atomP(a), wherePinΣanda⊆ind(D). An NDL query (Π, G(x))is anNDL-rewriting of an OMQQ(x) = (O,q(x))over complete datain caseO,D |=q(a)iffΠ,D |=G(a), for anyDcomplete forOand anya⊆ind(D).

Given an NDL-rewriting(Π, G(x))ofQ(x)over complete data, we denote byΠ the result of replacing each EDB predicateP inΠ with a fresh IDB predicatePof the same arity and adding the clausesP(z) ← γ for every atomγwith a predicate symbol from O such that O |= γ → P(z), where z is a tuple of variables (with possible repetitions). Clearly,(Π, G(x))is an NDL-rewriting ofQ(x)over arbitrary data instances and|Π| ≤ |Π|+ar(Σ)ar(Σ)· |O|2.

We say that a class of OMQs isskinny-reducibleif there arec >0andw>0and an LLOGCFL-transducer that, given any OMQ Q(x)in the class, computes its NDL- rewriting(Π, G(x))over complete data withsd(Π, G)≤clog|Π|andw(Π, G)≤w.

Theorem 2 and the transformationgive the following:

Corollary 1. Answering OMQs is inLOGCFLfor combined complexity for any skinny- reducible class.

The transformation, however, does not preserve linearity because it replaces oc- currences of EDB predicatesP by IDB predicatesP. A more involved ‘linear’ con- struction is given in the proof of the following, where a possible increase of the width is due to the ‘replacement’ of atomsP(z)by atomsγwheneverO |=γ→P(z):

Lemma 1. Fix anyw>0. There is anLNL-transducer that, for a linear NDL-rewriting (Π, G(x))of an OMQQ(x)over complete data withw(Π, G)≤w, computes its lin- ear NDL-rewriting(Π0, G(x))over arbitrary data withw(Π0, G)≤w+ar(Σ).

4 Conditional Rewritings

LetQ(x) = (O,q(x))be an OMQ with an ontology of finite depth. Intuitively, we recursively splitq(x)into subqueriesqDbased on subtreesDof a tree decomposition ofqand combine the rewritings ofqDinto a rewriting ofq. To guarantee ‘compatibil- ity’ of the rewritings of the subqueries, we take account of thetypesof points on the boundaries of theqD. So, for eachDand each typew, we take a fresh IDB predicate GwDto represent theconditional rewritingofqDprovided that its boundary satisfies the type. We now give formal definitions.

Atype is a partial maps from the variables ofq to termO∪{ε}; its domain is denoted bydom(s). The unique partial type withdom(ε) =∅is denoted byε. We use types to represent how variables are mapped into the canonical model:s(z) =εmeans thatzis mapped to an individual constant ands(z) =f(a), for a Skolem termf(a), means thatzis mapped to an element of the formf(c), for somec⊆ind(D). Given a typesand a tuplez= (z1, . . . , zn)⊆dom(s), we denote the tuple(s(z1), . . . ,s(zn)) bys(z). A typesiscompatiblewith a bagλ(t)ifs(x) =ε, for allx∈x∩dom(s), and, for everyS(z)∈qwithz⊆λ(t)∩dom(s), one of the following applies:

(d) s(z)⊆ {ε};

(7)

(b) there isP(a)such thats(z) ⊆termO(P(a))∪ {ε}but neithers(z) ⊆ {ε}nor s(z)⊆termO(P(a));

(i) there isP(a)such thats(z)⊆termO(P(a))andS(s(z))∈CO,{P(a)}.

Given a types, we take a tuple of variablesvar(s)that contains, forz∈dom(s)\x, variablez, ifs(z) =ε, and variablesea, ifs(z)∈termO(P(a)).

Denote the answer variables that occur indom(s)byxs. Our rewritings use conjunc- tionsAts(var(s),xs)of the following formulas, for allS(z)∈qwithz⊆dom(s):

(d0) S(z)ifs(z)⊆ {ε};

(b0) the disjunction _

g:z0a

h

P(a)e ∧ ^

z∈z0andg(z)=a

(z=ea)i

overgrounding functionsg: z0 → a such thatz0 = {z ∈ z | s(z) = ε} 6= ∅, z00 ={z ∈ z | s(z)∈termO(P(a))} 6= ∅andCO,{P(a)} contains the result of replacingz0andz00inS(z)byg(z0)ands(z00), respectively;

(i0) P(a)e ifs(z)⊆termO(P(a)).

Strictly speaking, the resulting rewritings will not be NDL programs because of dis- junctions in(b0), but we can get rid of them using an extra predicate and (if needed) the construction from the proof of Lemma 1 keeping the size and the execution time polynomial.

5 L

OG

CFL Rewritings for OMQ(d, t, ∞)

We now construct skinny-reducible NDL rewritings for the CQs of bounded treewidth.

Theorem 3. For anyd≥0andt≥1, the classOMQ(d,t,∞)is skinny-reducible.

Fix a connected CQq(x)and a tree decomposition(T, λ)of its Gaifman graph.

LetDbe a subtree ofT. ThesizeofDis the number of nodes in it. A nodetofDis calledboundaryifT has an edge{t, t0}witht0 ∈/ D. We denote by∂Dthe union of allλ(t)∩λ(t0)for boundary nodestof Dand its neighbourst0 inT outsideD. The degreedeg(D)ofDis the number of its boundary nodes (so, the only subtree ofTof degree0isT itself). We say that a nodetsplitsDinto subtreesD1, . . . , Dk if theDi partitionDwithoutt: each node ofDexcepttbelongs to exactly oneDi.

Lemma 2. LetDbe a subtree ofT of sizen >1.

If deg(D) = 2, then there is a nodet splitting D into subtrees of size ≤ n/2 and degree≤2and, possibly, one subtree of size< n−1and degree1.

Ifdeg(D)≤1, then there istsplittingDinto subtrees of size≤n/2and degree≤2.

We define recursively a setRof subtrees ofT, a binary ‘predecessor’ relation≺ on R, and a function β on R indicating the bag of the splitting node. We begin by addingT toR. Take anyD ∈ Rthat has not been split yet. If D is of size 1, then β(D) =λ(t)for the only nodetofD. Otherwise, by Lemma 2, we find a nodetinD

(8)

that splits it intoD1, . . . , Dk. We setβ(D) =λ(t)and, for1≤i≤k, addDitoRand setDi≺D; then, we apply the procedure to each ofD1, . . . , Dk. For eachD∈R, we recursively define a set of atoms

qD =

S(z)∈q|z⊆β(D) ∪ [

D0≺DqD0.

LetxDbe the set of variables fromxthat occur inqD. By the definition of tree decom- position,qT =qandxT =x.

We now define an NDL-rewriting ofQ(x) = (O,q(x)). FixD ∈ Rand a type wwithdom(w) =∂D. LetGwD(var(w),xD)be a fresh IDB predicate with parame- tersxD. As we described above, a node is selected inDto split it into smaller trees (pro- vided that it contains more than one node). We extend the typewto cover the variables β(D)of the selected bag: more precisely, we consider typesswithdom(s) = β(D) such that they are compatible with bagβ(D)and agree withw on their common do- main. Observe that, ifD0 is a subtree resulting from splittingD, then the domain of the extended type,s∪w, includes∂D0, and thus∂D0 coincides with the domain of the restriction ofs∪w to ∂D0, denoted (s∪w)∂D0. Now, for each type swith dom(s) =β(D)such thatsis compatible with bagβ(D)and agrees withwon their common domain, the NDL programΠQLOGcontains

GwD(var(w),xD)←Ats(var(s),xs)∧^

D0≺DG(s∪w)D0 ∂D0(var((s∪w)∂D0),xD0).

By induction on≺, one can now show that(ΠQLOG, GεT)is a rewriting ofQ(x).

Example 1. Letq(x0, x3) =∃x1x2 S(x0, x1)∧R(x1, x2)∧R(x2, x3)

andOconsist of the following linear rules:

%: U(x, y)→ ∃v T(x, v, y), T(x, v, y)→R(v, x), T(x, v, y)→R(y, v), T(x, v, y)→S(x, y).

The subtree structure of the tree decomposition ofq(x0, x3)and the canonical model are as follows:

D

D1 D2

x0

x1

x1

x2

x2

x3

S R R

a1 a2

f%(a1, a2)

U

R S R

The goal predicate for the rewriting of q(x0, x3)is GεD(x0, x3)with parameters x0 andx3. For the typesfor the middle bag sendingx1toεandx2tof%(a1, a2), we have GεD(x0, x3)←GxD17→ε

1 (x1, x0)∧U(ea1,ea2)∧(x1=ea2)∧GxD27→f%(a1,a2)

2 (ea1,ea2, x3), where andea1andea2are the twin variables invar(s). Note that the type forGxD27→f%(a1,a2)

2

has no non-twin variables, and we have the following rule for this predicate GxD27→f%(a1,a2)

2 (ea1,ea2, x3)←U(ea1,ea2)∧(x3=ea1).

(9)

Lemma 3. For anyDcomplete forO, any predicateGwDand anyb∈ind(D)|var(w)|+|xD|, we haveΠQLOG,D |=GwD(b)iff there is a homomorphismh:qD→CO,Dsuch that

h(z) =

(b(z), for allz∈xDand allz∈∂Dwithw(z) =ε, w(z)[a/b(ea)], for allz∈∂Dwithw(z)∈termO(P(a)).

6 NL Rewritings for OMQ(d, t, `)

For OMQs based upon bounded leaf queries and bounded depth ontologies, we establish the following theorem:

Theorem 4. Letd ≥ 0,t ≥ 1and` ≥2 be fixed. There is anLNL-transducer that, given any OMQ inOMQ(d,t,`), constructs its polynomial-size linear NDL-rewriting of width≤`(t+ 1).

LetObe an ontology of finite depthdandq(x)a CQ with a tree decomposition (T, λ)of width≤ thaving≤`leaves. Fix one of the nodes ofT as root, and letM be the maximum distance to a leaf from the root. For0 ≤n ≤ M, by ann-slicewe mean the set of all nodes ofT located at distancenfrom the root. Denote byyn the union of all bagsλ(t) for a node t in then-slice. For 1 ≤ n ≤ M, let zn be the union of allλ(t)∩λ(t0)for a nodetin then-slice and its predecessort0inT (which is in(n−1)-slice), and let z0 =∅. By definition,zn+1 ⊆ yn+1∩yn and, clearly,

|zn| ≤ |yn| ≤`(t+ 1). Denote byqn(zn,xn)the query consisting of all atomsS(z) ofqwithz ⊆S

k≥nyk, wherezn =zn\xandxn=x∩S

k≥nyk. These queries and sets of variables for the CQ from Example 1 are shown below:

q2 q1

q0 x0

x1

x1

x2

x2

x3

S R R

y2={x2, x3}z2={x2}x2={x3} y1={x1, x2}z1={x1}x1=∅ y0={x0, x1}z0=∅ x0={x0} Atype forzn is a total mapw fromzntotermO∪{ε}. Likewise, atype foryn is a total mapsfromyntotermO∪{ε}. We sayscompatiblewithyn if it is compatible with every bagλ(t)in then-slice.

Consider the NDL programΠQLIN defined as follows. For every0 ≤ n < M and every typewforzn, we introduce a new IDB predicateGwn(var(w),xn)with param- etersxn. For each typesforynsuch thatsis compatible withynand agrees withw onzn, the programΠQLINcontains the clause

Gwn(var(w),xn)←Ats(var(s),xs)∧Gsn+1zn+1(var(szn+1),xn+1).

For every typewforzM and every typesforyM such thatsis compatible withyM and agrees withwonzM, we include the clause

GwM(var(w),xM)←Ats(var(s),xs).

Finally, we useGε0with parametersxas the goal predicate (note thatz0 =∅, and so the domain of any type forz0is empty).

(10)

Lemma 4. For anyDcomplete forO, any predicateGwn, anyb∈ind(D)|var(w)|+|x≥n|, we haveΠQLIN,D |=Gwn(b)iff there is a homomorphismh:qn→CO,Dsuch that

h(z) =

(b(z), for allz∈xnand allz∈zn withw(z) =ε, w(z)[a/b(a)],e for allz∈zn withw(z)∈termO(P(a)).

It should be clear thatΠQLIN is a linear NDL program of width≤ `(t+ 1)and containing≤ |q| · |termO|`(t+1)predicates. Moreover, it takes only logarithmic space to store a type w, which allows us to show that ΠQLIN can be computed by an LNL- transducer. We apply Lemma 1 to obtain an NDL-rewriting for arbitrary data instances, and then use Theorem 1 to conclude that the resulting program can be evaluated in NL.

7 L

OG

CFL Rewritings for OMQ(∞, 1, `)

Unlike the previous two classes, answering OMQs from the classOMQ(∞,1,`)can be harder—LOGCFL-complete—than evaluating their CQs, which can be done in NL.

Theorem 5. For any fixed`≥2, the classOMQ(∞,1,`)is skinny-reducible.

For OMQs with ontologies of unbounded depth and acyclic CQs whose join trees have a bounded number of leaves, our rewriting uses the notion of Skolem witness that generalises tree witnesses [14].

LetQ(x) = (O,q(x))be an OMQ, lets = (s1r, . . . ,snr,si)be a tuple of disjoint sets of variables inq(x)such thatsi6=∅andsi∩x=∅, and letsr=s1r ∪ · · · ∪snr,

qs =

S(z)∈q|z⊆sr∪siandz 6⊆sr .

Ifqsis a minimal subset ofqcontaining every atom ofqwith at least one variable from siand such that there is a homomorphismh:qs →CO,{P(a)}witha = (a1, . . . , an) andh−1(aj) =sjr for1≤j ≤n, then we callsaSkolem witness forQ(x)generated byP(a). Intuitively,sidentifies a minimal subset ofqthat can be mapped to theSkolem partof the canonical modelCO,{P(a)}consisting of Skolem terms: the variables insr

are mapped to constants fromaand the variables insito Skolem terms intermO(P(a)).

The logarithmic-depth NDL-rewriting forOMQ(∞,1,`)is based on the following:

Lemma 5. Every treeTof sizenhas a node splitting it into subtrees of size≤ dn/2e.

LetQ(x0) = (O,q0(x0))be an OMQ with an acyclic CQ having a join treeT0. We repeatedly apply Lemma 5 to decompose the CQ into smaller and smaller subqueries.

Formally, for an acyclic CQq, we denote byγq a vertex in the join tree T forqthat satisfies the condition of Lemma 5. LetQbe the smallest set containingq0(x0)and the following CQs, for everyq(x)∈Qwith at least one existentially quantified variable:

(1) the CQsqi(xi)corresponding to the connected componentsTiwith rootγqiadja- cent toγqof the result of removingγqfromT, wherexiconsists of the restriction ofxto the variables inqitogether with the common variables ofγqiandγq;

(11)

(2) for each Skolem witness s for (O,q(x))with sr 6= ∅ and γq ∈ qs, the CQs qs1(xs1), . . . ,qsk(xsk) that correspond to the connected components Tis of the re- sults of removingqsfromT (note thatqsis connected inT), where eachxsi is the set of variables inx∪srthat occur inqsi.

The NDL programΠQSWuses IDB predicatesGq(x), forq(x)∈Q, whose parameters are the variables inx0that occur inq(x). For eachq(x)∈Qthat has no existentially quantified variables, we include the clauseGq(x) ← q(x). For anyq(x) ∈ Qwith existential variables, we include

Gq(x) ←γq ∧^

1≤i≤nGqi(xi),

where q1(x1), . . . ,qn(xn) are the subqueries obtained by splittingq by γq in (1), and, for any Skolem witnesssof(O,q(x))withsr 6=∅ andγq ∈ qsand anyP(a) generatings, the clause

Gq(x) ←P(ea) ∧^

z∈sjr(z=eaj) ∧^

1≤i≤kGqsi(xsi),

where qs1, . . . ,qsk are the connected components of q without qs. Finally, if q0 is Boolean, then we includeGq0←P(ea)for all atomsP(a)such thatO,{P(a)} |=q0. Lemma 6. For any OMQ with an acyclic CQ, any dataDcomplete forO, any query q(x)∈Qand anyb∈ind(D)|x|, we haveΠQSW,D |=Gq(b)iff there is a homomor- phismh:q→CO,Dwithh(x) =b.

Now fix` > 1and consider Q(x) = (O,q0(x))from the classOMQ(∞,1,`) (remember that we have fixed arityn). The size of the programΠQSW is polynomially bounded in|Q|sinceq0has polynomially-many subtrees ofTq0 andO(|q0|`)Skolem witnesses (there are at mostO(|q0|`· |Σ| ·nn)pairs of a Skolem witnesssand its gen- erating atomP(a)). It is readily seen that the functionν defined byν(Gq) =|q|, for eachq∈Q, is a weight function for(ΠQSW, Gq

0(x))withν(Gq

0)≤ |Q|. Moreover, by Lemma 5,d(ΠQSW, Gq

0)≤logν(Gq

0)+1; also,w(ΠQSW, Gq

0)≤`+1. Finally, we note that, since the number of leaves is bounded, it is in NL to decide whether a vertex satis- fies the conditions of Lemma 5, and in LOGCFL to decide whetherO,{P(a)} |=q(a), for bounded-leaf acyclic CQsq(x)(see the full version1), or whether a (logspace) rep- resentation of a possible Skolem witness is indeed a Skolem witness. This allows us to show that(ΠQSW, Gq0(x))can be generated by an LLOGCFL-transducer. By Corollary 1, the obtained NDL-rewritings can be evaluated in LOGCFL.

8 Conclusion

We presented NDL rewritings for three classes of OMQs with CQs of bounded (hyper)- tree width and ontologies given as linear TGDs. These NDL rewritings can be con- structed and evaluated in LOGCFL, NL and LOGCFL, respectively (provided that the arity of predicates is bounded). Since the three upper bounds match the lower bounds inherited from the OWL 2 QL setting [4], the proposed rewritings are theoretically op- timal.

1http://www.dcs.bbk.ac.uk/˜kikot/DL17-1-full.pdf

(12)

References

1. Abiteboul, S., Hull, R., Vianu, V.: Foundations of Databases. Addison-Wesley (1995) 2. Baget, J.-F., Lecl`ere, M., Mugnier, M.-L., Salvat, E.: On rules with existential variables:

Walking the decidability line. Artificial Intelligence 175(9–10), 1620–1654 (2011)

3. Bienvenu, M., Kikot, S., Kontchakov, R., Podolskii, V.V., Ryzhikov, V., Zakharyaschev, M.:

The complexity of ontology-based data access with OWL 2 QL and bounded treewidth queries. In: Proc. of the 26th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS (2017)

4. Bienvenu, M., Kikot, S., Podolskii, V.V.: Tree-like queries in OWL 2 QL: succinctness and complexity results. In: Proc. of the 30th Annual ACM/IEEE Symposium on Logic in Com- puter Science, LICS 2015. pp. 317–328. IEEE Computer Society (2015)

5. Boguslavsky, I., Dikonov, V., Iomdin, L., Lazursky, A., Sizov, V., Timoshenko, S.: Semantic analysis and question answering: a system under development. In: Computational Linguistics and Intellectual Technologies. Papers from the Annual International Conference Dialogue.

p. 21. No. 14 (2015)

6. Cal`ı, A., Gottlob, G., Lukasiewicz, T.: A general datalog-based framework for tractable query answering over ontologies. Journal of Web Semantics 14, 57–83 (2012)

7. Cook, S.A.: Characterizations of pushdown machines in terms of time-bounded computers.

Journal of the ACM 18(1), 4–18 (1971)

8. Cuenca Grau, B., Horrocks, I., Kr¨otzsch, M., Kupke, C., Magka, D., Motik, B., Wang, Z.:

Acyclicity notions for existential rules and their application to query answering in ontologies.

Journal of Artificial Intelligence Research 47, 741–808 (2013)

9. Gottlob, G., Greco, G., Leone, N., Scarcello, F.: Hypertree decompositions: Questions and answers. In: Proc. of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS (2016)

10. Gottlob, G., Leone, N., Scarcello, F.: Hypertree decompositions and tractable queries. Jour- nal of Computer and System Sciences 64(3), 579–627 (2002)

11. Gottlob, G., Manna, M., Pieris, A.: Polynomial Rewritings for Linear Existential Rules, pp.

2992–2998. In: Proc. of the 24th Int. Joint Conf. on Artificial Intelligence, IJCAI (2015) 12. Gottlob, G., Orsi, G., Pieris, A.: Query rewriting and optimization for ontological databases.

ACM Transactions on Database Systems 39(3), 25:1–25:46 (2014)

13. Kikot, S., Kontchakov, R., Podolskii, V., Zakharyaschev, M.: On the succinctness of query rewriting over shallow ontologies. In: Proc. of the Joint Meeting of the 23rd EACSL Annual Conf. on Computer Science Logic (CSL 2014) and the 29th Annual ACM/IEEE Symposium on Logic in Computer Science (LICS 2014). pp. 57:1–57:10. ACM (2014)

14. Kikot, S., Kontchakov, R., Zakharyaschev, M.: Conjunctive query answering with OWL 2 QL. In: Proc. of the 13th Int. Conf. on Principles of Knowledge Representation and Reasoning (KR 2012). pp. 275–285. AAAI (2012)

15. K¨onig, M., Lecl`ere, M., Mugnier, M.-L., Thomazo, M.: Sound, complete and minimal UCQ- rewriting for existential rules. Semantic Web 6(5), 451–475 (2015)

16. K¨onig, M., Leclere, M., Mugnier, M.-L.: Query rewriting for existential rules with compiled preorder. In: Proc. of the 24th Int. Joint Conf. on Artificial Intelligence, IJCAI (2015) 17. Sudborough, I.H.: On the tape complexity of deterministic context-free languages. Journal

of the ACM 25(3), 405–414 (1978)

18. Thomazo, M.: Compact rewriting for existential rules. In: Proc. of the 23rd Int. Joint Conf.

on Artificial Intelligence, IJCAI (2013)

19. Venkateswaran, H.: Properties that characterize LOGCFL. Journal of Computer and System Sciences 43(2), 380–404 (1991)

Références

Documents relatifs

For NDL-rewritings, we show that polynomial-size rewritings always exist for tree-shaped queries with a bounded number of leaves (and arbitrary TBoxes), and for bounded

u t Discussion If we compare the new and existing results for OWL 2 QL with those from relational databases, we observe that adding an OWL 2 QL TBox of bounded depth does not change

The use of hornMTL and datalogMTL ontologies (with both diamond and box operators in rule bodies and only box operators in rule heads) for querying temporal log data was advocated

We prove a number of new syntactic and semantic sufficient and nec- essary conditions for ontology-mediated queries (OMQs) with one covering ax- iom to be in the classes AC 0 and NL

In particular, we construct an ontology T such that answering OMQs (T , q) with tree-shaped CQs q is W[1]- hard if the number of leaves in q is regarded as the parameter.. The number

For example, one such condition guaranteeing datalog rewritability, and so tractability of answering Q follows from [8, Theorem 27]: it suffices that q contains at most one

In fact, we report here a simplified version of the translation in [1], which is a polynomial translation of instance queries mediated by ALCHOI TBoxes with closed predicates

We show that, for OWL 2 QL ontology-mediated queries with i ontologies of bounded depth and conjunctive queries of bounded treewidth, ii ontologies of bounded depth and