• Aucun résultat trouvé

Towards a Data Complexity Classification of Ontology-Mediated Queries with Covering

N/A
N/A
Protected

Academic year: 2022

Partager "Towards a Data Complexity Classification of Ontology-Mediated Queries with Covering"

Copied!
13
0
0

Texte intégral

(1)

Towards a Data Complexity Classification of Ontology-Mediated Queries with Covering

O. Gerasimova1, S. Kikot2, and M. Zakharyaschev3

1 National Research University Higher School of Economics, Moscow, Russia

2 University of Oxford, U.K.

3 Birkbeck, University of London, U.K.

Abstract. We prove a number of new syntactic and semantic sufficient and nec- essary conditions for ontology-mediated queries (OMQs) with one covering ax- iom to be in the classes AC0 and NL for data complexity, and to be L-, NL- or P-hard. We also give two new examples of very simpleCONP-complete OMQs.

1 Introduction

The problem we are concerned with in this paper originated from the observation that the NPD FactPages ontology4, used for testing ontology-based data access (OBDA) in industry [13, 12], contained covering axioms of the formA v B1t · · · tBn. It has been known since Schaerf’s paper [18] that answering ontology-mediated queries (OMQs) with a covering axiom can be CONP-hard for data complexity, and so such OMQs are not suitable for OBDA systems in general. In fact, two interesting OMQs with covering can be extracted from [18]:Q1 = (Cov>,q1)andQ2 = (Cov>,q2), whereCov> ={> v FtT}and the Boolean conjunctive queries (CQs)q1andq2 are shown below.

q1 F T

S R

T T

F F

P1 P2

N1 R N2

q2 Schaerf pointed out that answering these queries involves case analysis and showed that Q2isCONP-hard for data complexity. It is readily seen thatQ1is NL-complete. The problem we started attacking in [9] was to find a complete syntactic or semantic clas- sification of OMQs with covering according to their data complexity. The direct attack failed, and connections with other hard classification problems such as boundedness or linearisability of datalog programs (to be discussed in Section 3) indicate that a longer siege is needed. In this paper, we report on our recent findings.

We consider four ontologies with covering:Cov>as above,CovA={AvFtT}, Cov> ={> vF tT, F uT v ⊥},CovA ={AvFtT, F uT v ⊥}, and we classify connected Boolean CQsqby the number of solitary occurrences ofFandTin them, where asolitaryFis anyF(x)∈qwithT(x)∈/ q, and symmetrically forT. We prove a number of new syntactic and semantic sufficient and necessary conditions for

4http://sws.ifi.uio.no/project/npd-v2/

(2)

such OMQs with at most one solitaryF to be in the complexity classes AC0and NL, and to be L-, NL- or P-hard. We also give two new examples of very simple CONP- complete OMQs.

2 Preliminaries

In this paper, a Boolean conjunctive query (CQ) is any first-order (FO) sentence of the formq =∃xϕ(x), whereϕis a conjunction of unary or binary atomsP(y)with y⊆x. We often regard CQs assetsof their atoms, depict them as labelled digraphs, and assume that all of our CQs areconnected. Byansweringanontology-mediated query (OMQ) Q = (T,q) with a TBox T of the form defined above, we understand the problem of checking, given an ABoxA(a finite set of unary or binary ground atoms), whetherqholds in every model ofT ∪ A, orT,A |=qin symbols. For everyQ, this problem is clearly inCONP. It is in the complexity class AC0if there is an FO-sentence q0, called anFO-rewritingofQ, such thatT,A |=qiffA |=q0, for any ABoxA.

Adatalog program,Π, is a finite set ofrules∀x(γ0←γ1∧ · · · ∧γm), where each γiis an atomQ(y)withy⊆x. (As usual, we omit∀x.) The atomγ0is theheadof the rule, andγ1, . . . , γmitsbody. All the variables in the head must occur in the body. The predicates in the head of rules areIDB predicates, the restEDB predicates[1].

Adatalog query is a pair(Π, G), whereΠ is a datalog program andGan 0-ary atom, thegoal. Theanswerto(Π, G)over an ABoxAis ‘yes’ ifGholds in the FO- structure obtained by closingAunderΠ, in which case we writeΠ,A |=G. A datalog query(Π, G) is adatalog rewritingof an OMQQ = (T,q) in caseT,A |= q iff Π,A |=G, for any ABoxA. Theanswering problemfor(Π, G)—i.e., checking, given an ABoxA, whetherΠ,A |= G—is clearly in P. Answering a datalog query with a linearprogram, whose rules have at most one IDB predicate in the body, can be done in NL. The NL upper bound also holds for datalog queries with a linear-stratified program defined as follows. Astratifiedprogram [1] is a sequenceΠ = (Π0, . . . , Πn)of datalog programs, called thestrataofΠ, such that each predicate inΠ can occur in the head of a rule only in one stratumΠiand can occur in the body of a rule only in strataΠj

withj ≥i. If, additionally, the body of each rule inΠcontains at most one occurrence of a head predicate from the same stratum,Π is calledlinear-stratified. Every linear- stratified program can be converted to an equivalent linear datalog program [2].

3 Connections in High Places

We begin by putting our little problem into the context of more general investigations of(i)boundedness and linearisability of datalog programs and(ii)the data complexity of answering OMQs with expressive ontologies.

The decision problem whether a given datalog program is bounded has been a hot research topic in database theory since the late 1980s. Thus, it was shown that bounded- ness is undecidable already for linear datalog programs with binary IDB predicates [20]

and single rule programs (akasirups) [15]. On the other hand, deciding boundedness is 2EXPTIME-complete formonadicdatalog programs [7, 4] and PSPACE-complete for linear monadic programs [7]; for linear sirups, it is even NP-complete [20].

(3)

The last two results are relevant for deciding FO-rewritability of OMQs(CovA,q), whereq has a single solitaryF and is called a1-CQ. Indeed, suppose thatF(x)and T(y1), . . . , T(yn)are all the solitary occurrences ofFandTinq. LetΠqbe a monadic datalog program with three rules

G←F(x),q0, P(y1), . . . , P(yn), (1)

P(x)←T(x), (2)

P(x)←A(x),q0, P(y1), . . . , P(yn), (3) whereq0=q\ {F(x), T(y1), . . . , T(yn)}andPis a fresh predicate symbol that never occurs in our ABoxes. Then, for any ABoxA, we haveCovA,A |=qiffΠq,A |=G.

Thus, FO-rewritability of(CovA,q)is clearly related to boundedness of the sirup (3).

The problem of linearising datalog programs, that is, transforming them into equiv- alent linear datalog programs, which are known to be in NL for data complexity, has also attracted much attention [16, 17, 21, 2] after the Ullman and van Gelder pioneering paper [19].

The success of the OBDA paradigm spurred considerable interest in the data com- plexity of answering individual OMQs with expressive ontologies. Thus, by establish- ing a remarkable connection to CSPs, it was shown in [5] that deciding FO-rewritability and datalog rewritability of OMQs with a SHIU ontology and a (Boolean) atomic query is NEXPTIME-complete. This result is obviously applicable to (CovA,q)with atree-shapedq. In [8], it was shown that checking rewritability of Boolean monadic disjunctivedatalog programs into FO and monadic datalog can be done in, respectively, 2NEXPTIMEand 3EXPTIME, which is applicable to all of our OMQs(CovA,q).

An AC0/NL/P trichotomy for the data complexity of answering OMQs with anEL ontology and atomic query, which can be checked in EXPTIME, was established in [14].

This result is applicable to OMQs(CovA,q), in whichqis anF-treehaving a single solitaryF(x)such that the binary atoms inqform a ditree with rootx. Indeed, denote byTQtheELTBox with concept inclusionsFuCq vG0,T vP andAuCq vP, whereCqis anEL-concept representingq\ {F(x)}withPforT(so forqof the form

F x

F T

y1 y2

T y3

R1 R2 R3

Cq =∃R1.(FuPu ∃R2.∃R3.P)). Then, for any ABoxAthat does not containG0, we haveΠQ,A |=GiffTQ,A |=∃x G0(x).

Yet, despite all of these efforts and results (implying, in view of the recent positive solution to the Feder-Vardi conjecture [6, 22], that there is a P/CONP dichotomy for OMQs with a SHIU ontology and a (Boolean) atomic query, which is decidable in NEXPTIME), we are still lacking simple and transparent, in particular syntactic, con- ditions guaranteeing this or that data complexity or type of rewritability. Some results in this direction were obtained in [10, 11]. That a transparent classification of monadic sirups according to their data complexity has not been found so far and the close con- nection to CSPs indicate that this problem is extremely hard in general. Our aim below is to demonstrate that, for the OMQs with covering considered in this paper, syntactic conditions do exist but are difficult to find.

(4)

4 0-CQs

As observed in [9], if an OMQQ(of the form defined above) is in AC0, thenqis an FO-rewriting ofQ. By a0-CQwe mean any CQ that does not contain a solitaryF. A twinin a CQqis any pairF(x), T(x)∈q. Here is an encouraging syntactic criterion of FO-rewritability for OMQs of the form(CovA,q):

Theorem 1. (i) Ifq is a0-CQ, then answering both(CovA,q)and (CovA,q)is in AC0, withqbeing an FO-rewriting of these OMQs.

(ii)Ifqis not a0-CQ and does not contain twins, then answering both(Cov>,q) and(Cov>,q)isL-hard.

Proof. (i)follows from [9, Theorem 1].(ii)The proof is by reduction to the reachabil- ity problem for undirected graphs, which is known to be L-complete; see, e.g., [3].

Denote by q0 the CQ obtained from q by gluing together all the variables x with F(x) ∈ qand also all the variablesy withT(y) ∈ q. Thus,q0 contains a singleF- atom,F(x), and a singleT-atom,T(y). Clearly, there is a homomorphismh:q→q0. Letq00=q0\ {F(x), T(y)}.

Suppose we are given an undirected graphG= (V, E)and two verticess, t∈V. We regardGas a directed graph such that(u, v)∈Eiff(v, u)∈E, for anyu, v ∈V. Now, we encodeGby means of an ABoxAGthat is obtained fromGas follows. For every edgee= (u, v)∈E, letq00e be the set of atoms inq00withxrenamed tou,yto vand all other variablesztoze. ThenAGcomprises all suchq00e, fore∈E, as well as F(s)andT(t). We show thats→GtiffCov>,AG |=q.

Suppose thats→G t, i.e., there exists a paths=v0, . . . , vn = tinGwithei = (vi, vi+1) ∈ E, fori < n. Consider an arbitrary modelI of Cov> andAG. Since I |=Cov>, andF(s)andT(t)are inAG, we can find somei < nsuch thatI |=F(vi) andI |= T(vi+1). Asq00e

i is an isomorphic copy ofq00, we obtain I |= q0, and so I |=q. Conversely, supposes6→G t. Define an interpretationI, extending the ABox AG, by settingFI to be the set of objects inAGthat are reachable fromsandTI its complement. Clearly,I is a model ofCov>. By the construction, the elements of the connected component ofIcontainingscannot be instances ofT, while the remaining elements ofIare not be instances ofF. Sinceqis connected, it follows thatI 6|=q.

Corollary 1. An OMQ(CovA,q)is inAC0iffqis a0-CQ, which can be decided in linear time.

If twins can occur in CQs (that is,F andT are not necessarily disjoint), the pic- ture becomes more complex. On one hand, we have the following criterion for OMQs (CovA,q)with apath CQqwhose variablesx0, . . . , xn inqare ordered so that the binary atoms inqform a chainR1(x0, x1), . . . , Rn(xn−1, xn).

Theorem 2 ([9]).An OMQ(CovA,q)with a path CQqis inAC0iffqis a0-CQ. Ifq contains both solitaryF andT, then(CovA,q)isNL-hard.

On the other hand, this AC0/NL criterion collapses for path CQs with loops:

Proposition 1. The OMQ(CovA,q), whereqis shown below, is inAC0.

(5)

F T R

T F F T

R R S S S

Proof. Follows from the sufficient condition of Theorem 4(i).

Note that the CQqabove isminimal(not equivalent to any of its proper sub-CQs).

Note also that, if a minimal 1-CQqcontains both a solitaryF and a solitaryT, then FO-rewritability of(CovA,q)implies thatqcontains at least one twin (F T) and at least oneywithT(y)∈/ qandF(y)∈/q(which can be shown using Theorem 4(i)).

5 1-CQs

1-CQs have exactly one solitaryF. As observed in Section 3, we have the following:

Theorem 3. (i)Answering any OMQ(CovA,q)with a1-CQqcan be done inP.

(ii)Answering any OMQ (CovA,q)with an F-tree q is either in AC0 or NL- complete orP-complete. The trichotomy can be decided inEXPTIME.

Theorem 3(ii)was proved by a reduction to the AC0/NL/P-trichotomy of [14].

It is to be noted, however, that applying the algorithm from [14] in our case is tricky because the input ontologyTQmust first be converted to a normal form. As a result, we do not obtain transparent syntactic criteria on the shape ofqthat would guarantee that the OMQ(CovA,q)belongs to the desired complexity class (see examples below).

We now give a semantic sufficient condition for an OMQ with a 1-CQ to lie in NL.

This condition uses ideas and constructions from [7, 14]. Let Q = (CovA,q)be an OMQ with a 1-CQqhaving a solitaryF(x). Define by induction a classKQof ABoxes that will be calledcactuses forQ. We start by settingKQ := {q}, regardingq as an ABox, and then recursively apply toKQthe following two rules:

(bud) ifT(y)∈ A ∈KQwith solitaryT(y), then we add toKQthe ABox obtained by replacingT(y)inAwith(q\F(x))∪ {A(x)}, in whichxis renamed toyand all of the other variables are givenfreshnames;

(prune) ifCovA,A0 |=Q, whereA0=A \ {T(y)}andT(y)is solitary, we add toKQ

the ABox obtained by removingT(y)fromA ∈KQ.

It is readily seen that, for any ABoxA0, we haveCovA,A0 |=Qiff there existA ∈KQ and a homomorphismh:A → A0. Denote byKQ the set of minimal cactuses inKQ (that have no proper sub-cactuses inKQ).

For a cactusC ∈KQ, we refer to the copies of (maximal subsets of)qthat comprise C assegments. TheskeletonCsof C is the ditree whose nodes are the segmentssof C and edges(s,s0)mean thats0 was attached tosby budding. The atoms T(y) ∈ s are called thebudsofs. Therankr(s)ofsis defined by induction: ifsis a leaf, then r(s) = 0; for non-leafs, we compute the maximal rankmof its children and then set

r(s) =

(m+ 1, ifshas≥2children of rankm;

m, otherwise.

(6)

ThewidthofCandCsis the rank of the root inCs. We say thatKQ is ofwidthkif it contains a cactus of widthkbut no cactus of greater width. ThedepthofCandCsis the number of edges in the longest branch inCs.

We illustrate the definition by an example. Denote byqT nT, forn≥0, the 1-CQ shown below, where all the binary predicates areRand thenvariables without labels do not occur inF- orT-atoms:

F T T

n . . .

Example 1. LetQ= (Cov>,qT1T). In the picture below, we show a cactusCobtained by applying(bud)twice toqT1T (withA=>omitted):

F T

z

T

T T

One can check thatCov>,C \ {T(z)} |=qT1T, and so an application of(prune)will removeT(z)fromC. Using this observation, one can show thatKQ is of width 1. On the other hand, ifQ= (CovA,qT1T)thenKQis of unbounded width as follows from Theorem 6 below.

Theorem 4. LetQ= (CovA,q)be an OMQ with a 1-CQq. Then

(i)Qis inAC0iff for everyC ∈KQ, there is a homomorphismh:q→ C;

(ii)Qis rewritable in linear datalog, and so is inNL, ifKQis of bounded width.

Proof. The proof of(i)is straightforward. To prove(ii), we represent the ABoxes in KQas words in a tree alphabet and construct a non-deterministic finite tree automaton AQ such thatKQ ⊆ L(AQ) ⊆ KQ. Then we show that, forKQ of finite width, the automatonAQcan be transformed into a monadic linear-stratified datalog rewriting of Q. It can further be converted into a linear datalog rewriting at the expense of increasing the arity if IDBs in the program [2].

It is worth noting that, forQ= (Cov>,q)withqfrom Proposition 1,KQconsists of qand the cactus of depth 1, in which the only solitaryTis removed by(prune). Clearly, there is a homomorphism fromqinto this cactus, and soQis FO-rewritable. However, for the 1-CQqin the picture below (where all edges are bidirectional),(Cov>,q)is not FO-rewritable, but there is a homomorphism fromq to both cactuses of depth 1. We do not know whether, in general, there is an upper boundNqsuch that the existence of homomorphismsh:q→ C, for allC ∈KQof depthNq, would ensure FO-rewritability of(CovA,q). For 1-CQsqwith a single solitaryT, one can takeNq =|q|+ 1. Neither do we know the exact complexity of deciding FO-rewritability of OMQs with 1-CQs.

(7)

As mentioned in Section 3, this problem is reducible to the boundedness problem for monadic datalog programs, which is known to be in 2EXPTIME.

T

F T F T

T

F T F T

F

S S S R Q R

Q

S R, Q R, Q

S S R Q R Q

R

S R, Q R, Q

S

Theorem 4(ii)allows us to obtain a sufficient condition for linear-datalog rewritabil- ity of OMQs(CovA,q)with anF-path CQq, that is, a path CQ with a single solitary F at its root. We represent such aqas shown in the picture below, which indicatesall the solitary occurrences ofFandT:

q =

F x

T y1

T yi

T

ym ym+1

. . . . . . . . . . . .

We require the following sub-CQs ofq:

– qiis the suffix ofqthat starts atyi, but withoutT(yi), for1≤i≤m;

– qi is the prefix ofqthat ends atyi, but withoutF(x)andT(yi), for1≤i≤m;

– qm+1isqwithoutF(x),

and writefi:qiqiffiis a homomorphism fromqiintoqwithfi(yi) =x.

Theorem 5. If for each1≤i≤mthere existfi: qiq, then(CovA,q)is rewritable into a linear datalog program, and so isNL-complete.

ForF-path CQsqwithout twins, we extend Theorem 5 to a NL/P dichotomy (pro- vided that NL 6= P). Given such a CQq, we denote byNq the set of the numbers indicating the length of the path fromxto each of theyi,i= 1, . . . , m+ 1.

Theorem 6. LetQ= (CovA,q)be an OMQ whereqis anF-path CQ without twins having a single binary relation. The following are equivalent unlessNL=P:

(i) QisNL-complete;

(ii) {0} ∪Nqis an arithmetic progression;

(iii) there existfi:qiqfor everyi= 1, . . . , m.

If these conditions do not hold, thenQisP-complete.

Proof. It is readily seen, using Theorem 5, that (ii) ⇒ (iii) ⇒ (i). We prove the implication(i)⇒(ii). Suppose{0} ∪Nq is not an arithmetic progression. We show thatQis P-hard by reduction of the monotone circuit evaluation problem. LetXbe the sub-CQ ofqbetweenF(x)and the firstT but without theF andT atoms, and let|X| be the number of atoms inX. LetY be the first sub-CQ between neighbouringT-atoms without theseT-atoms such that|Y| 6=|X|,nthe number ofX-sub-CQs beforeY, and letZ be the rest of the query without the first T atom; see the picture below where

(8)

n= 2.

F x0

T x1

T x2

T x3 x4

X X Y Z

We distinguish between two cases:|Y|>|X|and|Y|<|X|. Depending on the case, we use the following gadgets for∧-gates, which are shown below forn= 2(it should be clear how to modify the construction forn >2); the∨-gate gadget is the same for both cases.

∧-gate for|Y|>|X| c A

A a0

T

T

a A T A b0

T

T

b A

X

X

X

YZ X

X

Z Y

X Y

Z ∧-gate for|X|>|Y|

A c

T

A b

A a

X

X Y Z

∨-gate c

T

T

a

A

T

T

b

A

X

X

Y Z

X

X

Y Z

Given any monotone Boolean circuitC and its inputα, we construct an ABoxAC,α

by replacing∧- and∨-gates with the inputsaandband outputcby the gadgets above, placingTon the input gates evaluated to 1 underαandFon the output gate. We claim thatCovA,AC,α|=qiffC(α) = 1.

(⇐) It is not hard to show by induction that whenever a non-input gate g inC outputs 1 underα, then the sub-ABox ofAC,α generated bygand withF placed on g validatesQ. For example, supposeT(a),T(b)andF(c)hold in the left-hand side gadget. Then eitherF(b0)orT(b0). In the former case,qis satisfied; so assumeT(b0).

We have eitherF(a0)orT(a0). Now, in either case,qis satisfied, as required.

(⇒) Suppose C(α) = 0. We construct a model I of CovA based onAC,α by placingT orF togandg0 (if available) depending on the value ofginCunderα. It can be checked thatI 6|=q. In particular, if|Y|>|X|,cis the conjunction ofbanda, c=b= 0anda= 1, then ifx0goes toc,x1goes toa0, thenx2must go to the point below a0, and sox3 must go somewhere between aand the point abovea, which is impossible. Ifa= 0, b= 1andx0goes toa0andx1goes to the centralT, thenx2must go somewhere between the centralT andb0, which is also impossible. If|Y| <|X|, c = b∧a,b =d∨e,a =c = 0,b = 1, then ifx0goes toc,x2goes tob, thenx3 cannot go toaasa= 0, and so it must go into someX-segment of the gateb=d∨e going out ofb, which is again impossible.

Note that the proof of P-hardness in Theorem 6 does not go through forA = >.

Thus, for(Cov>,qT1T), we are in the framework of Example 1 and, by Theorem 4(ii), this OMQ is in NL. In fact, we have the following NL/P dichotomy for the OMQs of the form(Cov>,qT nT).

Theorem 7. (i)The OMQ(Cov>,qT1T)isNL-complete(whereas(CovA,qT1T)is P-complete).

(9)

(ii)The OMQs(Cov>,qT nT) (and(CovA,qT nT)), forn≥2, areP-complete.

Proof. The proof is similar to that of P-hardness in Theorem 6.

We now apply Theorem 4(ii)to the class ofT F-path CQsof the form

qT F =

T y0

F x

T y1

T

ym ym+1

. . . . . . . . . . . .

where the T(yi)and F(x)are all the solitary occurrences of T andF inqT F. We represent this CQ as

qT F ={T(y0)} ∪q0∪q,

whereq0 is the sub-CQ ofqT F betweeny0andxwithT(y0)removed andq is the same as in Theorem 5 (andqm+1isqwithoutF(x)).

Theorem 8. If q satisfies the condition of Theorem 5 and there is a homomorphism h: qm+1→q0such thath(x) =y0, then answering(CovA,qT F)isNL-complete.

For example, the OMQ(CovA,q)withqshown below is NL-complete:

T F T F T

On the other hand, as shown in [9],(CovA,q)withqof the form

T F T

is P-complete; in fact, it follows from the proof that(Cov>,q)is P-complete, too.

6 2-CQs

A2-CQhas at least two solitaryF and at least two solitaryT. For example, as shown in [9], answering(CovA,q)with the following CQqisCONP-complete:

T T F F

The proof can be generalised to the class of 2-2-CQs, which are path 2-CQs where all the F are located after all theT, and every occurrence ofT or F is solitary. We represent any given 2-2-CQqas shown below

T x

T y

F z

F

p r s u w v

wherep,r,uandvdo not containFandT, whilesmay contain solitary occurrences of bothTandF(in other words, theTshown in the picture are the first two occurrences ofT inqand theFare the last two occurrences ofFinq). Denote byqr the suffix of qthat starts fromxbut withoutT(x); similarly,quis the suffix ofqstarting fromzbut withoutF(z). Denote byqr the prefix ofqthat ends atybut withoutT(y); similarly, qu is the prefix ofqending atwbut withoutF(w). Using the construction from [9], one can show the following:

(10)

Theorem 9. Any OMQ (CovA,q) with a 2-2-CQq is CONP-complete provided the following conditions are satisfied:(i)there is no homomorphismh1:qu → qr with h1(z) =x, and(ii)there is no homomorphismh2: qr →qu withh2(y) =w.

Unfortunately, the proof does not generalise to the other two path 2-CQs with four variables, whose complexity has so far been open:

q1 T F T F

q2 T F F T

Theorem 10. The OMQsQ1= (CovA,q1)andQ2= (CovA,q2)areCONP-complete.

Proof. We proveCONP-hardness ofQ1by reduction of the NP-complete 3SAT. Given a 3CNFψ, we construct an ABoxAψ as follows. First, for every variablepoccurring inψ, we take the followingp-gadget, wherenis the number of clauses inψ:

A A A

a1 A

. . . A

an A

F

T

A A

cn A

. . . A

c1 A A F T

T F

T F

T F

T F

T F

F

T F

T F

T F

T F

T

A A

b1 A . . .

A

bn

A A

T

F

A A A

dn A . . . A

d1 A T F T F

T F

T F

T F

T F

F

T F

T F

T F

T F

T F

T

p ¬p

The key property of thep-gadget is that, for any modelIofCovAbased on this gadget, ifI 6|= q1 then either theA-points on left-hand side of the gadget are all inTI and theA-points on the right-hand side are all inFI, or the other way round. We refer to theai andbi as p-and¬p-contacts, and to theci anddi asp-and¬p-contacts, respectively.

Now, for every clausec= (l1∨l2∨l3)inψ, we add to the constructed gadgets for the variables inψthe atomsR(uc¬l

1, vcl

2),R(vlc

2, c),T(c),R(c, wlc

3), wherecis a new individual,uc¬l

1 a fresh¬l1-contact,vlc

2 a freshl2-contact, andwlc

3 a freshl3-contact.

For example, for the clausec= (p∨q∨ ¬r), we obtain the fragment below:

A bi

A cj

T c

A dk

A

¬p

A A

q

A A

¬r

A

The resulting ABoxAψis such thatψis satisfiable iffCovA,Aψ6|=q1.

For the OMQQ2, we use (simplified)p-gadgets presented in Theorem 11 and con- nect them similarly to the picture above withT replaced byFatc.

We now sketch two generalisation of Theorem 10.

(11)

In Theorem 11 and 12, we assume thatp andv do not contain F andT, while randumay only contain solitary occurrences ofT (F 6∈ r,u), andsonly solitary occurrences ofF (T 6∈s).

Theorem 11. Any OMQ(CovA,q)withqof the form

T x

F y

F z

T

p r s u w v

isCONP-complete provided the following conditions are satisfied:(i)there is no ho- momorphism h1: rt → u with h1(y) = w, and (ii) there is no homomorphism h2: ut → rwithh2(z) = x, whereris the sub-CQ ofq betweenxandy without T(x),F(y), and similarly foru,rtisrwithT(x)andutisuwithT(w).

The proof uses the followingp-gadget:

A A A

a1 A . . .

A

an A

F

T A A A

cn A . . . A

c1 A F

T F

T

F T

F T

F T

F T

F

T F

T F

T F

T F

T

r r r r r

u

s

u u u u u u s p

r

p

p

p r

v p v

p p p p

s u

s u

s u

s u

s u

v v v v v

v v v v

r s

r s

r s

r s

r s

p p p p p

A A

b1 A . . .

A

bn

A A

F T

A A

cn A . . . A

c1 A A F

T F T

F T

F T

F T

F T

p p p p

F

T F

T F

T F

T F

T

r r r r r

s

r

u u u u u u r

v u

v

p

p u

v

v p

s u

s u

s u

s u

s u

v v v v v

v v v v

r s

r s

r s

r s

r s

p p p p p

F F

r s

u

s

p ¬p

If one of the two conditions of this theorem is not satisfied then the given gadget will not work; see the shaded parts. For example, the complexity of the OMQ with the CQ below is still unknown:

T x

F y

F F

z

T T T

w

In Theorem 12, we userext = r(x, y)∧T(y)∧s1(y, y1)∧F(y1), wheres1is the part ofssuch thats(y, z) =s1(y, y1)∧F(y1)∧s2(y1, z)ands1(y, y1)does not contain any occurrences ofF. In other words, the variabley1 corresponds to the first appearance ofFins, wheresis the sub-CQ ofqbetweenyandzwithoutF(y),T(z).

Theorem 12. Any OMQ(CovA,q)withqof the form

T x

F y

T z

F

p r s u w v

isCONP-complete provided the following conditions hold:(i)there is no homomor- phismg1:rt →uwithg1(y) =w, and(ii)there is no homomorphismg2:u→rext withg2(z) =xandg2(w) =y1.

The CQ below does not satisfy(ii), and its complexity is unknown:

T x

T F

y

F F T

z

T T F

w

It is also unknown whether there are OMQs with 2CQs in P.

(12)

Acknowledgements. The work of O. Gerasimova and M. Zakharyaschev was carried out at the National Research University Higher School of Economics and supported by the Russian Science Foundation under grant 17-11-01294.

References

1. Abiteboul, S., Hull, R., Vianu, V.: Foundations of Databases. Addison-Wesley (1995) 2. Afrati, F.N., Gergatsoulis, M., Toni, F.: Linearisability on datalog programs. Theor. Comput.

Sci. 308(1-3), 199–226 (2003), https://doi.org/10.1016/S0304-3975(02) 00730-2

3. Arora, S., Barak, B.: Computational Complexity: A Modern Approach. Cambridge Univer- sity Press, New York, NY, USA, 1st edn. (2009)

4. Benedikt, M., ten Cate, B., Colcombet, T., Vanden Boom, M.: The complexity of bound- edness for guarded logics. In: 30th Annual ACM/IEEE Symposium on Logic in Computer Science, LICS 2015, Kyoto, Japan, July 6-10, 2015. pp. 293–304. IEEE Computer Society (2015),https://doi.org/10.1109/LICS.2015.36

5. Bienvenu, M., ten Cate, B., Lutz, C., Wolter, F.: Ontology-based data access: A study through disjunctive datalog, CSP, and MMSNP. ACM Transactions on Database Systems 39(4), 33:1–44 (2014)

6. Bulatov, A.A.: A dichotomy theorem for nonuniform csps. In: Umans, C. (ed.) 58th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2017, Berkeley, CA, USA, October 15-17, 2017. pp. 319–330. IEEE Computer Society (2017),https://doi.org/

10.1109/FOCS.2017.37

7. Cosmadakis, S.S., Gaifman, H., Kanellakis, P.C., Vardi, M.Y.: Decidable optimization prob- lems for database logic programs (preliminary report). In: STOC. pp. 477–490 (1988) 8. Feier, C., Kuusisto, A., Lutz, C.: Rewritability in monadic disjunctive datalog, mmsnp, and

expressive description logics (invited talk). In: Benedikt, M., Orsi, G. (eds.) 20th Inter- national Conference on Database Theory, ICDT 2017, March 21-24, 2017, Venice, Italy.

LIPIcs, vol. 68, pp. 1:1–1:17. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik (2017), https://doi.org/10.4230/LIPIcs.ICDT.2017.1

9. Gerasimova, O., Kikot, S., Podolskii, V., Zakharyaschev, M.: On the data complexity of ontology-mediated queries with a covering axiom. In: Proceedings of the 30th International Workshop on Description Logics (2017)

10. Hernich, A., Lutz, C., Ozaki, A., Wolter, F.: Schema.org as a description logic. In: Calvanese, D., Konev, B. (eds.) Proceedings of the 28th International Workshop on Description Logics, Athens,Greece, June 7-10, 2015. CEUR Workshop Proceedings, vol. 1350. CEUR-WS.org (2015),http://ceur-ws.org/Vol-1350/paper-24.pdf

11. Kaminski, M., Nenov, Y., Grau, B.C.: Datalog rewritability of disjunctive datalog programs and non-Horn ontologies. Artif. Intell. 236, 90–118 (2016),http://dx.doi.org/10.

1016/j.artint.2016.03.006

12. Kharlamov, E., Hovland, D., Skjæveland, M.G., Bilidas, D., Jiménez-Ruiz, E., Xiao, G., Soylu, A., Lanti, D., Rezk, M., Zheleznyakov, D., Giese, M., Lie, H., Ioannidis, Y.E., Kotidis, Y., Koubarakis, M., Waaler, A.: Ontology based data access in statoil. J. Web Sem. 44, 3–36 (2017),https://doi.org/10.1016/j.websem.2017.05.005

13. Lanti, D., Rezk, M., Xiao, G., Calvanese, D.: The NPD benchmark: Reality check for OBDA systems. In: Alonso, G., Geerts, F., Popa, L., Barceló, P., Teubner, J., Ugarte, M., den Buss- che, J.V., Paredaens, J. (eds.) Proceedings of the 18th International Conference on Extending Database Technology, EDBT 2015, Brussels, Belgium, March 23-27, 2015. pp. 617–628.

OpenProceedings.org (2015),https://doi.org/10.5441/002/edbt.2015.62

(13)

14. Lutz, C., Sabellek, L.: Ontology-mediated querying with the description logic EL: tri- chotomy and linear datalog rewritability. In: Sierra, C. (ed.) Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI 2017, Melbourne, Australia, August 19-25, 2017. pp. 1181–1187. ijcai.org (2017),https://doi.org/10.24963/

ijcai.2017/164

15. Marcinkowski, J.: DATALOG sirups uniform boundedness is undecidable. In: Proceedings, 11th Annual IEEE Symposium on Logic in Computer Science, New Brunswick, New Jersey, USA, July 27-30, 1996. pp. 13–24. IEEE Computer Society (1996),https://doi.org/

10.1109/LICS.1996.561299

16. Ramakrishnan, R., Sagiv, Y., Ullman, J.D., Vardi, M.Y.: Proof-tree transformation theorems and their applications. In: Proceedings of the eighth ACM SIGACT-SIGMOD-SIGART sym- posium on Principles of database systems. pp. 172–181. ACM (1989)

17. Saraiya, Y.P.: Linearizing nonlinear recursions in polynomial time. In: Silberschatz, A. (ed.) Proceedings of the Eighth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, March 29-31, 1989, Philadelphia, Pennsylvania, USA. pp. 182–189.

ACM Press (1989),http://doi.acm.org/10.1145/73721.73740

18. Schaerf, A.: On the complexity of the instance checking problem in concept languages with existential quantification. J. of Intelligent Information Systems 2, 265–278 (1993)

19. Ullman, J.D., Gelder, A.V.: Parallel complexity of logical query programs. Algorithmica 3, 5–42 (1988),https://doi.org/10.1007/BF01762108

20. Vardi, M.Y.: Decidability and undecidability results for boundedness of linear recursive queries. In: Edmondson-Yurkanan, C., Yannakakis, M. (eds.) Proceedings of the Seventh ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, March 21-23, 1988, Austin, Texas, USA. pp. 341–351. ACM (1988),http://doi.acm.org/

10.1145/308386.308470

21. Zhang, W., Yu, C.T., Troy, D.: Necessary and sufficient conditions to linearize double re- cursive programs in logic databases. ACM Trans. Database Syst. 15(3), 459–482 (1990), http://doi.acm.org/10.1145/88636.89237

22. Zhuk, D.: A proof of CSP dichotomy conjecture. In: Umans, C. (ed.) 58th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2017, Berkeley, CA, USA, Octo- ber 15-17, 2017. pp. 331–342. IEEE Computer Society (2017),https://doi.org/10.

1109/FOCS.2017.38

Références

Documents relatifs

For example, IQ-rewritability means rewritability of an OMQ from (ALCI, UCQ) into an OMQ from (ALCI , IQ) with the IQ formu- lated in ALCI, and PUBGP-rewritability means

The use of hornMTL and datalogMTL ontologies (with both diamond and box operators in rule bodies and only box operators in rule heads) for querying temporal log data was advocated

This work addresses this problem by introducing a temporal query language that extends a well-investigated temporal query language with probability operators, and investigating

I We first focus on the simpler OMQ language based on guarded TGDs and atomic queries, and, in Section 2, we provide a characterization of FO-rewritability that forms the basis

In particular, we construct an ontology T such that answering OMQs (T , q) with tree-shaped CQs q is W[1]- hard if the number of leaves in q is regarded as the parameter.. The number

For example, one such condition guaranteeing datalog rewritability, and so tractability of answering Q follows from [8, Theorem 27]: it suffices that q contains at most one

As for ordinary databases, they employ the closed-world assumption, i.e., missing facts are treated as being false (having the probability 0), which leads to unintu- itive results

Given an ontology and a query, the tool re- ports di↵erent kinds of programming errors: wrong use of vocabulary, wrong use of resources and literals, wrong filter conditions and