Towards a Data Complexity Classification of Ontology-Mediated Queries with Covering

(1)

Towards a Data Complexity Classification of Ontology-Mediated Queries with Covering

O. Gerasimova¹, S. Kikot², and M. Zakharyaschev³

1 National Research University Higher School of Economics, Moscow, Russia

2 University of Oxford, U.K.

3 Birkbeck, University of London, U.K.

Abstract. We prove a number of new syntactic and semantic sufficient and necessary conditions for ontology-mediated queries (OMQs) with one covering axiom to be in the classes AC⁰ and NL for data complexity, and to be L-, NL- or P-hard. We also give two new examples of very simpleCONP-complete OMQs.

1 Introduction

The problem we are concerned with in this paper originated from the observation that the NPD FactPages ontology⁴, used for testing ontology-based data access (OBDA) in industry [13, 12], contained covering axioms of the formA v B1t · · · tBn. It has been known since Schaerf’s paper [18] that answering ontology-mediated queries (OMQs) with a covering axiom can be CONP-hard for data complexity, and so such OMQs are not suitable for OBDA systems in general. In fact, two interesting OMQs with covering can be extracted from [18]:Q₁ = (Cov>,q₁)andQ₂ = (Cov>,q₂), whereCov> ={> v FtT}and the Boolean conjunctive queries (CQs)q₁andq₂ are shown below.

q₁ ^F ^T

S R

T T

F F

P1 P2

N₁ R N₂

q₂ Schaerf pointed out that answering these queries involves case analysis and showed that Q₂isCONP-hard for data complexity. It is readily seen thatQ₁is NL-complete. The problem we started attacking in [9] was to find a complete syntactic or semantic classification of OMQs with covering according to their data complexity. The direct attack failed, and connections with other hard classification problems such as boundedness or linearisability of datalog programs (to be discussed in Section 3) indicate that a longer siege is needed. In this paper, we report on our recent findings.

We consider four ontologies with covering:Cov>as above,CovA={AvFtT}, Cov^⊥_> ={> vF tT, F uT v ⊥},Cov^⊥_A ={AvFtT, F uT v ⊥}, and we classify connected Boolean CQsqby the number of solitary occurrences ofFandTin them, where asolitaryFis anyF(x)∈qwithT(x)∈/ q, and symmetrically forT. We prove a number of new syntactic and semantic sufficient and necessary conditions for

4http://sws.ifi.uio.no/project/npd-v2/

(2)

such OMQs with at most one solitaryF to be in the complexity classes AC⁰and NL, and to be L-, NL- or P-hard. We also give two new examples of very simple CONP- complete OMQs.

2 Preliminaries

In this paper, a Boolean conjunctive query (CQ) is any first-order (FO) sentence of the formq =∃xϕ(x), whereϕis a conjunction of unary or binary atomsP(y)with y⊆x. We often regard CQs assetsof their atoms, depict them as labelled digraphs, and assume that all of our CQs areconnected. Byansweringanontology-mediated query (OMQ) Q = (T,q) with a TBox T of the form defined above, we understand the problem of checking, given an ABoxA(a finite set of unary or binary ground atoms), whetherqholds in every model ofT ∪ A, orT,A |=qin symbols. For everyQ, this problem is clearly inCONP. It is in the complexity class AC⁰if there is an FO-sentence q⁰, called anFO-rewritingofQ, such thatT,A |=qiffA |=q⁰, for any ABoxA.

Adatalog program,Π, is a finite set ofrules∀x(γ0←γ1∧ · · · ∧γm), where each γiis an atomQ(y)withy⊆x. (As usual, we omit∀x.) The atomγ0is theheadof the rule, andγ1, . . . , γmitsbody. All the variables in the head must occur in the body. The predicates in the head of rules areIDB predicates, the restEDB predicates[1].

Adatalog query is a pair(Π, G), whereΠ is a datalog program andGan 0-ary atom, thegoal. Theanswerto(Π, G)over an ABoxAis ‘yes’ ifGholds in the FO- structure obtained by closingAunderΠ, in which case we writeΠ,A |=G. A datalog query(Π, G) is adatalog rewritingof an OMQQ = (T,q) in caseT,A |= q iff Π,A |=G, for any ABoxA. Theanswering problemfor(Π, G)—i.e., checking, given an ABoxA, whetherΠ,A |= G—is clearly in P. Answering a datalog query with a linearprogram, whose rules have at most one IDB predicate in the body, can be done in NL. The NL upper bound also holds for datalog queries with a linear-stratified program defined as follows. Astratifiedprogram [1] is a sequenceΠ = (Π0, . . . , Πn)of datalog programs, called thestrataofΠ, such that each predicate inΠ can occur in the head of a rule only in one stratumΠiand can occur in the body of a rule only in strataΠj

withj ≥i. If, additionally, the body of each rule inΠcontains at most one occurrence of a head predicate from the same stratum,Π is calledlinear-stratified. Every linear- stratified program can be converted to an equivalent linear datalog program [2].

3 Connections in High Places

We begin by putting our little problem into the context of more general investigations of(i)boundedness and linearisability of datalog programs and(ii)the data complexity of answering OMQs with expressive ontologies.

The decision problem whether a given datalog program is bounded has been a hot research topic in database theory since the late 1980s. Thus, it was shown that boundedness is undecidable already for linear datalog programs with binary IDB predicates [20]

and single rule programs (akasirups) [15]. On the other hand, deciding boundedness is 2EXPTIME-complete formonadicdatalog programs [7, 4] and PSPACE-complete for linear monadic programs [7]; for linear sirups, it is even NP-complete [20].

(3)

The last two results are relevant for deciding FO-rewritability of OMQs(Cov_A,q), whereq has a single solitaryF and is called a1-CQ. Indeed, suppose thatF(x)and T(y₁), . . . , T(y_n)are all the solitary occurrences ofFandTinq. LetΠ_qbe a monadic datalog program with three rules

G←F(x),q⁰, P(y1), . . . , P(yn), (1)

P(x)←T(x), (2)

P(x)←A(x),q⁰, P(y1), . . . , P(yn), (3) whereq⁰=q\ {F(x), T(y1), . . . , T(yn)}andPis a fresh predicate symbol that never occurs in our ABoxes. Then, for any ABoxA, we haveCovA,A |=qiffΠq,A |=G.

Thus, FO-rewritability of(CovA,q)is clearly related to boundedness of the sirup (3).

The problem of linearising datalog programs, that is, transforming them into equivalent linear datalog programs, which are known to be in NL for data complexity, has also attracted much attention [16, 17, 21, 2] after the Ullman and van Gelder pioneering paper [19].

The success of the OBDA paradigm spurred considerable interest in the data complexity of answering individual OMQs with expressive ontologies. Thus, by establish- ing a remarkable connection to CSPs, it was shown in [5] that deciding FO-rewritability and datalog rewritability of OMQs with a SHIU ontology and a (Boolean) atomic query is NEXPTIME-complete. This result is obviously applicable to (CovA,q)with atree-shapedq. In [8], it was shown that checking rewritability of Boolean monadic disjunctivedatalog programs into FO and monadic datalog can be done in, respectively, 2NEXPTIMEand 3EXPTIME, which is applicable to all of our OMQs(CovA,q).

An AC⁰/NL/P trichotomy for the data complexity of answering OMQs with anEL ontology and atomic query, which can be checked in EXPTIME, was established in [14].

This result is applicable to OMQs(CovA,q), in whichqis anF-treehaving a single solitaryF(x)such that the binary atoms inqform a ditree with rootx. Indeed, denote byTQtheELTBox with concept inclusionsFuCq vG⁰,T vP andAuCq vP, whereCqis anEL-concept representingq\ {F(x)}withPforT(so forqof the form

F x

F T

y₁ y₂

T y₃

R1 R2 R3

C_q =∃R₁.(FuPu ∃R₂.∃R₃.P)). Then, for any ABoxAthat does not containG⁰, we haveΠ_Q,A |=GiffT_Q,A |=∃x G⁰(x).

Yet, despite all of these efforts and results (implying, in view of the recent positive solution to the Feder-Vardi conjecture [6, 22], that there is a P/CONP dichotomy for OMQs with a SHIU ontology and a (Boolean) atomic query, which is decidable in NEXPTIME), we are still lacking simple and transparent, in particular syntactic, conditions guaranteeing this or that data complexity or type of rewritability. Some results in this direction were obtained in [10, 11]. That a transparent classification of monadic sirups according to their data complexity has not been found so far and the close connection to CSPs indicate that this problem is extremely hard in general. Our aim below is to demonstrate that, for the OMQs with covering considered in this paper, syntactic conditions do exist but are difficult to find.

(4)

4 0-CQs

As observed in [9], if an OMQQ(of the form defined above) is in AC⁰, thenqis an FO-rewriting ofQ. By a0-CQwe mean any CQ that does not contain a solitaryF. A twinin a CQqis any pairF(x), T(x)∈q. Here is an encouraging syntactic criterion of FO-rewritability for OMQs of the form(Cov^⊥_A,q):

Theorem 1. (i) Ifq is a0-CQ, then answering both(Cov^⊥_A,q)and (CovA,q)is in AC⁰, withqbeing an FO-rewriting of these OMQs.

(ii)Ifqis not a0-CQ and does not contain twins, then answering both(Cov^⊥_>,q) and(Cov_>,q)isL-hard.

Proof. (i)follows from [9, Theorem 1].(ii)The proof is by reduction to the reachabil- ity problem for undirected graphs, which is known to be L-complete; see, e.g., [3].

Denote by q⁰ the CQ obtained from q by gluing together all the variables x with F(x) ∈ qand also all the variablesy withT(y) ∈ q. Thus,q⁰ contains a singleF- atom,F(x), and a singleT-atom,T(y). Clearly, there is a homomorphismh:q→q⁰. Letq⁰⁰=q⁰\ {F(x), T(y)}.

Suppose we are given an undirected graphG= (V, E)and two verticess, t∈V. We regardGas a directed graph such that(u, v)∈Eiff(v, u)∈E, for anyu, v ∈V. Now, we encodeGby means of an ABoxAGthat is obtained fromGas follows. For every edgee= (u, v)∈E, letq⁰⁰_e be the set of atoms inq⁰⁰withxrenamed tou,yto vand all other variablesztoze. ThenAGcomprises all suchq⁰⁰_e, fore∈E, as well as F(s)andT(t). We show thats→GtiffCov_>,AG |=q.

Suppose thats→G t, i.e., there exists a paths=v0, . . . , vn = tinGwithei = (vi, vi+1) ∈ E, fori < n. Consider an arbitrary modelI of Cov> andAG. Since I |=Cov>, andF(s)andT(t)are inAG, we can find somei < nsuch thatI |=F(vi) andI |= T(v_i+1). Asq⁰⁰_e

i is an isomorphic copy ofq⁰⁰, we obtain I |= q⁰, and so I |=q. Conversely, supposes6→G t. Define an interpretationI, extending the ABox AG, by settingF^I to be the set of objects inAGthat are reachable fromsandT^I its complement. Clearly,I is a model ofCov_>. By the construction, the elements of the connected component ofIcontainingscannot be instances ofT, while the remaining elements ofIare not be instances ofF. Sinceqis connected, it follows thatI 6|=q.

Corollary 1. An OMQ(Cov^⊥_A,q)is inAC⁰iffqis a0-CQ, which can be decided in linear time.

If twins can occur in CQs (that is,F andT are not necessarily disjoint), the picture becomes more complex. On one hand, we have the following criterion for OMQs (Cov_A,q)with apath CQqwhose variablesx₀, . . . , x_n inqare ordered so that the binary atoms inqform a chainR₁(x₀, x₁), . . . , R_n(x_n−1, x_n).

Theorem 2 ([9]).An OMQ(CovA,q)with a path CQqis inAC⁰iffqis a0-CQ. Ifq contains both solitaryF andT, then(CovA,q)isNL-hard.

On the other hand, this AC⁰/NL criterion collapses for path CQs with loops:

Proposition 1. The OMQ(CovA,q), whereqis shown below, is inAC⁰.

(5)

F T R

T F F T

R R S S S

Proof. Follows from the sufficient condition of Theorem 4(i).

Note that the CQqabove isminimal(not equivalent to any of its proper sub-CQs).

Note also that, if a minimal 1-CQqcontains both a solitaryF and a solitaryT, then FO-rewritability of(CovA,q)implies thatqcontains at least one twin (F T) and at least oneywithT(y)∈/ qandF(y)∈/q(which can be shown using Theorem 4(i)).

5 1-CQs

1-CQs have exactly one solitaryF. As observed in Section 3, we have the following:

Theorem 3. (i)Answering any OMQ(CovA,q)with a1-CQqcan be done inP.

(ii)Answering any OMQ (CovA,q)with an F-tree q is either in AC⁰ or NL- complete orP-complete. The trichotomy can be decided inEXPTIME.

Theorem 3(ii)was proved by a reduction to the AC⁰/NL/P-trichotomy of [14].

It is to be noted, however, that applying the algorithm from [14] in our case is tricky because the input ontologyT_Qmust first be converted to a normal form. As a result, we do not obtain transparent syntactic criteria on the shape ofqthat would guarantee that the OMQ(Cov_A,q)belongs to the desired complexity class (see examples below).

We now give a semantic sufficient condition for an OMQ with a 1-CQ to lie in NL.

This condition uses ideas and constructions from [7, 14]. Let Q = (CovA,q)be an OMQ with a 1-CQqhaving a solitaryF(x). Define by induction a classKQof ABoxes that will be calledcactuses forQ. We start by settingKQ := {q}, regardingq as an ABox, and then recursively apply toKQthe following two rules:

(bud) ifT(y)∈ A ∈K_Qwith solitaryT(y), then we add toK_Qthe ABox obtained by replacingT(y)inAwith(q\F(x))∪ {A(x)}, in whichxis renamed toyand all of the other variables are givenfreshnames;

(prune) ifCovA,A⁰ |=Q, whereA⁰=A \ {T(y)}andT(y)is solitary, we add toKQ

the ABox obtained by removingT(y)fromA ∈KQ.

It is readily seen that, for any ABoxA⁰, we haveCov_A,A⁰ |=Qiff there existA ∈K_Q and a homomorphismh:A → A⁰. Denote byK^†_Q the set of minimal cactuses inK_Q (that have no proper sub-cactuses inK_Q).

For a cactusC ∈K_Q, we refer to the copies of (maximal subsets of)qthat comprise C assegments. TheskeletonC^sof C is the ditree whose nodes are the segmentssof C and edges(s,s⁰)mean thats⁰ was attached tosby budding. The atoms T(y) ∈ s are called thebudsofs. Therankr(s)ofsis defined by induction: ifsis a leaf, then r(s) = 0; for non-leafs, we compute the maximal rankmof its children and then set

r(s) =

(m+ 1, ifshas≥2children of rankm;

m, otherwise.

(6)

ThewidthofCandC^sis the rank of the root inC^s. We say thatK^†_Q is ofwidthkif it contains a cactus of widthkbut no cactus of greater width. ThedepthofCandC^sis the number of edges in the longest branch inC^s.

We illustrate the definition by an example. Denote byq_{T nT}, forn≥0, the 1-CQ shown below, where all the binary predicates areRand thenvariables without labels do not occur inF- orT-atoms:

F T T

n . . .

Example 1. LetQ= (Cov>,q_T_1T). In the picture below, we show a cactusCobtained by applying(bud)twice toq_T_1T (withA=>omitted):

F T

z

T

T T

One can check thatCov_>,C \ {T(z)} |=q_T1T, and so an application of(prune)will removeT(z)fromC. Using this observation, one can show thatK^†_Q is of width 1. On the other hand, ifQ= (CovA,q_T_1T)thenK^†_Qis of unbounded width as follows from Theorem 6 below.

Theorem 4. LetQ= (CovA,q)be an OMQ with a 1-CQq. Then

(i)Qis inAC⁰iff for everyC ∈K^†_Q, there is a homomorphismh:q→ C;

(ii)Qis rewritable in linear datalog, and so is inNL, ifK^†_Qis of bounded width.

Proof. The proof of(i)is straightforward. To prove(ii), we represent the ABoxes in KQas words in a tree alphabet and construct a non-deterministic finite tree automaton A_Q such thatK^†_Q ⊆ L(A_Q) ⊆ K_Q. Then we show that, forK^†_Q of finite width, the automatonA_Qcan be transformed into a monadic linear-stratified datalog rewriting of Q. It can further be converted into a linear datalog rewriting at the expense of increasing the arity if IDBs in the program [2].

It is worth noting that, forQ= (Cov>,q)withqfrom Proposition 1,K^†_Qconsists of qand the cactus of depth 1, in which the only solitaryTis removed by(prune). Clearly, there is a homomorphism fromqinto this cactus, and soQis FO-rewritable. However, for the 1-CQqin the picture below (where all edges are bidirectional),(Cov>,q)is not FO-rewritable, but there is a homomorphism fromq to both cactuses of depth 1. We do not know whether, in general, there is an upper boundN_qsuch that the existence of homomorphismsh:q→ C, for allC ∈K_Qof depthN_q, would ensure FO-rewritability of(CovA,q). For 1-CQsqwith a single solitaryT, one can takeNq =|q|+ 1. Neither do we know the exact complexity of deciding FO-rewritability of OMQs with 1-CQs.

(7)

As mentioned in Section 3, this problem is reducible to the boundedness problem for monadic datalog programs, which is known to be in 2EXPTIME.

T

F T F T

T

F T F T

F

S S S R Q R

Q

S R, Q R, Q

S S R Q R Q

R

S R, Q R, Q

S

Theorem 4(ii)allows us to obtain a sufficient condition for linear-datalog rewritability of OMQs(Cov_A,q)with anF-path CQq, that is, a path CQ with a single solitary F at its root. We represent such aqas shown in the picture below, which indicatesall the solitary occurrences ofFandT:

q =

F x

T y1

T yi

T

ym ym+1

. . . . . . . . . . . .

We require the following sub-CQs ofq:

– q_iis the suffix ofqthat starts atyi, but withoutT(yi), for1≤i≤m;

– q^∗_i is the prefix ofqthat ends atyi, but withoutF(x)andT(yi), for1≤i≤m;

– q^∗_m+1isqwithoutF(x),

and writefi:q_iqiffiis a homomorphism fromq_iintoqwithfi(yi) =x.

Theorem 5. If for each1≤i≤mthere existf_i: q_iq, then(Cov_A,q)is rewritable into a linear datalog program, and so isNL-complete.

ForF-path CQsqwithout twins, we extend Theorem 5 to a NL/P dichotomy (provided that NL 6= P). Given such a CQq, we denote byNq the set of the numbers indicating the length of the path fromxto each of theyi,i= 1, . . . , m+ 1.

Theorem 6. LetQ= (CovA,q)be an OMQ whereqis anF-path CQ without twins having a single binary relation. The following are equivalent unlessNL=P:

(i) QisNL-complete;

(ii) {0} ∪Nqis an arithmetic progression;

(iii) there existfi:q_iqfor everyi= 1, . . . , m.

If these conditions do not hold, thenQisP-complete.

Proof. It is readily seen, using Theorem 5, that (ii) ⇒ (iii) ⇒ (i). We prove the implication(i)⇒(ii). Suppose{0} ∪N_q is not an arithmetic progression. We show thatQis P-hard by reduction of the monotone circuit evaluation problem. LetXbe the sub-CQ ofqbetweenF(x)and the firstT but without theF andT atoms, and let|X| be the number of atoms inX. LetY be the first sub-CQ between neighbouringT-atoms without theseT-atoms such that|Y| 6=|X|,nthe number ofX-sub-CQs beforeY, and letZ be the rest of the query without the first T atom; see the picture below where

(8)

n= 2.

F x0

T x1

T x2

T x3 x4

X X Y Z

We distinguish between two cases:|Y|>|X|and|Y|<|X|. Depending on the case, we use the following gadgets for∧-gates, which are shown below forn= 2(it should be clear how to modify the construction forn >2); the∨-gate gadget is the same for both cases.

∧-gate for|Y|>|X| ^c A

A a⁰

T

a A T A b⁰

T

b A

X

YZ X

X

Z Y

X Y

Z ∧-gate for|X|>|Y|

A c

T

A b

A a

X

X Y Z

∨-gate c

T

a

A

T

b

A

X

Y Z

X

Y Z

Given any monotone Boolean circuitC and its inputα, we construct an ABoxAC,α

by replacing∧- and∨-gates with the inputsaandband outputcby the gadgets above, placingTon the input gates evaluated to 1 underαandFon the output gate. We claim thatCovA,AC,α|=qiffC(α) = 1.

(⇐) It is not hard to show by induction that whenever a non-input gate g inC outputs 1 underα, then the sub-ABox ofAC,α generated bygand withF placed on g validatesQ. For example, supposeT(a),T(b)andF(c)hold in the left-hand side gadget. Then eitherF(b⁰)orT(b⁰). In the former case,qis satisfied; so assumeT(b⁰).

We have eitherF(a⁰)orT(a⁰). Now, in either case,qis satisfied, as required.

(⇒) Suppose C(α) = 0. We construct a model I of CovA based onAC,α by placingT orF togandg⁰ (if available) depending on the value ofginCunderα. It can be checked thatI 6|=q. In particular, if|Y|>|X|,cis the conjunction ofbanda, c=b= 0anda= 1, then ifx0goes toc,x1goes toa⁰, thenx2must go to the point below a⁰, and sox3 must go somewhere between aand the point abovea, which is impossible. Ifa= 0, b= 1andx₀goes toa⁰andx₁goes to the centralT, thenx₂must go somewhere between the centralT andb⁰, which is also impossible. If|Y| <|X|, c = b∧a,b =d∨e,a =c = 0,b = 1, then ifx₀goes toc,x₂goes tob, thenx₃ cannot go toaasa= 0, and so it must go into someX-segment of the gateb=d∨e going out ofb, which is again impossible.

Note that the proof of P-hardness in Theorem 6 does not go through forA = >.

Thus, for(Cov_>,q_T1T), we are in the framework of Example 1 and, by Theorem 4(ii), this OMQ is in NL. In fact, we have the following NL/P dichotomy for the OMQs of the form(Cov_>,q_{T nT}).

Theorem 7. (i)The OMQ(Cov_>,q_T1T)isNL-complete(whereas(CovA,q_T_1T)is P-complete).

(9)

(ii)The OMQs(Cov_>,q_{T nT}) (and(Cov_A,q_{T nT})), forn≥2, areP-complete.

Proof. The proof is similar to that of P-hardness in Theorem 6.

We now apply Theorem 4(ii)to the class ofT F-path CQsof the form

q_{T F} =

T y0

F x

T y1

T

ym ym+1

. . . . . . . . . . . .

where the T(yi)and F(x)are all the solitary occurrences of T andF inq_{T F}. We represent this CQ as

q_{T F} ={T(y0)} ∪q₀∪q,

whereq₀ is the sub-CQ ofq_{T F} betweeny0andxwithT(y0)removed andq is the same as in Theorem 5 (andq^∗_m+1isqwithoutF(x)).

Theorem 8. If q satisfies the condition of Theorem 5 and there is a homomorphism h: q^∗_m+1→q₀such thath(x) =y0, then answering(CovA,q_{T F})isNL-complete.

For example, the OMQ(CovA,q)withqshown below is NL-complete:

T F T F T

On the other hand, as shown in [9],(Cov_A,q)withqof the form

T F T

is P-complete; in fact, it follows from the proof that(Cov_>,q)is P-complete, too.

6 2-CQs

A2-CQhas at least two solitaryF and at least two solitaryT. For example, as shown in [9], answering(CovA,q)with the following CQqisCONP-complete:

T T F F

The proof can be generalised to the class of 2-2-CQs, which are path 2-CQs where all the F are located after all theT, and every occurrence ofT or F is solitary. We represent any given 2-2-CQqas shown below

T x

T y

F z

F

p r s u w v

wherep,r,uandvdo not containFandT, whilesmay contain solitary occurrences of bothTandF(in other words, theTshown in the picture are the first two occurrences ofT inqand theFare the last two occurrences ofFinq). Denote byq_r the suffix of qthat starts fromxbut withoutT(x); similarly,q_uis the suffix ofqstarting fromzbut withoutF(z). Denote byq⁻_r the prefix ofqthat ends atybut withoutT(y); similarly, q⁻_u is the prefix ofqending atwbut withoutF(w). Using the construction from [9], one can show the following:

(10)

Theorem 9. Any OMQ (Cov_A,q) with a 2-2-CQq is CONP-complete provided the following conditions are satisfied:(i)there is no homomorphismh₁:q_u → q_r with h₁(z) =x, and(ii)there is no homomorphismh₂: q⁻_r →q⁻_u withh₂(y) =w.

Unfortunately, the proof does not generalise to the other two path 2-CQs with four variables, whose complexity has so far been open:

q₁ ^T ^F ^T ^F

q₂ ^T ^F ^F ^T

Theorem 10. The OMQsQ₁= (Cov_A,q₁)andQ₂= (Cov_A,q₂)areCONP-complete.

Proof. We proveCONP-hardness ofQ₁by reduction of the NP-complete 3SAT. Given a 3CNFψ, we construct an ABoxAψ as follows. First, for every variablepoccurring inψ, we take the followingp-gadget, wherenis the number of clauses inψ:

A A A

a1 A

. . . A

an A

F

T

A A

cn A

. . . A

c1 A A F T

T F

F

T F

T

A A

b1 A . . .

A

bn

A A

T

F

A A A

dn A . . . A

d1 A T F T F

T F

F

T F

T

p ¬p

The key property of thep-gadget is that, for any modelIofCov_Abased on this gadget, ifI 6|= q₁ then either theA-points on left-hand side of the gadget are all inT^I and theA-points on the right-hand side are all inF^I, or the other way round. We refer to theai andbi as p^↑-and¬p^↑-contacts, and to theci anddi asp^↓-and¬p^↓-contacts, respectively.

Now, for every clausec= (l1∨l2∨l3)inψ, we add to the constructed gadgets for the variables inψthe atomsR(u^c_¬l

1, v^c_l

2),R(v_l^c

2, c),T(c),R(c, w_l^c

3), wherecis a new individual,u^c_¬l

1 a fresh¬l^↑₁-contact,v_l^c

2 a freshl^↓₂-contact, andw_l^c

3 a freshl^↓₃-contact.

For example, for the clausec= (p∨q∨ ¬r), we obtain the fragment below:

A bi

A cj

T c

A dk

A

¬p

A A

q

A A

¬r

A

The resulting ABoxA_ψis such thatψis satisfiable iffCov_A,A_ψ6|=q₁.

For the OMQQ₂, we use (simplified)p-gadgets presented in Theorem 11 and con- nect them similarly to the picture above withT replaced byFatc.

We now sketch two generalisation of Theorem 10.

(11)

In Theorem 11 and 12, we assume thatp andv do not contain F andT, while randumay only contain solitary occurrences ofT (F 6∈ r,u), andsonly solitary occurrences ofF (T 6∈s).

Theorem 11. Any OMQ(Cov_A,q)withqof the form

T x

F y

F z

T

p r s u w v

isCONP-complete provided the following conditions are satisfied:(i)there is no homomorphism h₁: r_t → u with h₁(y) = w, and (ii) there is no homomorphism h₂: u_t → rwithh₂(z) = x, whereris the sub-CQ ofq betweenxandy without T(x),F(y), and similarly foru,r_tisrwithT(x)andu_tisuwithT(w).

The proof uses the followingp-gadget:

A A A

a1 A . . .

A

an A

F

T A A A

cn A . . . A

c1 A F

T F

T

F T

F

T F

T

r r r r r

u

s

u u u u u u s p

r

p

p r

v p v

p p p p

s u

v v v v v

v v v v

r s

p p p p p

A A

b1 A . . .

A

bn

A A

F T

A A

cn A . . . A

c1 A A F

T F T

F T

p p p p

F

T F

T

r r r r r

s

r

u u u u u u r

v u

v

p

p u

v

v p

s u

v v v v v

v v v v

r s

p p p p p

F F

r s

u

s

p ¬p

If one of the two conditions of this theorem is not satisfied then the given gadget will not work; see the shaded parts. For example, the complexity of the OMQ with the CQ below is still unknown:

T x

F y

F F

z

T T T

w

In Theorem 12, we user^ext = r(x, y)∧T(y)∧s1(y, y1)∧F(y1), wheres1is the part ofssuch thats(y, z) =s1(y, y1)∧F(y1)∧s2(y1, z)ands1(y, y1)does not contain any occurrences ofF. In other words, the variabley₁ corresponds to the first appearance ofFins, wheresis the sub-CQ ofqbetweenyandzwithoutF(y),T(z).

Theorem 12. Any OMQ(CovA,q)withqof the form

T x

F y

T z

F

p r s u w v

isCONP-complete provided the following conditions hold:(i)there is no homomor- phismg1:rt →uwithg1(y) =w, and(ii)there is no homomorphismg2:u→r^ext withg2(z) =xandg2(w) =y1.

The CQ below does not satisfy(ii), and its complexity is unknown:

T x

T F

y

F F T

z

T T F

w

It is also unknown whether there are OMQs with 2CQs in P.

(12)

Acknowledgements. The work of O. Gerasimova and M. Zakharyaschev was carried out at the National Research University Higher School of Economics and supported by the Russian Science Foundation under grant 17-11-01294.

References

1. Abiteboul, S., Hull, R., Vianu, V.: Foundations of Databases. Addison-Wesley (1995) 2. Afrati, F.N., Gergatsoulis, M., Toni, F.: Linearisability on datalog programs. Theor. Comput.

Sci. 308(1-3), 199–226 (2003), https://doi.org/10.1016/S0304-3975(02) 00730-2

3. Arora, S., Barak, B.: Computational Complexity: A Modern Approach. Cambridge Univer- sity Press, New York, NY, USA, 1st edn. (2009)

4. Benedikt, M., ten Cate, B., Colcombet, T., Vanden Boom, M.: The complexity of boundedness for guarded logics. In: 30th Annual ACM/IEEE Symposium on Logic in Computer Science, LICS 2015, Kyoto, Japan, July 6-10, 2015. pp. 293–304. IEEE Computer Society (2015),https://doi.org/10.1109/LICS.2015.36

5. Bienvenu, M., ten Cate, B., Lutz, C., Wolter, F.: Ontology-based data access: A study through disjunctive datalog, CSP, and MMSNP. ACM Transactions on Database Systems 39(4), 33:1–44 (2014)

6. Bulatov, A.A.: A dichotomy theorem for nonuniform csps. In: Umans, C. (ed.) 58th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2017, Berkeley, CA, USA, October 15-17, 2017. pp. 319–330. IEEE Computer Society (2017),https://doi.org/

10.1109/FOCS.2017.37

7. Cosmadakis, S.S., Gaifman, H., Kanellakis, P.C., Vardi, M.Y.: Decidable optimization problems for database logic programs (preliminary report). In: STOC. pp. 477–490 (1988) 8. Feier, C., Kuusisto, A., Lutz, C.: Rewritability in monadic disjunctive datalog, mmsnp, and

expressive description logics (invited talk). In: Benedikt, M., Orsi, G. (eds.) 20th Inter- national Conference on Database Theory, ICDT 2017, March 21-24, 2017, Venice, Italy.

LIPIcs, vol. 68, pp. 1:1–1:17. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik (2017), https://doi.org/10.4230/LIPIcs.ICDT.2017.1

9. Gerasimova, O., Kikot, S., Podolskii, V., Zakharyaschev, M.: On the data complexity of ontology-mediated queries with a covering axiom. In: Proceedings of the 30th International Workshop on Description Logics (2017)

10. Hernich, A., Lutz, C., Ozaki, A., Wolter, F.: Schema.org as a description logic. In: Calvanese, D., Konev, B. (eds.) Proceedings of the 28th International Workshop on Description Logics, Athens,Greece, June 7-10, 2015. CEUR Workshop Proceedings, vol. 1350. CEUR-WS.org (2015),http://ceur-ws.org/Vol-1350/paper-24.pdf

11. Kaminski, M., Nenov, Y., Grau, B.C.: Datalog rewritability of disjunctive datalog programs and non-Horn ontologies. Artif. Intell. 236, 90–118 (2016),http://dx.doi.org/10.

1016/j.artint.2016.03.006

12. Kharlamov, E., Hovland, D., Skjæveland, M.G., Bilidas, D., Jiménez-Ruiz, E., Xiao, G., Soylu, A., Lanti, D., Rezk, M., Zheleznyakov, D., Giese, M., Lie, H., Ioannidis, Y.E., Kotidis, Y., Koubarakis, M., Waaler, A.: Ontology based data access in statoil. J. Web Sem. 44, 3–36 (2017),https://doi.org/10.1016/j.websem.2017.05.005

13. Lanti, D., Rezk, M., Xiao, G., Calvanese, D.: The NPD benchmark: Reality check for OBDA systems. In: Alonso, G., Geerts, F., Popa, L., Barceló, P., Teubner, J., Ugarte, M., den Buss- che, J.V., Paredaens, J. (eds.) Proceedings of the 18th International Conference on Extending Database Technology, EDBT 2015, Brussels, Belgium, March 23-27, 2015. pp. 617–628.

OpenProceedings.org (2015),https://doi.org/10.5441/002/edbt.2015.62

(13)

14. Lutz, C., Sabellek, L.: Ontology-mediated querying with the description logic EL: trichotomy and linear datalog rewritability. In: Sierra, C. (ed.) Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI 2017, Melbourne, Australia, August 19-25, 2017. pp. 1181–1187. ijcai.org (2017),https://doi.org/10.24963/

ijcai.2017/164

15. Marcinkowski, J.: DATALOG sirups uniform boundedness is undecidable. In: Proceedings, 11th Annual IEEE Symposium on Logic in Computer Science, New Brunswick, New Jersey, USA, July 27-30, 1996. pp. 13–24. IEEE Computer Society (1996),https://doi.org/

10.1109/LICS.1996.561299

16. Ramakrishnan, R., Sagiv, Y., Ullman, J.D., Vardi, M.Y.: Proof-tree transformation theorems and their applications. In: Proceedings of the eighth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems. pp. 172–181. ACM (1989)

17. Saraiya, Y.P.: Linearizing nonlinear recursions in polynomial time. In: Silberschatz, A. (ed.) Proceedings of the Eighth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, March 29-31, 1989, Philadelphia, Pennsylvania, USA. pp. 182–189.

ACM Press (1989),http://doi.acm.org/10.1145/73721.73740

18. Schaerf, A.: On the complexity of the instance checking problem in concept languages with existential quantification. J. of Intelligent Information Systems 2, 265–278 (1993)

19. Ullman, J.D., Gelder, A.V.: Parallel complexity of logical query programs. Algorithmica 3, 5–42 (1988),https://doi.org/10.1007/BF01762108

20. Vardi, M.Y.: Decidability and undecidability results for boundedness of linear recursive queries. In: Edmondson-Yurkanan, C., Yannakakis, M. (eds.) Proceedings of the Seventh ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, March 21-23, 1988, Austin, Texas, USA. pp. 341–351. ACM (1988),http://doi.acm.org/

10.1145/308386.308470

21. Zhang, W., Yu, C.T., Troy, D.: Necessary and sufficient conditions to linearize double recursive programs in logic databases. ACM Trans. Database Syst. 15(3), 459–482 (1990), http://doi.acm.org/10.1145/88636.89237

22. Zhuk, D.: A proof of CSP dichotomy conjecture. In: Umans, C. (ed.) 58th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2017, Berkeley, CA, USA, Octo- ber 15-17, 2017. pp. 331–342. IEEE Computer Society (2017),https://doi.org/10.

1109/FOCS.2017.38