Polynomial Disjunctive Datalog Rewritings of Instance Queries in Expressive Description Logics

(1)

Polynomial Disjunctive Datalog Rewritings of Instance Queries in Expressive

Description Logics

Shqiponja Ahmetaj, Magdalena Ortiz, and Mantas ˇSimkus Institute of Information Systems, TU Vienna, Austria

Abstract. Rewriting ontology mediated queries (OMQs) into traditional query languages like FO-queries andDatalogis central for understanding their relative expressiveness, and plays a crucial role in the ongoing efforts to develop OMQ answering tools by reusing existing database technologies.

However, the vast majority of works focus on Horn ontologies, and for OMQs where the ontologies are written in extensions ofALC, only a couple of exponential rewritings into disjunctiveDatalogare known.

In this paper we propose a translation of instance queries mediated by ontologies in the expressive DLALCHI, into polynomial-sized disjunctive Datalog programs. The translation is based on a simple game-like algorithm, and can be extended to accommodate nominals. We can also rewrite OMQs withclosed-predicates intoDatalogprograms with (limited) negation. Closed predicates are useful for combining complete and incomplete information, but make OMQs non-monotonic and thus not rewritable into positive disjunctiveDatalog.

1 Introduction

In ontology-mediated queries (OMQs), a database query is enriched with an ontology, providing knowledge to obtain more complete answers from incomplete data. OMQs are receiving much attention in the database and knowledge repre- sentation research communities, particularly when the ontological knowledge is expressed in Description Logics (DLs) or in rule-based formalisms likeexistential rules andDatalog±, see e.g., [4, 3, 13] and their references. One of the most central questions in OMQ research isrewritability: given an OMQQ, specified by a query and an ontology, obtain a query Q⁰—in some standard database language, likefirst-order (FO) queries or (fragments of)Datalog—such that, for any datasetA, the certain answers toQandQ⁰ coincide. Such aQ⁰ is called arewriting ofQ. Its existence and size are crucial for understanding the relative expressiveness of different families of OMQs in terms of more traditional query languages. Rewritings are also very relevant in practice, since they allow to reuse existing database technologies to support OMQ answering.

Research into OMQs that can be rewritten intofirst-order (FO) queries has produced the successfulDL-Litefamily [5] of DLs. The succinctness of rewritings for OMQs consisting of the so-called (unions of ) conjunctive queries (UCQs)

(2)

paired with ontologies in DL-Lite, or in related families of existential rules, has been extensively studied. It has been shown that polynomially sized non-recursive Datalogqueries can be constructed for a few well known ontology languages, formulated as DLs or existential rules [14]; but for many other cases, the lack of succinctness of such rewritings has also been established [11], and some authors have considered rewritings that, unlike the ones we consider here, are notdata- independent [18, 12]. For DLs beyondDL-Lite, which are often not FO-rewritable, rewritings of UCQs into Datalogqueries have been proposed, and lie at the core of implemented systems, e.g., [24, 9, 25]. The pioneering work in [15] showed that instance queries in an expressive extension ofALC can be rewritten into a program in disjunctive Datalog, using a constant number of variables per rule, but exponentially many rules. The first translation from CQs in expressive DLs (SH,SHQ) to disjunctiveDatalogprograms was introduced in [8], but the program may contain double exponentially many predicates. For ALC and UCQs, the existence of exponential rewritings in disjunctiveDatalogwas shown recently [4]. For restricted fragments ofSHI and classes of CQs translations to Datalogwere investigated in [16, 17]. A polynomial timeDatalogtranslation of instance queries has been proposed in [23], but for a so-calledHorn-DLthat lacks disjunction. To our knowledge, this was until now the only polynomial rewriting for a DL that is not FO-rewritable.

The main contribution of this paper is a polynomial time translation of instance queries mediated by ALCHITBoxes intodisjunctive Datalog. The translation is based on a simple game-theoretic characterization, and implements a so-called type-elimination procedure using a polynomially sized Datalog program. To our knowledge, this is the first polynomial time translation of an expressive (non-Horn) DL into disjunctiveDatalog. The translation we present here can be extended to richer OMQs, and in particular, to nominals andclosed predicates. In fact, we report here a simplified version of the translation in [1], which is a polynomial translation of instance queries mediated by ALCHOI TBoxes with closed predicates into disjunctiveDatalogqueries with negation.

2 Preliminaries

We give some basic notions of DLs and Datalog.

The DLALCHI We assume countably infinite, mutually disjoint setsNR of role names,NC ofconcept names, andNI ofindividual names. Arole ris either a role namep, or an expressionp⁻, called theinverseofp. We letr⁻ =pifr=p⁻. Concepts are defined inductively:(i)>,⊥and allA∈N_C are concepts;(ii)if C₁, C₂ are concepts, thenC₁uC₂,C₁tC₂ and¬C1are concepts;(iii)ifris a role, andC is a concept, then∃r.C,∀r.C are concepts. Aconcept inclusion is an expression of formC₁vC₂, whereC₁, C₂ are concepts. Arole inclusion is an expression of formr₁vr₂, wherer₁, r₂ are roles. ATBox T is a finite set of (concept and role) inclusions. AnABox Ais a finite set ofassertions of the forms A(a) andp(a, b), where{a, b} ⊆NI,A∈NC, andp∈NR. Aknowledge base (KB) is a tupleK= (T,A), whereT is a TBox andAis an ABox.

(3)

Aninterpretationis a pairI=h∆Î,·Îi, where∆Î is a non-empty set (called the domain), AÎ ⊆∆Î for eachA∈N_C, rÎ ⊆∆Î×∆Î for each r∈N_R, and aÎ∈∆Î for eacha∈N_I. The function·Î is extended to the remaining concepts and roles in the standard way [2]. An interpretation I satisfies an inclusion q1vq2, if q₁Î⊆qÎ₂, in symbolsI |=q1vq2; and it satisfies an assertionq(a) if (a)Î ∈qÎ , in symbols, I |=q(a). For Γ a TBox or ABox, we writeI |=Γ if I |=αfor all α∈ Γ. For a KB K = (T,A), we write I |= K if the following hold:(i) a∈∆Î andaÎ =a for eacha∈NI occurring in K,¹ (ii)I |=T, and (iii)I |=A. For an assertionα, we write K |=αifI |=αfor allI withI |=K.

Instance Queries An(ontology mediated) instance queryis a tupleQ= (T, q), whereT is a TBox, and q∈NC∪NR. Leta ∈NI in case q∈NC, and a∈N²_I otherwise. Thenais acertain answer toQover an ABoxAif (T,A)|=q(a). To ease presentation, we assume that every concept and role ofAalso occurs in T. Normal Form Our results apply to arbitrary TBoxes, but to simplify presentation, we consider TBoxes innormal form where inclusions take one of the following forms:

(N1) A₁u · · · uA_nvA_n+1t · · · tA_k (N3) Av ∀r.A⁰

(N2) Av ∃r.A⁰ (N4) rvs

withr, sroles,{A1, . . . , Ak} ⊆NC, and{A, A⁰} ⊆NC. We also assume thatT is closed under role inclusions as follows: (a)pv p∈ T, for eachp∈NR occurring inT, (b) ifrvs∈ T, thenr⁻vs⁻∈ T, and (c) ifr1vr2∈ T andr2vr3∈ T, then r1vr3 ∈ T. For Γ a TBox, ABox, or KB, we denote by NI(Γ), NR(Γ), N_C(Γ) the set of individuals, role and concept names that occur inΓ, resp.

Disjunctive Datalog We assume countably infinite setsNPandNVofpredicate symbols (each with an associatedarity) and variables, respectively. We further assume thatN_C∪N_R ⊆N_Pwith eachA∈N_Cbeing unary, and eachr∈N_Rbeing binary. Anatom is an expression of the formR(t₁, . . . , t_n), where{t1, . . . , t_n} ⊆ N_I∪N_V, andR is an n-ary relation symbol. A rule ρ is an expression of the form h₁∨. . .∨h_n ←b₁, . . . , b_k, whereH ={h₁, . . . , h_n} andB ={b₁, . . . , b_k} are sets of atoms, called thehead and thebody ofρ, respectively. Each variable that occurs inH must also occur inB. Rules of the formh←(known asfacts) are simply identified with the atom h, thus ABox assertions are valid facts in our syntax. For a role name p, we may use p⁻(t1, t2) to denote the atom p(t2, t1). Aprogram is any finite setP of rules. We useground(P) to denote the groundingofP, i.e. the variable-free program that is obtained fromPby applying on its rules all the possible substitutions of variables by individuals ofP. An (Herbrand) interpretation (or,database)I is any finite set of variable-free (or, ground) atoms. An interpretationIis amodel of a programP if{b1, . . . , bk} ⊆I impliesI∩ {h1, . . . , hn} 6=∅for all rulesh1∨. . .∨hn←b1, . . . , bk inground(P).

A(Datalog) query is a pair (P, q), whereP is a program, andq is a predicate

1 This is called thestandard name assumption (SNA), which we apply to the individuals occurring inK.

(4)

symbol from P. A tuple a of constants is a certain answer to (P, q) over a database I ifq(a)∈J for all modelsJ ofP∪I.

3 Characterization of Counter Models

Assume a KBK= (T,A) and an assertionβ. Towards deciding K 6|=β using a polynomially sized program, we decompose the problem into two steps:

(1) Guess acore interpretation Ic forK, whose domain isNI(A). Core interpretations fix how the individuals ofKparticipate in concepts and roles, ensuring the satisfaction ofA, and the non-entailment ofβ.

(2) Check thatIc can be extended to satisfy all axioms inT.

Defining Datalogrules that do (1) is not hard, but (2) is more challenging, and will rely on a game-theoretic characterization we describe below. But first we need to define core interpretations.

Definition 1. A core interpretationfor a KBK= (T,A)is any interpretation Ic such that

(c1) ∆^I^c=NI(A)anda^I^c=afor all a∈NI(A), (c2) I_c |=A,

(c3) Ic |=Av ∀r.A⁰ for allAv ∀r.A⁰∈ T, (c4) Ic |=rvsfor allrvs∈ T,

(c5) q^I^c =∅for each q6∈NC(T)∪NR(T).

An interpretationJ is called an extension ofIc, ifIc is the result of restricting J toNI(A).

A core and its extensions coincide on the assertions they entail, and deciding non-entailment of an instance query qamounts to deciding whether there is a core that does not entailq, and that can be extended into a model. But verifying whether a core can be extended into a full model is hard: it corresponds to testing consistency (of Ic viewed as an ABox) with respect to T, an ExpTime-hard problem already for fragments ofALCHI. In order to obtain a polynomial set of rules that solves this ExpTime-hard problem, we characterize it as a game, revealing a simple algorithm that admits an elegant implementation inDatalog. Definition 2. A type (over a TBoxT)τ is a subset of NC(T)∪ {>}such that

⊥ 6∈τ and> ∈τ. We say thatτ satisfiesan inclusionα=A1u · · · uAnvAn+1t

· · · tAk of type (N1), if {A1, . . . , An} ⊆ τ implies {An+1, . . . , Ak} ∩τ 6= ∅;

otherwise τ violates α. For an element e∈∆Î in an interpretation I, we let type(e,I) ={B∈NC|e∈BÎ}. A typeτ is realizedinI if there is somee∈∆Î s.t. type(e,I) =τ.

We now describe a game to decide whether a given coreIc can be extended into a model of a KB K. The game is played by Bob (the builder), who wants to extendIc into a model, and Sam (the spoiler), who wants to spoil all Bob’s

(5)

attempts. Sam starts by picking an individual a, and they look at its type type(a,I_c). If it doesn’t satisfy the inclusions of type (N1) Sam wins. Otherwise, in each turn Sam chooses an inclusion of the formAv ∃rA⁰ which would need to be satisfied by (an element with) the current type, forcing Bob to pick a type for the correspondingr-successor that satisfiesA⁰. The game continues for as long as Bob can respond to the challenges of Sam.

The game on I starts by Sam choosing an individual a ∈ ∆^I, and τ = type(a,I) is set to be thecurrent type. Then:

() Ifτ does not satisfy all inclusions of type (N1) inT, then Sam is declared the winner. Otherwise, Sam chooses an inclusionAv ∃r.A⁰ ∈ T withA∈τ;

if there is no such inclusion, Bob is declared the winner. Otherwise, Bob chooses a new typeτ⁰ such that:

(C1) A⁰ ∈τ⁰, and

(C2) for all inclusionsA1v ∀s.A2∈ T:

• ifrvs∈ T andA1∈τ thenA2∈τ⁰,

• ifr⁻vs∈ T andA₁∈τ⁰ thenA₂∈τ.

τ⁰ is set to be the current type, and the game continues with a new round, i.e. we go back to.

Arun of the game onI is a (possibly infinite) sequence aα1τ1α2τ2. . .

wherea is the individual picked initially by Sam, and each αi andτi are the inclusion picked by Sam and the type picked by Bob in round i, respectively. A strategy for Bob is a partial functionstrthat maps pairs of a typeτand an inclu- sionAv ∃r.A⁰ with A∈τ to a typeτ⁰ that satisfies (C1) and (C2); intuitively, it gives a move for Bob in response to moves of Sam. A runaα1τ1α2τ2. . . with type(a,I) =τ0 follows a strategy strifτi=str(τi−1, αi) for everyi≥1.

For a finite runw, we lettail(w) =type(a,I) if w=a, and tail(w) =τ` if w=a . . . α`τ` with`≥1. The strategy stris called non-losing onI if for every finite run w that followsstr, tail(w) satisfies all inclusions of type (N1) in T, andstr(tail(w), Av ∃r.A⁰) is defined for everyAv ∃r.A⁰∈ T withA∈tail(w).

Theorem 1. Assume a KB K and an assertionβ. ThenK 6|=β iff there is a core interpretationIc forK such that:

(1) Ic6|=β, and

(2) there is a non-losing strategy for Bob on Ic.

Proof. (Sketch.) We focus on showing that there is a non-losing strategystrfor Bob on I_c iff there exists an extension J of I_c s.t. J |=K. The claim follows from this, and the easy claim that extensions preserve non-entailment ofβ.

For the “⇒” direction, from an arbitrary non-losingstrforI_c, we buildJ as follows. We denote byfrn(Ic,str) the set of all finite runsaα1τ1. . . αiτi,i≥0 that followstr. The domain ofJ is:

∆^J =frn(Ic,str)

(6)

and for eacha∈N_I, eachA∈N_C, and eachp∈N_R we let:

a^J =a^I^c A^J ={w|A∈tail(w)}

p^J =p^I^c∪ {(w, wα_iτ_i)|r_ivp∈ T } ∪ {(wαiτi, w)|r_i⁻vp∈ T }

whereαi =Av ∃ri.A⁰∈ T and{w, wαiτi} ⊆∆^J.

It is not hard to see thatJ is an extension ofIc. We show thatJ is a model of K. First,J |=Aby definition of Ic and the fact that J is an extension of Ic. It is left to prove that J satisfies all inclusions in T. Note that for each w∈∆^J,type(w,J) =tail(w). ThatJ satisfies all inclusions of type (N1) holds by definition of non-losingstrand the fact thattail(w) satisfies all such inclusions.

For the inclusions α=Av ∃r.A⁰ of type (N2), we see that for eachw∈∆^J, ifw∈A^J, thenA∈tail(w) andwis a run that followsstr. Hence, sincestr is a non-losing strategy str(tail(w), α) is defined, that is there exists aτ⁰ overT such thatstr(tail(w), α) =τ⁰ andA⁰ ∈τ⁰. Thuswατ⁰ is a run that followsstr, i.e.wατ⁰∈frn(I_c,str), sowατ⁰ is inA^0J and (w, wατ⁰)∈r^J ifris a role name, and (wατ⁰, w)∈r^J otherwise. For the inclusionsA₁v ∀r.A₂ of type (N3), we distinguish the case of (a, a⁰)∈rÎ^c for a paira, a⁰ of individuals in∆Î^c, for which a∈A^J₁ impliesa⁰ ∈A^J₂ holds by definition ofIc, and the case of an arbitrary pair of objects (w, w⁰)∈rÎ^c wherew⁰ is a child ofw(that is,w⁰ is of the form wαiτi), or vice-versa, for which w⁰ ∈A^J₂ wheneverw∈A^J₁ is ensured by the condition (C2) in the definition of the game, the fact that str is a non-losing strategy, and the construction ofJ. Finally, the inclusions of type (N4) are also guaranteed by the definition of the coreIc for pairs of individual names, and by the condition (C2) in the definition of the game for arbitrary pairs of an object and its child.

For “⇐”, assume an arbitrary modelJ ofKthat is an extension ofIc. We can extract from it a non-losing strategy for Bob as follows. First, letT be a the set of all the types realized inJ, and observe that each type inT satisfies all inclusions of type (N1) and that for alla∈NI(A),type(a,J)∈T. Then it suffices to set, for eachτ∈T and eachAv ∃r.A⁰ ∈ T withA∈τ,str(τ, Av ∃r.A⁰) =τ⁰ for an arbitrarily chosenτ⁰∈T that satisfies (C1) and (C2), which exists becauseJ is a model, hence it satisfies all existential and universal inclusions inT, and all types realized inJ are contained inT. Thisstr is a non-losing strategy for Bob. ut To decide whether Bob has a non-losing strategy on a given core we use the type elimination procedure Markin Algorithm 1, whichmarks (or,eliminates) all types from which Sam has a winning strategy. It takes as input the TBoxT, and an interpretationIwhich intuitively is the core being checked. The algorithm starts by building the setN of all possible types overT, and then it eliminates bad choices (from which Bob would loose) by marking them. In step (M_N1), the algorithm marks inN all types that violate some inclusion of type (N1) inT; Sam wins already in the first round on these types. Then, in the loop, (M_∃) exhaustively marks typesτ that allow Sam to pick an inclusionAv ∃r.A⁰ for

(7)

Algorithm 1: Mark

input : TBoxT, interpretationI output: Set of (possibly marked) types N← {τ |τ is a type overT }

(MN1) Mark eachτ ∈N that violates some inclusion of the form (N1) inT repeat

(M∃) Mark eachτ ∈N such thatAv ∃r.A⁰∈ T,A∈τ, andfor each τ⁰∈N, at least one the following holds:

(C0) τ⁰is marked, (C1⁰) A⁰∈/τ⁰, or

(C2⁰) there existsA1v ∀s.A2∈ T with – rvs∈ T andA1∈τ andA2∈/τ⁰, or – r⁻vs∈ T andA1∈τ⁰andA2∈/τ

untilno new type is marked returnN

which Bob cannot reply with anyτ⁰. At the end the marked types, are those from which Bob can lose, so we get:

Theorem 2. LetIc be a core interpretation. Then Bob has a non-losing strategy on Ic iff none of the types realized in Ic is marked by Mark(T,Ic).

Proof. (Sketch.) For the “⇒” direction, we can show (by induction in the number of iterations ofMark(T,Ic)) that if a type is marked, then it cannot occur in a non-losingstr for Bob. For the “⇐” direction, a non-losing strforIc is obtained by taking all unmarkedτ ∈N, and for each of them, and eachAv ∃r.A⁰∈ T withA ∈ τ, setting str(τ, Av ∃r.A⁰) = τ⁰ for an arbitrary unmarked τ⁰ that

satisfies (C1) and (C2). ut

4 Translation into Datalog

Assume an instance query (T, q). We build next a polynomially sized program P such that the queries (T, q) and (P, q) have the same certain answers for all ABoxes over the signature ofT. Roughly, the programP has 3 major components:

(a) rules to non-deterministically generate a core interpretationIc for the KB (T,A), whereAis an input ABox; (b) rules that implement the type elimination algorithm presented in the previous section; (c) rules that glue (a) and (b) together, ensuring that all types that occur inIcare not marked by the marking procedure.

We remark that the construction of P is independent from any particular ABox.

(I) Collecting the individuals We first add rules to collect in the unary predicateindall the individuals that occur in the input ABox. For eachA∈NC(T), r∈NR(T), we have:

ind(x)←A(x) ind(x)←r(x, y) ind(y)←r(x, y)

(8)

(II) Generating core interpretations For each A ∈ N_C(T) (resp., r ∈ N_R(T)) we will use a fresh concept name A (resp., role name r). We add the following rules toP:

A(x)∨A(x) ← ind(x) A∈N_C(T)

← A(x),A(x) A∈NC(T) r(x, y)∨r(x, y) ← ind(x),ind(y) r∈N_R(T)

← r(x, y),r(x, y) r∈NR(T)

To ensure (c3) in Definition 1, for each A v ∀r.A⁰ ∈ T we add the rule A⁰(y) ← A(x), r(x, y). Then, to ensure (c4), for each role inclusionrvs∈ T we add the rules(x, y) ← r(x, y).

Intuitively, the stable models of the above rules generate the different core interpretationsIc of the KBK= (T,A) for any givenA.

We next implement the algorithmMarkfrom Section 3. To obtain a polynomially sized program, we need to use non-ground rules whose number of variables depends on the number of different concept names inT. Assume an arbitrary enumerationA₁, . . . , A_k of N_C(T). Assume also a pair 0,1 of special individuals. Intuitively, we will use ak-ary relationType={0,1}^k to store the set of all types overT. Naturally, ak-tuple (a1, . . . , ak)∈Typeencodes the type τ = {Ai | ai = 1,1≤ i ≤ k} ∪ {>}. We are most interested in computing a k-ary relationMarked⊆ {0,1}^k that contains precisely the types marked by the Mark algorithm. We next define the rules to computeTypeandMarked, and other relevant relations.

(III) A linear order over types The first ingredient is a linear order over all possible types. To this end, for every 1≤i≤k, we inductively define i-ary relationsfirstⁱ andlastⁱ, and a 2i-ary relationnextⁱ, which will provide the first, the last and the successor elements from a linear order on{0,1}ⁱ. In particular, givenu,v∈ {0,1}ⁱ, the factnextⁱ(u,v) will be true ifvfollowsuin the ordering of{0,1}ⁱ. The rules to populate nextⁱ are quite standard (see, e.g., Theorem 4.5 in [6]). For the casei= 1, we simply add the following facts:

first¹(0)← last¹(1)← next¹(0,1)← Then, for all 1≤i≤k−1 we add the following rules:

nextⁱ⁺¹(0,x,0,y) ← nextⁱ(x,y) nextⁱ⁺¹(1,x,1,y) ← nextⁱ(x,y) nextⁱ⁺¹(0,x,1,y) ← lastⁱ(x),firstⁱ(y)

firstⁱ⁺¹(0,x) ← firstⁱ(x) lastⁱ⁺¹(1,x) ← lastⁱ(x)

We can now collect in thek-ary relationTypeall types overT (thus computing the setN of theMarkalgorithm):

Type(x)←first^k(x) Type(y)←next^k(x,y)

(9)

(IV) Implementing Step (M_N1)First, we add the auxiliary factsF(0)← and T(1)← toP. For ak-tuple of variables x, we letB∈xdenote the atomT(x_j), wherej is the index ofB in the enumeration ofN_C(T). Similarly, we let B6∈x denote the atomF(xj), wherej is the index ofB in the enumeration. Then the step (MN1), which marks types violating inclusions of type (N1), is implemented using the following rule for every inclusionA1u · · · uAnvA⁰₁t · · · tA⁰_mofT:

Marked(x)←Type(x), A₁∈x, . . . , A_n∈x, A⁰₁6∈x, . . . , A⁰_m6∈x

(V) Implementing Step (M∃) The following rules are added for each inclu- sionα=Av ∃r.A⁰∈ T. Recall that we need to mark a typeτ ifA∈τ, andfor each type τ⁰ at least one of (C0), (C1⁰) or (C2⁰) holds. We first use an auxiliary (2k+ 1) relationMarkedOneto collect all such typesτ⁰.

Assume a fresh individuala_α. For collecting eachτ⁰ that satisfies (C0), we add:

MarkedOne(x, aα,y)←Type(x),Marked(y) For the condition (C1⁰), we add the rule:

MarkedOne(x, a_α,y)←Type(x),Type(y), A⁰ 6∈y

The rules for (C2⁰) are as follows. For all inclusions A₁v ∀r.A2∈ T with rvs∈ T, we add the rule

MarkedOne(x, aα,y)←Type(x),Type(y), A1∈x, A26∈y

For allA1v ∀r.A2∈ T withr⁻vs∈ T, we also add

MarkedOne(x, aα,y)←Type(x),Type(y), A1∈y, A26∈x

We infer Marked(t) in case MarkedOne(t, a_α,v) is true for all types (bit vectors)v. To this end, we need another (2k+ 1)-ary relationMarkedUntil. We add:

MarkedUntil(x, a_α,z)←MarkedOne(x, a_α,z),first^k(z) MarkedUntil(x, aα,u)←MarkedUntil(x, aα,z),next^k(z,u),

MarkedOne(x, a_α,u)

Intuitively, with the above rules we traverse all types checking the conditions (C0), (C1⁰), (C2⁰) described in (M∃). If we manage to reach the last type, we

know the condition is satisfied. We add the following rule:

Marked(x)←MarkedUntil(x, aα,z), A∈x,last^k(z)

(VI) Forbidding marked types in the core We need to forbid each individual in the generated core interpretation from having a type fromMarked. For all 0≤i≤k, we take a fresh (i+ 1)-ary relation symbolProjⁱ. We first add:

Proj^k(x,y)←ind(x),Marked(y)

(10)

We will now project away bits from theProjⁱ relations by looking at the actual types of individuals. For all 1≤i≤kwe have the following rules:

Projⁱ⁻¹(x,y) ← Projⁱ(x,y,1), A_i(x) Projⁱ⁻¹(x,y) ← Projⁱ(x,y,0),A_i(x)

Intuitively,Projⁱ⁻¹(a, b₁, . . . , b_i−1) says the partial type given by the bit values b₁, . . . , b_i−1 can be extended to a marked type by choosing additional concepts according to the actual type of the individuala. Thus the factProj⁰(a) represent the situation whereahas a marked type. Such situations are ruled out by adding the constraint:

←Proj⁰(x)

This concludes the translation of the instance query (T, q), whereT is an ALCHI TBox, into the disjunctive Datalog program P. This construction, together with Theorems 1 and 2, yields the main result of this paper.

Theorem 3. For an instance query(T, q), we can build in polynomial time a Datalogquery(P, q)such that the certain answers to(T, q)and(P, q)coincide for any given ABoxAover the signature of T.

The above encoding employs disjunctive programs. Entailment of ground atoms in such programs is coNExpTime-complete [7], which does not match the ExpTime-completeness of satisfiability ofALCHI KBs. However, we employ disjunction in a limited way, and thus our programs fall into a class of programs that can be evaluated in (deterministic) exponential time. In particular, the above program P can be partitioned into two programs P₁, P₂ as follows: P₁ consists of all rules in (I) and (II), andP₂consists of the remaining rules. Observe thatP₁ is a disjunctive program with at most two variables in each rule, while P₂ is is disjunction-free. Note also thatP₂ does not define any relations used inP₁, i.e. none of the relation symbols ofP₁ occurs in the head of a rule inP₂. Assume a setF of facts over the signature ofP1. Due to the above properties, the successful runs of the following non-deterministic procedure generate the set of all minimal models ofP∪F:

(S1) Compute a minimal model I1 of P1∪F. If I1 does not exist due to a constraint violation, then returnfailure.

(S2) Compute the least modelI2 ofP2∪I1. Again, ifI2does not exist due to a constraint violation, then returnfailure. Otherwise, outputI2.

Since P1 has at most two-variables in every rule, each minimal model I1 of P1∪F is of polynomial size in the size ofP1∪F, and the set of all such models can be traversed in polynomial space. For a givenI1, performing the step (S2) is feasible in (deterministic) exponential time, because P2 is disjunction-free and of polynomial size in the size of the original instance query. It follows that computing the certain answers to (P, q) for any given ABoxAover the signature ofT requires single exponential time.

(11)

5 Discussion

We have presented our results for ALCHI, but they also apply to SHI, its extension with transitivity axioms; it is well known that instance queries mediated bySHITBoxes can be rewritten in polynomial time to instance queries mediated by ALCHI TBoxes (see, e.g., [15]). Moreover, we have presented the translation for instance queries, but the results can be easily generalized, e.g., to DL-safe rules of [22], or thequantifier-free conjunctive queries like in [21]. These queries are syntactically restricted to ensure that the relevant variable assignments only map into individuals of the input ABox. We note that generalizing our translation to arbitrary conjunctive queries while remaining polynomial is not possible under common assumptions in complexity theory. Indeed, cautious inference from a disjunctive program is in coNExpTime, but entailment of (Boolean) conjunctive queries is 2ExpTime-hard already for the DLALCI [19].

As mentioned in the introduction, both the game theoretic characterization and theDatalogtranslation can be extended to ontologies inALCHOI, the extension of ALCHI with nominals. This is possible even in the presence of the so-called closed predicates, which are useful for combining complete and incomplete knowledge, by explicitly specifying predicates assumed complete, thus given aclosed-world semantics [10, 20]. For example, take the following TBoxT:

BScStudvStudent Studentv ∃attends.Course BScStudv ∀attends.¬GradCourse

and the ABoxA:

Course(c₁), Course(c₂), GradCourse(c₂), BScStud(a)

Then (a, c1) is not a certain answer to (T, q) withq=attends(x, y). But ifc1and c2are known to be theonly courses, then (a, c1) should be a certain answer. This can be achieved by declaringCoursea closed predicate: (a, c1) is indeed a certain answer to theOMQ with closed predicates (T, Σ, q), where Σ={Course}.

Interestingly, OMQs with closed predicates arenon-monotonic. Indeed, (a, c1) is a certain answer to (T, Σ, q) over A, but it is not a certain answer over the extended set of factsA⁰ =A ∪ {Course(c3)}. For this reason, these queries cannot be rewritten into monotonic variants of Datalog, like positiveDatalog with disjunction as considered here. However, one can devise a polynomial time translation into disjunctive Datalog with negation as failure, for OMQs of the form (T, Σ, q) whereT is anALCHOI TBox,Σis a set of closed predicates, and qis an instance query. The translation uses the same ideas presented here, but handling closed predicates (both in the game-theoretic characterization, and in the Datalogtranslation) implies reasoning about the types that are already realized in the core; inDatalog, this requires negation. Details can be found in [1].

Acknowledgements

This work was supported by the Austrian Science Fund (FWF): projects P25207- N23, T515 and W1255-N23.

(12)

References

1. S. Ahmetaj, M. Ortiz, and M. ˇSimkus. Polynomial datalog rewritings for expressive description logics with closed predicates. InProc. of IJCAI’16, 2016. To appear.

2. F. Baader, D. Calvanese, D. McGuinness, D. Nardi, and P. Patel-Schneider, edi- tors. The Description Logic Handbook: Theory, Implementation, and Applications.

Cambridge University Press, second edition, 2007.

3. M. Bienvenu and M. Ortiz. Ontology-mediated query answering with data-tractable description logics. InProc. of RW 2015, pages 218–307. Springer, 2015.

4. M. Bienvenu, B. ten Cate, C. Lutz, and F. Wolter. Ontology-based data access:

A study through disjunctive Datalog, CSP, and MMSNP. ACM Trans. Database Syst., 39(4):33:1–33:44, 2014.

5. D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, and R. Rosati. Tractable reasoning and efficient query answering in description logics: TheDL-Lite family.

J. Autom. Reasoning, 39(3):385–429, 2007.

6. E. Dantsin, T. Eiter, G. Gottlob, and A. Voronkov. Complexity and expressive power of logic programming. ACM Comput. Surv., 33(3):374–425, 2001.

7. T. Eiter, G. Gottlob, and H. Mannila. Disjunctive datalog. ACM Trans. Database Syst., 22(3):364–418, 1997.

8. T. Eiter, M. Ortiz, and M. ˇSimkus. Conjunctive query answering in the description logic SH using knots. J. Comput. Syst. Sci., 78(1):47–85, 2012.

9. T. Eiter, M. Ortiz, M. ˇSimkus, T. Tran, and G. Xiao. Query rewriting for Horn-SHIQ plus rules. InProc. of AAAI 2012. AAAI Press, 2012.

10. E. Franconi, Y. A. Ib´a˜nez-Garc´ıa, and I. Seylan. Query answering with DBoxes is hard. Electr. Notes Theor. Comput. Sci., 278:71–84, 2011.

11. G. Gottlob, S. Kikot, R. Kontchakov, V. V. Podolskii, T. Schwentick, and M. Za- kharyaschev. The price of query rewriting in ontology-based data access. Artif.

Intell., 213:42–59, 2014.

12. G. Gottlob, M. Manna, and A. Pieris. Polynomial combined rewritings for existential rules. InProc. of KR 2014. AAAI Press, 2014.

13. G. Gottlob, M. Manna, and A. Pieris. Polynomial rewritings for linear existential rules. InProc. of IJCAI 2015. AAAI Press, 2015.

14. G. Gottlob and T. Schwentick. Rewriting ontological queries into small nonrecursive datalog programs. InProc. of KR 2012. AAAI Press, 2012.

15. U. Hustadt, B. Motik, and U. Sattler. Reasoning in description logics by a reduction to disjunctive datalog. J. Autom. Reasoning, 39(3):351–384, 2007.

16. M. Kaminski, Y. Nenov, and B. C. Grau. Computing datalog rewritings for disjunctive datalog programs and description logic ontologies. InProc. of RR 2014, pages 76–91, 2014.

17. M. Kaminski, Y. Nenov, and B. C. Grau. Datalog rewritability of disjunctive datalog programs and its applications to ontology reasoning. InProc. of AAAI 2014, pages 1077–1083, 2014.

18. R. Kontchakov, C. Lutz, D. Toman, F. Wolter, and M. Zakharyaschev. The combined approach to ontology-based data access. InProc. of IJCAI 2011. IJCAI/AAAI, 2011.

19. C. Lutz. Inverse roles make conjunctive queries hard. InProc. of DL 2007. CEUR- WS.org, 2007.

20. C. Lutz, I. Seylan, and F. Wolter. Ontology-based data access with closed predicates is inherently intractable(sometimes). InProc. of IJCAI 2013. IJCAI/AAAI, 2013.

(13)

21. C. Lutz, I. Seylan, and F. Wolter. Ontology-mediated queries with closed predicates.

InProc. of IJCAI 2015. IJCAI/AAAI, 2015.

22. B. Motik, U. Sattler, and R. Studer. Query answering for OWL-DL with rules. J.

Web Sem., 3(1):41–60, 2005.

23. M. Ortiz, S. Rudolph, and M. ˇSimkus. Worst-case optimal reasoning for the Horn-DL fragments of OWL 1 and 2. InProc. of KR 2010. AAAI Press, 2010.

24. H. P´erez-Urbina, B. Motik, and I. Horrocks. Tractable query answering and rewriting under description logic constraints. J. Applied Logic, 8(2):186–209, 2010.

25. D. Trivela, G. Stoilos, A. Chortaras, and G. B. Stamou. Optimising resolution-based rewriting algorithms for OWL ontologies. J. Web Sem., 33:30–49, 2015.