Sufficient Conditions for First-Order and Datalog Rewritability in ELU

(1)

Sufficient Conditions for First-Order and Datalog Rewritability in ELU

Mark Kaminski and Bernardo Cuenca Grau Department of Computer Science

University of Oxford, UK.

Abstract. We study the problem of answering instance queries over non-Horn ontologies by rewriting them into Datalog programs or First- Order queries (FOQs). We consider the basic non-Horn languageELU, which extendsELwith disjunctions. Both Datalog-rewritability and FO- rewritability of instance queries have been recently shown to be decidable forALContologies (and hence also forELU); however, existing decision methods are mainly of theoretical interest. We identify two fragments of ELUfor which we can compute Datalog-rewritings and FO-rewritings of instance queries, respectively, by means of a resolution-based algorithm.

1 Introduction

We study the problem of answering queries over DL ontologies by rewriting them into a First-Order query (FOQ) or a Datalog program. On the theoretical side, rewriting queries into an FOQ or a Datalog program ensures tractability of query answering in terms of data complexity. On the practical side, it allows one to reuse optimised data management systems, namely RDBMSs in the case of FOQs and rule-based systems such as OWLim or Oracle’s Semantic Data Store in the case of Datalog.

The problem of ontology-based query answering via query rewriting has so far been mainly studied for Horn DLs. FO-rewritability is ensured for logics of the DL-Lite family [5] and query rewriting algorithms in these DLs have been implemented in systems such as QuOnto [1], Presto [17], Quest [16], Rapid [6], and Owlgres [20]. Datalog rewritability is ensured for logics of theELfamily, as well as more expressive languages such as Horn-SHIQ[11]; optimised algorithms have been implemented in systems such as Requiem [15] and Clipper [10]. FO- rewritability has also been studied as a decision problem for logics of the EL- family [3], where tight complexity bounds have been shown.

Horn DLs, however, cannot capture disjunctive knowledge, such as ‘every student is either a graduate or an undergraduate’. As a consequence, ontologies capturing disjunctive knowledge cannot be processed using existing rewriting techniques. Little is known about how to compute FO and Datalog rewritings for non-Horn ontologies, and first results have been obtained only recently. Instance queries (i.e., queries of the formA(x) withAa concept name) are known to be FO-rewritable w.r.t. ontologies expressed in certain logics of the DL-Liteboolfam- ily —the extension of DL-Lite logics with disjunction [2]—, and a goal-oriented

(2)

resolution algorithm for computing rewritings w.r.t. such logics has been proposed in [8]. In contrast, instance queries w.r.t. ontologies expressed in the basic non-Horn DLELU are not generally rewritable into Datalog: a result that holds regardless of any complexity-theoretic assumptions [8]. While FO and Datalog rewritability of instance queries w.r.t. ALC ontologies have been proved to be decidable [4], the decision methods in [4] are problematic in practice as their first step, translation to CSP, has exponential best-case complexity. Finally, for certain classes of conjunctive queries, query answering over non-Horn ontologies can be reduced to knowledge base consistency [13]. This approach, however, does not generally ensure tractability in terms of data complexity.

In this paper, we are interested in ontologies formulated in the basic non-Horn DLELU. For simplicity, we focus on instance queries of the formA(x). In general, however, our results apply toground queries, where all variables are required to be mapped to constants; such queries form the basis of the standard query language SPARQL. Our main result is the identification of two sufficient conditions on ELU-ontologies that ensure FO-rewritability and Datalog rewritability of instance queries, respectively. Furthermore, we provide resolution-based algorithms that can be used for computing the corresponding rewritings in case our sufficient conditions are fulfilled. Our algorithms build on the generic resolution- based technique proposed in [8].

This paper is accompanied with an appendix containing the proofs of our technical results.

2 Preliminaries

We consider First-Order logic without equality and function symbols. Variables, terms, (ground) atoms, literals, formulae, sentences, intepretations, models and entailment are defined as usual. AnABox is a finite set of ground atoms (called facts). We also adopt standard notions of (Horn) clauses, (variable) substitutions, and most general unifiers (MGUs).Positive factoring (PF) andbinary resolution (BR) are as follows, whereσis the MGU of atoms AandB:

PF: C∨A∨B

Cσ∨Aσ BR: C∨A D∨ ¬B (C∨D)σ

A clauseC is atautology if it contains literalsAand¬A. A clause Csubsumes a clause D if a substitution σ exists such that each literal inCσ occurs in D.

Furthermore, C θ-subsumes D if C subsumes D and C has no more literals thanD. ClauseC is redundant in a set of clauses ifC is a tautology or if C is θ-subsumed by another clause in the set. Thecondensation of a clauseC is the clauseD with the least number of literals such thatD⊆CandC subsumesD.

The Description LogicELU.We assume familiarity with standard DLs; here, we briefly recapitulate the syntax ofELU. An ELU-concept is an expression of the form >, A, C1uC2, C1tC2, or ∃R.C, where A is a concept name, C_(i) are concepts, and R is a role name. An ELU-TBox T is a finite set of GCIs of the form C1 v C2 where C1 and C2 are ELU-concepts. An ELU-TBox is

(3)

normalised it it contains only axioms of the form ∃R.A vB, A v ∃R.B, and dn

i=1A_i vFm

j=1B_j, whereA_(i)andB_(j)are either named or>. EachELU-TBox T can be transformed in polynomial time into a normalised ELU-TBox that is a conservative extension ofT. AnELU-TBox T islinear if conjunctionudoes not occur on the left-hand side of a GCI inT.

Rules. A rule is a First-Order sentence of the form ∀x.∀z.[ϕ(x,z) → ψ(x)], where tuples of variablesxandzare disjoint,ϕ(x,z) is a conjunction of atoms, and ψ(x) is a disjunction of atoms. Formula ϕ is the body of r, formula ψ is the head of r, and quantifiers in a rule are omitted for brevity. Furthermore, we often abuse notation and treat a rule and its equivalent clause as synonyms.

A rule is Datalog if ψ(x) is a single atom, and it is disjunctive otherwise. A (Datalog) program is a finite set of (Datalog) rules. A program is monadic if every predicate occurring in the head of a rule is unary.

AnELU-program consists of the following kinds of rules: (i)Vn

i=1Ai(x)→ Wm

j=1B_j(x) and(ii)R(x, y)∧A(y)→B(x), where the atomA(y) can be omitted.

For convenience, we use the unary atom>(x) to denote an empty rule body. An EL-program is anELU-program that is also Datalog. Finally, anELU-program is linear if conjunction does not occur on the left-hand side of a rule of type(i).

Queries.An (instance) query is a unary atomQ(x). A constantc is an answer to Q(x) w.r.t. a set of FO sentences F and an ABoxA ifF ∪ A |=Q(c). The set of answers toQrelative toF andAis denoted ascert(Q(x),F,A).

FO and Datalog Rewritings.For an ABoxA, letI_A the interpretation corresponding toAin the obvious way. An FO formulaϕ(x) with one free variable is an FO-rewriting of a query Q(x) w.r.t. an ELU-TBox (or, equivalently, an ELU-program)T if the following condition holds for every constantcand ABox A: c ∈cert(Q,T,A) iff I_A |= ϕ(c). Furthermore, we say that ϕ(x) is a UCQ- rewriting if it is of the formϕ(x) =Wn

i=1ϕi(x) where eachϕi(x) is constructed using only conjuction and existential quantification. We say that Q(x) is FO- rewritable w.r.t. T if an FO-rewriting of Q(x) w.r.t. T exists. Finally, T is FO-rewritable if for each concept nameAinT the queryA(x) is FO-rewritable w.r.t.T. A Datalog programP is a rewriting of a queryQ(x) relative to a TBox (or, equivalently, anELU-program)T ifcert(Q(x),T,A) =cert(Q(x),P,A) for every ABox A. We say that Q(x) is Datalog-rewritable w.r.t. T if a Datalog rewriting ofQ(x) w.r.t.T exists. Finally,P is a rewriting ofT if it is a rewriting for each queryA(x) w.r.t.T, withAa concept name inT; TBoxT is Datalog- rewritable if a Datalog rewriting ofT exists. As observed in [4], it follows from the homomorphism preservation theorem for finite structures [18] that each FO- rewritable query is also UCQ rewritable and hence Datalog rewritable as well.

3 Computing Rewritings via Resolution

We next recapitulate the generic resolution-based technique proposed in [8], which takes a TBoxT and attempts to rewrite it into a Datalog program. IfT is restricted to be inELU, this technique consists of the following two steps.

(4)

Procedure 1Compile-Horn Input:S: set of clauses

Output:SH: set of Horn clauses

1: SH:={C∈ S |C is a Horn clause and not a tautology}

2: S_H:={C∈ S |C is a non-Horn clause and not a tautology}

3: repeat

4: F:= factors of eachC1∈ S_H non-redundant inSH∪ S_H

5: R: = resolvents of eachC1∈ S_H andC2∈ S_H∪ SHnot redundant inSH∪ S_H 6: for eachC∈ F ∪ Rdo

7: C⁰:= the condensation ofC

8: Delete fromSH andS_H all clausesθ-subsumed byC⁰ 9: if C⁰ is HornthenSH:=SH∪ {C⁰}

10: elseS_H :=S_H∪ {C⁰} 11: untilF ∪ R=∅

12: returnSH

Step 1: From DLs to Disjunctive Datalog. First, the algorithm in [12]

is applied to transform T into an ELU-program Dthat entails the same facts as T w.r.t. every ABox. By eliminating the positive occurrences of existential quantifiers, one thus reduces the problem of rewriting the DL TBox T to the problem of rewriting the disjunctive programD.

Step 2: From Disjunctive Datalog to Datalog. Second, the program D is transformed into a Datalog program P using a variant of the mainstream knowledge compilation algorithms proposed in [19] for propositional clauses, and extended in [9] to First-Order clauses (but without termination guarantees).

Procedure 1 summarises (a slight variation of) the technique from [9]. Roughly speaking, the procedure applies binary resolution and factoring and keeps only the consequences that are not redundant. Unlike unrestricted resolution, the procedure never resolves two Horn clauses. The main property of Procedure 1 shown in [9] is that, even if it never terminates, each Horn consequence of the input S will also eventually become entailed by SH. In [8], it was shown that Procedure 1 also enjoys the following key property: if the programDis given as input and the procedure terminates, then the output P is a Datalog rewriting of D. In essence, this result implies that compilation into Horn clauses can be done in an ABox independent way:DandP can be combined with an arbitrary ABox and still entail the same facts.

Example 3.1 We use theELU-programsD¹ andD² as running examples:

D¹={ C(x)→A(x)∨B(x) D²={ >(x)→A(x)∨B(x) R(x, y)∧A(y)→H(x) R(x, y)∧A(y)→B(x) R(x, y)∧B(y)→D(x) R(x, y)∧B(y)→A(x) } R(x, y)∧D(y)→H(x)}

(5)

When applied to programD¹, Procedure 1 terminates with the following setsD¹

H

andD¹_H of disjunctive and Datalog rules, respectively:

D¹_H={ C(x)→A(x)∨B(x) D_H¹ ={R(x, y)∧A(y)→H(x) R(x, y)∧C(y)→H(x)∨B(y) R(x, y)∧B(y)→D(x) R(x, y)∧C(y)→D(x)∨A(y) R(x, y)∧D(y)→H(x) R(x, y)∧R(z, y)∧C(y)→H(x)∨D(z)} R(x, z)∧R(x, y)∧R(z, y)∧C(y)

→H(x)} The results proved in [8] imply thatD_H¹ is a Datalog rewriting ofD¹. In contrast, Procedure 1 does not terminate when givenD² as input. Although the mutually recursive Datalog rules in D² are never resolved directly with each other, they can interact via the program’s (only) disjunctive rule. As a result, Procedure 1 will eventually derive each of the following (infinitely many) rules, for each even numbern and for eachZ ∈ {A, B}:

R(xn, x0)∧

n

^

i=1

R(xi, x_i−1)→Z(xn)

4 Sufficient Conditions for Rewritability

In what follows, we present two conditions onELU-programs that ensure FO and Datalog-rewritability, respectively. Both conditions come with resolution-based algorithms for computing rewritings. To ensure termination of resolution, both our conditions apply to linearELU-programs only. As we discuss later on in§5, termination in the non-linear case becomes much more problematic, and it is left for future work.

4.1 FO Rewritability

As shown by Bienvenu et al. [3], if an instance queryA(x) is FO-rewritable w.r.t.

anEL-TBox, then there exists a tree-shaped UCQ that is an FO-rewriting for the given query and TBox. This property was critical to the study of FO-rewritability in the context ofELas it implies a characterisation of FO-rewritability in terms of tree-shaped ABoxes. Since only tree-shaped ABoxes are relevant, one can exploit standard tree automata machinery to study the problem.

Once disjunctions come into play, however, this key property seems to break.

As demonstrated by the following example, there are FO-rewritableELU-TBoxes that do not seem to be rewritable into tree-shaped UCQs. Thus, the machinery developed in [3] for ELdoes not seem to extend toELU.

Example 4.1 The queryH(x)is FO-rewritable w.r.t. our example programD¹. The following UCQϕ(x) =ϕ1(x)∨ϕ2(x)∨ϕ3(x)∨ϕ4(x)∨ϕ5(x)can be obtained

(6)

by unfolding the Datalog rewriting obtained by Procedure 1 on D¹:

ϕ₁(x) =H(x) ϕ₄(x) =∃y.∃z. R(x, y)∧R(x, z)∧R(z, y)∧C(y) ϕ2(x) =∃y. R(x, y)∧A(y) ϕ5(x) =∃y.∃z. R(x, y)∧R(y, z)∧B(z)

ϕ3(x) =∃y. R(x, y)∧D(y)

Clearly, ϕ4(x)is not tree-shaped. Moreover, H(x)does not seem to have a tree- shaped UCQ-rewriting as defined in [3].

Although a linear ELU-program D that is FO-rewritable may not have a tree-shaped rewriting, linearity allows us to restrict resolution proofs in a critical way. We can show that each derivation fromDvia resolution and condensation only (i.e., no factoring) involves only tree-shaped rules. This implies that each non-tree rule derived by resolution and factoring is a factor of a tree-shaped rule.

In the following, we define a condition on linearELU programs that guarantees FO-rewritability. When applied to a programDsatisfying the condition (such asD¹in our running example), Procedure 1 will terminate: the size of derived tree-shaped rules will be limited by the size ofD. Furthermore, the Datalog program obtained as output will bebounded for every predicate in the standard sense [14]. Our condition is defined as given next.

Definition 4.2. The dependency graphof a linearELU-program Dis the least directed edge-labeled graph GD= (V, E, µ)satisfying the following conditions:

1. each unary predicate occurring in Dis a node inV;

2. (A, B)∈E∩µfor each rule inD of the formR(x, y)∧A(y)→B(x);

3. (A, B_i)∈Ewithi∈[1, n]for each rule inDof the formA(x)→Wn

i=1B_i(x).

Edges contained in the labelling µare called transfer edges; all remaining edges are called propositional edges. A transfer cycle is a simple cycle that contains at least one transfer edge. The program Dis acyclic ifGD contains no transfer cycle. Finally, the transfer depth of a unary predicate A in an acyclic program Dis the maximal number of transfer edges on a path inG_D ending in A.

Intuitively, the maximal transfer depth of a predicate in an acyclic program establishes a bound on the role depth of the tree-shaped rules that can be generated by Procedure 1. Termination of Procedure 1 is then ensured by condensation, which bounds the branching factor of rules in terms of their depth.

Example 4.3 Consider the program D¹ from Example 3.1. The dependency graph forD¹is given next, where transfer edges are dashed. Clearly,D¹is acyclic and predicate H has transfer depth2.

C A H

B D

(7)

Procedure 2Rewrite-UCQ

Input:D: a linear, acyclicELU-program;

A: a unary predicate occurring inD Output:ϕ(x): a UCQ

1: P:=Compile-Horn(D)

2: P⁰:={A(x)→Q(x)}whereQis a fresh unary predicate 3: P⁰⁰:=∅

4: repeat

5: Select somer∈ P⁰

6: N := the resolvents ofrwith clauses inPthat are not subsumed inP⁰∪ P⁰⁰ 7: P⁰:= (P⁰\ {r})∪ N

8: P⁰⁰:=P⁰⁰∪ {r}

9: untilP⁰=∅ 10: ϕ(x) :=W

{ ∃y. ψ(x,y)| ∀x.∀y. ψ(x,y)→Q(x)∈ P⁰⁰} 11: returnϕ(x)

Rules derived by Procedure 1 onD¹ have depth at most 2 (see Example 3.1).

Procedure 2 computes a UCQ rewriting of a queryA(x) relative to an acyclic programD. On inputDandA, we first apply Procedure 1 to compute a Datalog rewriting P of D; as already seen, termination of this first step is guaranteed.

PredicateAis then unfolded inP using standard techniques, as shown in Lines (2-10); unfolding terminates iff the predicateAis bounded inP [14, 7]. To show termination of unfolding, we observe that the depth of the rules inP still cannot exceed the transfer depth of the predicates in the dependency graph ofD; hence, the loop in Lines (4)-(9) only derives finitely many rules modulo subsumption.

The properties of the procedure are thus as follows.

Theorem 4.4. Given an acyclicELU-programDand a unary predicateAinD, Procedure 2 terminates and returns a UCQ rewriting of A(x) w.r.t.D.

Example 4.5 When applied to the predicate H and the program D¹ from Ex- ample 3.1, Procedure 2 first calls Procedure 1, which returns the program D¹_H from Example 3.1 as a Datalog rewriting of D¹. So, P is initialised with D¹_H and P⁰ is set to {H(x) → Q(x)}. On these values, unfolding terminates after five iterations of the main loop with the following rules in P⁰⁰:

H(x)→Q(x) R(x, z)∧R(x, y)∧R(z, y)∧C(y)→Q(x) R(x, y)∧A(y)→Q(x) R(x, y)∧R(y, z)∧B(z)→Q(x) R(x, y)∧D(y)→Q(x)

This yields precisely the UCQ ϕ(x)given in Example 4.1.

4.2 Datalog Rewritability

Our Datalog-rewritability condition relaxes the acyclicity condition from §4.1 by allowing certain kinds of cyclic programs. Intuitively, we allow programs that

(8)

can be separated into two parts: one that may contain disjunctive rules but no transfer cycles, and another one that contains only Datalog rules, but may contain transfer cycles. We call such programsseparable.

Definition 4.6. Let D be a linear ELU-program and let D_∨ be the smallest subset of Dsuch that

1. each disjunctive rule in Dis contained in D∨; and

2. no unary predicate in D \ D_∨ occurs in the body of a rule in D_∨. Program D is separableif D∨ is acyclic.

Example 4.7 ProgramD² from our running example 3.1 is not FO-rewritable;

however, it is separable: D_∨² ={>(x)→A(x)∨B(x)} and it is therefore acyclic as in Definition 4.2.

We deal with separable programs by eliminating cycles from their Datalog part. Cycle elimination is based on the observation that every linearEL-program P can be seen as a labeled transition systemT_P with unary predicates as states and rules as transitions labeled by binary predicates (or >if the rule contains no binary predicate). Given a unary predicate B in P, an ABox A, and an individualb, every derivation ofB(b) fromP ∪ Acorresponds to a path ending in B inT_P; hence, paths inTP encode resolution proofs.

The set of all possible derivations of an assertion B(b) from an assertion A(a) by the rules inP corresponds to the regular language accepted by the non- deterministic finite automaton constructed from T_P by taking Aas the unique initial state, andB as the unique accepting state.

Definition 4.8. Let D be a linear ELU-program and let hA, Bi be a pair of unary predicates occurring in D. The automaton for D relative to hA, Bi is the non-deterministic finite word automaton Aut_D(A, B) =hS, Γ,→,{A},{B}i defined as follows: the set of states Sconsists of the unary predicates in D; the alphabet Γ is the set of binary predicates in D plus the special symbol >; the (unique) initial and final states are Aand B respectively; finally, the transition relation →consists of the following transitions:

– C→_> D for eachEL-ruleC(x)→D(x)inD;

– C→RD for each EL-rule R(x, y)∧C(y)→D(x)inD.

Finally, we denote withL(AutD(A, B))the language accepted byAutD(A, B).

Example 4.9 The automaton forD² relative to hA, Biis given below:

start A B

R

(9)

The automata forhA, Ai,hB, Ai, andhB, Bicontain exactly the same states and transitions and only differ on the particular choice of initial and final state.

The languageL(AutD(A, B)) is regular, and hence can be described by means of some regular expression over the alphabetΓ of the automaton. We adopt the following definition of regular expressions overΓ, whereα∈Γ and the atomic expression∅ denotes the empty language.

e ::= α| ∅ |ee|e+e|e^∗

With each regular expression e over Γ we uniquely associate a (fresh) binary predicateN_eand a Datalog programPe that definesN_e as shown next.

Definition 4.10. Let Dbe a linearELU-program and let ebe a regular expression over the binary predicates in Dand the symbol >. The Datalog program Pe

corresponding to e is defined inductively as follows:

P_∅ = ∅

P> = {>(x)→N>(x, x)}

PR = {R(x, y)→NR(y, x)}

Pe1e2 = Pe1∪ Pe2∪ {Ne1(x, y)∧N_e₂(y, z)→N_e₁_e₂(x, z)}

Pe₁+e₂ = Pe₁∪ Pe₂∪ {Ne₁(x, y)→Ne₁+e₂(x, y), Ne₂(x, y)→Ne₁+e₂(x, y)}

P_e^∗ = P_e∪ {>(x)→N_e^∗(x, x), N_e(x, y)∧N_e^∗(y, z)→N_e^∗(x, z)}

Example 4.11 The language of the automaton Aut_D2(A, B) in Example 4.9 can be captured by the regular expression e_hA,Bi =R(RR)^∗. The corresponding program P_e_hA,Bi consists of the following Datalog rules:

R(x, y)→NR(y, x) (1)

NR(x, y)∧NR(y, z)→NRR(x, z) (2)

>(x)→N_(RR)∗(x, x) (3) NRR(x, y)∧N_(RR)^∗(y, z)→N_(RR)^∗(x, z) (4) N_R(x, y)∧N_(RR)∗(y, z)→N_R(RR)∗(x, z) (5) The language for Aut_D2(A, A)is captured bye_hA,Ai= (RR)^∗, which is a subex- pression of e_hA,Bi; the programPehA,Ai thus consists of rules (1)-(4). Symmetry in the automaton’s transitions impliesP_e_hB,Ai =P_e_hA,Bi andP_e_hB,Bi =P_e_hA,Ai.

LetD be a separable program and D∨ be the disjunctive part of D. Since for each pair A, B in D \ D∨ we have thatL(AutD(A, B)) can be described by some regular expression e_hA,Bi, all ways in which an assertion A(a) can imply B(b) can be summarised by the single rule N_e_hA,Bi(x, y)∧A(x)→B(y). Thus, the disjunctive program D⁰ defined as the union of D_∨, all programs P_hA,Bi, and all rules N_e_hA,Bi(x, y)∧A(x)→ B(y) has the same Horn consequences as D. To obtain such Horn consequences, we could thus apply Procedure 1 to D⁰; however, to ensure termination we do the following instead:

(10)

Procedure 3Rewrite-Datalog Input:D: a separableELU-program;

Output:P: a Datalog program 1: Ξ(D), Ω(D) :=∅

2: for eachpair of unary predicateshA, Bi do

3: AutD(A, B) := finite word automaton forDrelative tohA, Bi 4: Construct a regular expressionehA,Biequivalent toAutD(A, B) 5: Construct Datalog programPe_hA,Bi and

6: Ξ(D) :=Ξ(D)∪ Pe_hA,Bi

7: Ω(D) :=Ω(D)∪ {Re_hA,Bi∧A(x)→QB(y)}

8: P:=Compile-Horn(D∨∪Ω(D)) 9: P:=P ∪Ξ(D)

10: for each unary predicateAinD, replaceQAwithAinP 11: returnP

– in each ruleNe_hA,Bi(x, y)∧A(x)→B(y) inD⁰, we replace predicateB with a fresh predicateQB, which ensures acyclicity of the resulting program; and – we apply Procedure 1 only to the monadic rules inD⁰, thus ignoring programs

P_hA,Bifor the purpose of resolution.

The algorithm based on these ideas is given by Procedure 3. In Lines (1)-(7), the procedure computes the following intermediate programs, whereA, B range over the unary predicates in the input:

Ω(D) = [

hA,Bi

{Ne_hA,Bi(x, y)∧A(x)→QB(y)} Ξ(D) = [

hA,Bi

Pe_hA,Bi

In Line (8), our algorithm invokes Procedure 1 to compute the Horn consequences ofD∨∪Ω(D). The following lemma establishes termination of this step.

Lemma 4.12. Let Dbe a linear ELU-program that is separable. Then, Proce- dure 1 terminates on D_∨∪Ω(D).

The following lemma shows that no Horn consequence is lost by applying Procedure 1 toD_∨∪Ω(D) only, and appending the non-monadic programΞ(D) only afterwards.

Lemma 4.13. Let Dbe a separableELU-program. Let P be the union of Ξ(D) and the output of Procedure 1 on D∨∪Ω(D). Then, for every query A(x)and every ABox A, we have cert(A(x),D,A) =cert(Q_A(x),P,A).

Finally, in order to obtain a proper rewriting, in Line (10) we eliminate the auxiliary unary predicatesQ_Abefore returning the output.

Theorem 4.14. On input a separable ELU-program D, Procedure 3 returns a Datalog rewriting of D.

(11)

Example 4.15 For our example programD², we have the following:

Ξ(D²) = [

hA,Bi

Pe_hA,Bi =PR(RR)^∗={(1),(2),(3),(4),(5)}

Ω(D²) ={N_R(RR)∗(x, y)∧A(x)→Q_B(y); N_R(RR)∗(x, y)∧B(x)→Q_A(y);

N(RR)^∗(x, y)∧A(x)→QA(y); N(RR)^∗(x, y)∧B(x)→QB(y)}

When applied to D²_∨∪Ω(D²), Procedure 1 computes the following additional Datalog rules:

N_R(RR)^∗(x, y)∧N_(RR)^∗(x, y)→QA(y) (6) N_R(RR)∗(x, y)∧N_(RR)∗(x, y)→Q_B(y) (7) Procedure 3 finally returns a Datalog program P consisting of rules Ξ(D²), Ω(D²), as well as rules (6)-(7) with predicates QA and QB replaced with A andB, respectively. Program P is a rewriting ofD².

5 Discussion and Future Work

In this paper, we have presented two sufficient conditions for FO and Datalog rewritability of instance queries w.r.t.ELU-ontologies. Our results are still rather preliminary, and we are currently working on extending them in several ways.

First, we are confident that our sufficient conditions can be extended to cover also non-linearELU-programs, as well as more expressive DLs such asALCHI.

Devising a resolution-based rewriting algorithm in such extended setting becomes, however, a much more challenging task.

Example 5.1 Consider the nonlinearELU-programD³with the following rules:

>(x)→A(x)∨B(x) R(x, y)∧A(y)→D(x) B(x)→A(x) D(x)→H1(x)∨H2(x) D(x)∧E(x)→H(x) D(x)→E(x) This program is FO-rewritable: the non-linear rule D(x)∧E(x)→H(x)can be replaced withD(x)→H(x)to obtain an equivalent linear program that satisfies our sufficient condition in Section 4.1. Procedure 1, however, does not terminate when given D³ as input; in particular, Procedure 1 will compute the following infinite family of disjunctive rules for each n >1:

" _n

^

k=1

R(y_k, x_k)∧R(y_k−1, x_k)

#

∧D(y_n)→

" _n _

k=1

H(y_k)

#

∨H₁(y₀)∨H₂(y₀)

Second, it would be interesting to devise a resolution-based decision procedure for FO and Datalog rewritability for ALC ontologies; this is possible, since de- cidability has been recently shown. Finally, we are planning to implement our resolution algorithms and test them in practice.

(12)

References

1. Acciarri, A., Calvanese, D., Giacomo, G.D., Lembo, D., Lenzerini, M., Palmieri, M., Rosati, R.: QuOnto: Querying ontologies. In: AAAI. pp. 1670–1671 (2005) 2. Artale, A., Calvanese, D., Kontchakov, R., Zakharyaschev, M.: The DL-Lite family

and relations. J. Artif. Intell. Res. 36, 1–69 (2009)

3. Bienvenu, M., Lutz, C., Wolter, F.: Deciding FO-rewritability in EL. In: DL. CEUR Workshop Proceedings, vol. 846 (2012)

4. Bienvenu, M., ten Cate, B., Lutz, C., Wolter, F.: Ontology-based data access: A study through disjunctive Datalog, CSP, and MMSNP. In: ACM PODS (2013), arXiv:1301.6479

5. Calvanese, D., De Giacomo, G., Lembo, D., Lenzerini, M., Rosati, R.: Tractable reasoning and efficient query answering in description logics: The DL-Lite family.

J. Autom. Reasoning 39(3), 385–429 (2007)

6. Chortaras, A., Trivela, D., Stamou, G.: Optimized query rewriting in OWL 2 QL.

In: CADE. pp. 192–206 (2011)

7. Cosmadakis, S.S., Gaifman, H., Kanellakis, P.C., Vardi, M.Y.: Decidable optimiza- tion problems for database logic programs: Preliminary report. In: Simon, J. (ed.) STOC ’88. pp. 477–490. ACM (1988)

8. Cuenca Grau, B., Motik, B., Stoilos, G., Horrocks, I.: Computing datalog rewritings beyond Horn ontologies. In: IJCAI (2013), arXiv:1304.1402

9. Del Val, A.: First order LUB approximations: Characterization and algorithms.

Artif. Intell. 162(1-2), 7–48 (2005)

10. Eiter, T., Ortiz, M., ˇSimkus, M., Tran, T.K., Xiao, G.: Query rewriting for Horn- SHIQ plus rules. In: AAAI. pp. 726–733 (2012)

11. Hustadt, U., Motik, B., Sattler, U.: Data complexity of reasoning in very expressive description logics. In: IJCAI. pp. 466–471 (2005)

12. Hustadt, U., Motik, B., Sattler, U.: Reasoning in Description Logics by a Reduction to Disjunctive Datalog. J. Autom. Reasoning 39(3), 351–384 (2007)

13. Kikot, S., Zolin, E.: Modal definability of first-order formulas with free variables and query answering. J. Appl. Log. 11(2), 190–216 (2013)

14. Naughton, J.F.: Data independent recursion in deductive databases. In: Silber- schatz, A. (ed.) PODS ’86. pp. 267–279. ACM (1986)

15. P´erez-Urbina, H., Motik, B., Horrocks, I.: Tractable query answering and rewriting under description logic constraints. J. Appl. Log. 8(2), 186–209 (2010)

16. Rodriguez-Muro, M., Calvanese, D.: High performance query answering over DL- Lite ontologies. In: KR. pp. 308–318 (2012)

17. Rosati, R., Almatelli, A.: Improving query answering over DL-Lite ontologies. In:

KR. pp. 290–300 (2010)

18. Rossman, B.: Homomorphism preservation theorems. J. ACM 55(3) (2008) 19. Selman, B., Kautz, H.: Knowledge compilation and theory approximation. J. ACM

43(2), 193–224 (1996)

20. Stocker, M., Smith, M.: Owlgres: A scalable OWL reasoner. In: OWLED. CEUR Workshop Proceedings, vol. 432 (2008)

(13)

A Proofs for Section 4.1 (FO Rewritability)

Given a disjunctive programD, we writeΣ(D) for the set of all predicate symbols in D. We call Σ(D) the signature ofD. If D is anELU-program, we partition Σ(D) into Σc(D), the set of all unary predicates (concept names) in D, and Σr(D), the set of all binary predicates (role names) in D.

We writeCloR(D) for the set of all rules derivable from Dby binary resolution, CloF(D) for the set of all rules derivable from D by positive factoring, andCloRF(D) for the set of all rules derivable fromDby binary resolution and positive factoring.

Letxbe a sequence of variables and letϕbe a formula. We writex|^y_z andϕ|^y_z for the result of substitutingyforzinxandϕ, respectively. We write FV(ϕ) for the set of variables that occur free inϕ. Since we only consider formulas that are conjunctions or disjunctions of atoms, we can also view them as sets of atoms, writingA ∈ϕ for “A occurs in ϕ”. The notation A∈ rfor a rule r is defined analogously.

Proposition A.1. Let Dbe a linearELU-program. Every rule in CloR(D)has the form A(x0)∧ϕ → Wn

i=1Bi(xi) where ϕ is a conjunction of binary atoms such that {x0, . . . , xn} ⊆FV(ϕ).

Given a ruler of the formA(x0)∧ϕ→Wn

i=1Bi(xi), whereϕis a conjunction of binary atoms such that{x0, . . . , xn} ⊆FV(ϕ), we define:

Gr:= (FV(ϕ),{(x, y)| ∃R:R(y, x)∈ϕ}) We callrtree-shaped if the following conditions are satisfied:

1. G_r is a tree withx₀ as its root andx₁, . . . , x_n as its leaves.

2. If{R1(x, y), R2(x, y)} ⊆ϕ, thenR1=R2. Thedepth ofris defined as the depth ofGr.

Lemma A.2. Let D be a linearELU-program. Every rule inClo_R(D) is tree- shaped.

Proof. By induction on the derivation ofr∈CloR(D). Clearly, all rules inDare tree-shaped. So, w.l.o.g., let rbe a resolvent or r1 =A(x)∧ϕ1→Wm

i=1Bi(xi) and r2 =B1(y)∧ϕ2 →Wn

i=1Ci(yi) onB1(x1) and B1(y), respectively. By the inductive hypothesis,r1 and r2 are tree-shaped. Since FV(ϕ1) and FV(ϕ2) are assumed to be disjoint and the tree corresponding tor2is rooted aty, the graph (FV(ϕ1∧(ϕ2|^x_y¹)),{(x, y)| ∃R: R(y, x)∈ ϕ1∧(ϕ2|^x_y¹)}) is a tree rooted at x that hasx2, . . . , xm, y1, . . . , yn,as its leaves. Moreover, sincey has no incoming edge in Gr₂, we have R(x1, z)∈ ϕ1∧(ϕ2|^x_y¹) if and only if R(x1, z)∈ ϕ1 (for

every variablez). The claim follows. ut

Lemma A.3. Let Dbe a linear, acyclicELU-program and let nbe the maximal transfer depth of a unary predicate in D. Then each rule in CloR(D)has depth at most n.

(14)

Proof. Letr=A(x)∧ϕ→Wn

i=1B_i(x_i)∈Clo_R(D) By Lemma A.2, it suffices to show that, for every i∈[1, n], the length of the (unique) path from xto x_i in G_r is equal to the difference between the transfer depth of A and the transfer depth ofBi. This follows by a straightforward induction on the derivation ofr

fromD. ut

Definition A.4 (n-Simulation).Let r=A(x₀)∧ϕ→ψbe a tree-shaped rule.

We define an inductive family of preorders (ⁿ_r)_n∈_N between variables in r as follows:

x⁰_ry :⇔ {B|B(x)∈ψ} ⊆ {B|B(y)∈ψ}

xⁿ⁺¹_r y :⇔ x⁰_ry and for every R and x⁰ such that R(x⁰, x)∈ϕ there is some y⁰ such that R(y⁰, y)∈ϕand x⁰ⁿ_r y⁰ We say x is n-simulated by y in r if xⁿ_r y. We define [x]ⁿ_r := {y | x ⁿ_r y and yⁿ_r x}.

Lemma A.5. Let rbe a tree-shaped rule of depth mand let x, y, zbe variables such that, for some R,R(x, z), R(y, z)∈r and xⁿ_r y for some n≥m. There is a substitution σsuch that xσ=y,rσis tree-shaped, and rσ⊆r.

Proof. The claim follows if we can show that for every tree-shaped rule r of depth m and every pair x, y such that x^m_r y, there is a substitution σ such thatxσ=y and for every pairu, v of variables in the subtree ofGrrooted atx we have:

1. uσ, vσare variables in the subtree ofGr rooted aty.

2. For every variable unot in the subtree of Gr rooted atx: uσ=u.

3. For every unary predicateB: ifB(u)∈r, thenB(u)σ∈r.

4. For every binary predicate R⁰: if R⁰(u, v)∈r, thenR⁰(u, v)σ∈r.

We prove the claim by induction on m. If m = 0, xand y have no successors in G_r. Therefore, it suffices to show (1-3). Let σ map xto y and every other variable inr to itself. Thus, (1) and (2) are trivial. For (3), observe that, since x⁰_ry, we haveB(x)σ=B(y)∈rwheneverB(x)∈r.

Now supposem >0. LetR₁, . . . , R_n,x₁, . . . , x_n be all the predicate symbols and variables such that Ri(xi, x) ∈ r (i ∈ [1, n]). Since x ^m_r y, there are y1, . . . , yn such thatRi(yi, y)∈ r andx^m−1_r y. By the inductive hypothesis, there are substitutionsσ1, . . . , σn such that, for every i∈[1, n]: xiσi =yi and σi satisfies (1-4) for the subtree ofGr rooted atxi. Letσbe defined such that:

– xσ=y.

– For everyi∈[1, n] and every variableuin the subtree ofxi: uσ=uσi. – For every other variableu: uσ=u.

Note that σis well-defined since the subtrees rooted at x₁, . . . , x_n are pairwise disjoint and do not contain x (which, in turn, holds since r is tree-shaped).

Clearly,σ satisfies (1) and (2). For the subtrees ofG_r rooted atx₁, . . . , x_n, (3) and (4) hold since σ extends σ1, . . . , σn. For x, (3) holds since x⁰_r y, and (4) follows since we haveRi(x, xi)σ=Ri(y, yi) for everyi∈[1, n]. ut

(15)

Proposition A.6. Let Dbe a monadic disjunctive program and let f :N→N be a function defined as follows:

f(0) := 2^|Σ^c^(D)|

f(n+ 1) := 2^|Σ^c^(D)|· |Σr(D)| ·2^f(n)

Then [x]ⁿ_r ≤f(n)for every rule rover Σ(D)and every variable xin r.

A rule r is condensed if r has no condensation that is smaller than r. By Proposition A.6 and Lemma A.5 we obtain:

Lemma A.7. Let D be an ELU-program. The size of condensed tree-shaped rules of depth n over Σ(D)is bounded in n and |Σ(D)|.

Proof. Let D be an ELU-program and let f be defined as in Proposition A.6.

It suffices to show that the number of variables in every condensed tree-shaped rule of depthnis bounded byn·(|Σr(D)| ·f(n))ⁿ.

Note that every treeT of depthncontains at most n·mⁿ nodes, where m is the maximal outdegree of a node inT. Therefore, every tree of depthnthat hasknodes must contain a node that has outdegree at least pⁿ

k/n.

Suppose, for contradiction,ris a condensed tree-shaped rule of depthnthat mentions more thann·(|Σr(D)| ·f(n))ⁿ variables. By the above observation,Gr

must have a nodez that has outdegree greater|Σr(D)| ·f(n). Consequently, by Proposition A.6, there is some predicateR and two distinct variables x, ysuch thatR(x, z), R(y, z)∈rand [x]ⁿ_r = [y]ⁿ_r. By Lemma A.5, there is a substitution σsuch thatxσ=yandrσ⊆r. Sincexandyare distinct, it follows thatrσ(r, contradicting the assumption thatris condensed. ut Lemma A.8. Every condensation of a tree-shaped rule is tree-shaped.

Proof. Let r be a tree-shaped rule and rσ a condensation of r. Since rσ is a subclause of r, Grσ must be a forest, the root of Gr must be a source inGrσ, and all leaves ofGrmust be sinks inGrσ. Sincerσis a substitution instance ofr andGris connected, so must beGrσ. Hence,Grσis a tree. The claim follows. ut Lemma A.9. Unrestricted binary resolution with condensation terminates on linear, acyclicELU-programs. Moreover, the size of every rule derivable by binary resolution with condensation from an acyclic ELU-program Dis bounded in the size of D.

Proof. LetDbe a linear, acyclicELU-program. By Lemma A.2, Lemma A.5, and Lemma A.8, every rule derivable by resolution with condensation is tree-shaped.

By Lemma A.3 and Lemma A.7, the size and hence the number of condensed tree-shaped rules is bounded in|Σ(D)|. The claims follow. ut Lemma A.10 ([8] Theorem 12, Lemma 19). Let D be a disjunctive program, let A be an ABox, and let C be a Horn clause such that D ∪ A |= C.

Let D_Hⁱ be the Datalog program computed by Procedure 1 afteriiterations of the main loop. Then there is some nsuch that Dⁿ_H∪ A |=C.

(16)

Theorem A.11. Let D be a linear, acyclic ELU-program. Procedure 1 terminates on Dand returns a Datalog rewriting ofD.

Proof. To show termination of Procedure 1, it suffices to bound the size of every rule derived from D by resolution with condensation and factoring. By Lemma A.9, the size of every rule derived from D by resolution with condensation but no factoring is bounded in|Σ(D)|. Thus, it suffices to show that for every rulerobtained fromDby resolution with condensation and factoring there is a (not necessarily strictly) larger ruler⁰ obtained from Dby resolution with condensation but without factoring such thatris subsumed byr⁰ (where clauses are viewed as sets of atoms). This follows by a straightforward induction on the derivation ofr.

By Lemma A.10, the output of Procedure 1 is a Datalog rewriting ofD. ut LetDbe a linear, acyclicELU-program and letr=A(x)∧ϕ→Wn

i=1Bi(xi)∈ CloR(D). The proof of Lemma A.3 exhibits that the length of the path fromx to xi in Gr is equal to the difference between the transfer depth ofA and the transfer depth of Bi. In particular, if Bi = Bj for some 1 ≤ i < j ≤ n, then xi and j must have the same distance fromxin Gr. Thus, while factoring can destroy the tree structure ofGr, it does not change the distance betweenxand xi inGr. The same is true for condensation. Thus, we have:

Proposition A.12. Let Dbe a linear, acyclicELU-program, let Pbe a Datalog rewriting of Dcomputed by Procedure 1, and let r=A(x)∧ϕ→Wn

i=1B_i(x_i)∈ P. Then for every i∈[1, n], the distance between xand xi in Gr is equal to the difference between the transfer depth of Aand the transfer depth ofBi. Thus, given anELU-programD, its Datalog rewriting P, and a ruler∈ P, the depth ofGris still bounded by the maximal transfer depth of a predicate inD and cannot be further increased by resolution.

Theorem 4.4. Given an acyclicELU-programDand a unary predicateAinD, Procedure 2 terminates and returns a UCQ rewriting of A(x) w.r.t.D.

Proof. By Theorem A.11, the call to Procedure 1 (Compile-Horn) terminates and returns a Datalog rewriting of D. It is easily seen that the main loop of the procedure computes the A-expansions of D as defined in [7] (similarly to the algorithm in [14]). Therefore, the procedure terminates and returns a UCQ- rewriting of A(x) w.r.t. Dprovided the resolution strategy implemented in the main loop terminates.

Suppose, for contradiction, this is not the case. Then for everyn ∈ N, P⁰⁰ must eventually contain a rule rsuch thatG_r contains more thann nodes and ris not subsumed by any clause previously added to P⁰⁰. By Proposition A.12, the depth of every rule inClo_R(P ∪ {A(x)→Q(x)}) is bounded inD. So, letd be the corresponding bound. It follows that G_r must contain a variable xwith outdegree at least p^d

n/d. For large enough n, this means there is a ruler∈ P and rulesr1=B(y)∧ϕ1→Q(x),r2=B(y)∧ϕ2→Q(x)∈ P⁰,r⁰₁,r⁰₂such that r⁰₁ is a resolvent of r with r1, r2 is derived from r⁰₁ (and hence ϕ1 ⊆ ϕ2), and

(17)

r⁰₂ is a resolvent of r with r₂. It is easily seen that r₁⁰ subsumes r₂⁰. Moreover, sincer⁰₂ is derived fromr₁⁰,r₁⁰ is added toP⁰⁰beforer₂⁰, in contradiction to our

assumption. ut

B Proofs for Section 4.2 (Datalog Rewritability)

Given a separableELU-programD, we writeD∨ for the fragment ofDas used in Definition 4.6 andD∨¯ forD \ D∨.

Lemma 4.12. Let Dbe a linear ELU-program that is separable. Then, Proce- dure 1 terminates on D_∨∪Ω(D).

Proof. The claim holds sinceD_∨∪Ω(D) is acyclic, which is the case sinceD_∨ is acyclic and every rule added byΩ(D) leads to a sink. ut For the proof of Lemma 4.13, we need to generalise the notion of annotations from how they are defined in §4.2. Letx, y, z be finite sequences of variables.

Annotations are now triples of the form (x, S,y), where S is a set of binary atoms whose free variables are contained in x,y,z. Let α = (x, S,y) be an annotation and let A=A1, . . . , A_|x|, andB=B1, . . . , B_|y| be finite sequences of unary predicates. We define the following notation for disjunctive rules:

AαB:=∀xyz.





|x|

^

i=1

Ai(xi)



∧ ^

C∈S

C

!

→

|y|

_

j=1

Bj(yj)

For linear rules, the notation reduces to AαB, while for Datalog rules, we have AαB, whereA, B are single predicate symbols. IfAαB is anELU-rule,αmust be of the form (x,{R(y, x)}, y) or (x,∅, x . . . x

| {z }

n

) for somen≥1.

Letα= (x, S,y),β= (x⁰, S⁰,y⁰) be annotations such that FV(α)∩FV(β) =

∅, and leti∈[1,|y|],j∈[1,|x⁰|]. We define:

α◦ⁱ_jβ:= (xx⁰₁. . . x⁰_j−1x⁰_j+1. . . x⁰_|x0|, S∪(S⁰|^y_xⁱ0 j

), y₁. . . y_i−1y_i+1. . . y_|y|(y⁰|^y_xⁱ0 j

)) We callα◦ⁱ_jβthecompositionofαat positionitoβat positionj. We abbreviate α◦¹_jβ asα◦_jβ, α◦ⁱ₁β as α◦ⁱβ, and α◦¹₁β as α◦β.

Proposition B.1. Let α= (xα, Sα,yα), β = (xβ, Sβ,yβ), γ = (xγ, Sγ,yγ) be annotations and let i, i⁰∈[1,|yα|],j∈[1,|xβ|],k∈[1,|yβ|],l∈[1,|xγ|]. Then:

1. α◦ⁱ_j(β◦^k_l γ) = (α◦ⁱ_jβ)◦^k+|y_l ^α^|−1γ.

2. Ifi < i⁰, then(α◦ⁱ_jβ)◦ⁱ_l⁰⁻¹γ≡_a (α◦ⁱ_l⁰γ)◦ⁱ_jβ.

Letr =AαB, r⁰ = A⁰α⁰B⁰ be two rules such that Bi =A⁰_j and FV(α)∩ FV(β) =∅. We writer+ⁱ_jr⁰ for the resolvent of randr⁰ onB_i(y_i) andA⁰_j(x⁰_j), respectively. We abbreviater+¹_j r⁰ asr+_jr⁰,r+ⁱ₁r⁰ as r+ⁱr⁰, andr+¹₁r⁰ as r+r⁰. Resolution naturally corresponds to composition of annotations:

(18)

Proposition B.2. Let r=AαB,r⁰=A⁰α⁰B⁰be linear rules such thatB_i=A⁰ and FV(α)∩FV(α⁰) =∅. Then:

1. r+ⁱr⁰=A(α◦ⁱα⁰)B₁. . . B_i−1B_i+1. . . B_|B|B⁰. 2. If r is Datalog, then r+r⁰=A(α◦α⁰)B⁰.

We consider two rules r, r⁰ to be the same (and write r =r⁰) if they are identical up to renaming of bound variables and reordering of literals. The corresponding equivalence on annotations can be defined as the least equivalence relation≡_a such that:

1. (x₁. . . x_m, S, y₁. . . y_n)≡a(x₁. . . x_i−1x_jx_i+1. . . x_j−1x_ix_j+1. . . x_m, S, y₁. . . y_k−1y_ly_k+1. . . y_l−1y_ky_l+1. . . y_n) for all 1≤i≤j≤mand 1≤k≤l≤n,

2. (x, S,y)≡a(x|û_v, S|û_v,y|û_v) for everyu /∈x∪y∪FV(S).

Proposition B.3. Two rules of the form AαB, AβB are identical up to renaming of bound variables and reordering of literals if and only if α≡aβ.

We assume ◦ⁱ_j and +ⁱ_j to be left-associative, writing r1 +ⁱ_j¹

1 r2 +ⁱ_j²

2 r3 for (r₁+ⁱ_j¹

1 r₂) +ⁱ_j²

2 r₃ and α₁◦ⁱ_j¹

1α₂◦ⁱ_j²

2 α₃ for (α₁◦ⁱ_j¹

1α₂)◦ⁱ_j²

2 α₃. As corollaries of Proposition B.1, we obtain:

Corollary B.4. Let r=AαB, r⁰,r⁰⁰ be linear rules and i, j indices such that r+ⁱ(r⁰+^jr⁰⁰)is a resolvent of r,r⁰ and r⁰⁰. Then:

1. r+ⁱ(r⁰+^jr⁰⁰) =r+ⁱr⁰+^j+|B|−1r⁰⁰.

2. If r is Datalog, thenr+ (r⁰+^jr⁰⁰) =r+r⁰+^jr⁰⁰.

Corollary B.5. Let r=AαB, r⁰, r⁰⁰ be linear rules and i, j indices such that i < j and r+^jr⁰⁰+ⁱr⁰ is a resolvent of r,r⁰, and r⁰⁰. Then r+ⁱr⁰+^j−1r⁰⁰= r+^jr⁰⁰+ⁱr⁰.

Lemma B.6. Let Dbe a separable ELU-program, let P be a Datalog program such that Σ(P)∩Σ_c(D) =∅ and no rule inP resolves with a rule in D∨, and let r∈Clo_R(D ∪ P)be linear. Then eitherr∈Clo_R(D∨¯∪ P)or, for some n≥0, we have r=r_∨+r₁+. . .+r_n where:

– r_∨=AαB∈Clo_R(D∨)where |B| ≥n.

– {r1, . . . , rn} ⊆CloR(D∨¯∪ P).

Proof. Letr∈CloR(D ∪ P)\CloR(D∨¯∪ P) we show thatrcan be decomposed as claimed by induction on the derivation of r ∈ Clo_R(D ∪ P). If r ∈ D, the claim is true forn= 0. Otherwise, letr=r⁰+ⁱ_jr⁰⁰. Sincer /∈Clo_R(D∨¯∪ P), we have {r⁰, r⁰⁰} 6⊆Clo_R(D∨¯ ∪ P). Therefore, since no rule in P resolves with D∨

and no unary predicate inD∨¯ occurs in the body of a rule inD_∨, we must have r⁰ ∈Clo_R(D ∪ P)\Clo_R(D∨¯∪ P). Hence, by the inductive hypothesis, we have r⁰ = r_∨⁰ +r⁰₁+. . .+r⁰_m where r⁰_∨ = A⁰α⁰B⁰ ∈ Clo_R(D_∨) and {r⁰₁, . . . , r_m⁰ } ⊆ CloR(D∨¯∪ P). Sincer is linear andΣ(P)∩Σ(D_∨) =∅, r⁰⁰must be linear, and hencej= 1. We distinguish two cases:

(19)

1. r⁰⁰∈Clo_R(D∨¯∪P). Then eitheri∈[m+1, n] ori∈[1, m]. In the former case, the claim is immediate. In the latter case, the claim follows by Corollary B.4.

2. r⁰⁰ ∈ CloR(D ∪ P)\CloR(D∨¯ ∪ P). Then r⁰⁰ = r⁰⁰_∨+r⁰⁰₁ +. . .+r⁰⁰_k where r_∨⁰⁰ = A⁰⁰α⁰⁰B⁰⁰ ∈ CloR(D_∨) and {r⁰⁰₁, . . . , r⁰⁰_k} ⊆ CloR(D∨¯ ∪ P). Moreover, n=m+k an|B|=|B⁰|+|B⁰⁰|. Since no unary predicate inD∨¯ occurs in the body of a rule inD_∨, we havei∈[m+ 1, n]. W.l.o.g., let i= 1 (we can always achieve this by reordering the literals inr⁰⁰). Then

r=r_∨⁰ +r⁰₁+. . .+r⁰_m+ (r⁰⁰_∨+r⁰⁰₁+. . .+r_k⁰⁰)

=r_∨⁰ +r⁰₁+. . .+r⁰_m+r⁰⁰_∨+^|B⁰^|r⁰⁰₁+^|B⁰^|. . .+^|B⁰^|r⁰⁰_k

= (r⁰_∨+^m+1r_∨⁰⁰) +r⁰₁+. . .+r⁰_m+^|B⁰^|r₁⁰⁰+^|B⁰^|. . .+^|B⁰^|r_k⁰⁰

by Corollary B.4 and Corollary B.5. Therefore, we haver=r_∨+r⁰₁+. . .+ r_m⁰ +r⁰⁰₁+. . .+r⁰⁰_k wherer_∨is obtained fromr_∨⁰ +^m+1r⁰⁰_∨by moving the last

|B⁰⁰|literals right after the firstmliterals. ut As shown in §4.2, every linear EL-program P can be seen as a labeled transition system (i.e., automaton without initial and final states). Every rule AαB ∈ P is seen as a transition fromA to B labeled withα. This view yields a natural notion of apath between two unary predicates inP as a sequence of

“connected” rules.

Proposition B.7. Let P be a linear EL-program and let r be a rule. Then r∈CloR(P)iff there is a path r1, . . . , rn (n≥1) in P such thatr=r1+. . .+rn. Corollary B.8. Let P be a linear EL-program and let r = AαB be a rule where A 6= B. Then r ∈ Clo_R(P) if and only if there is a word α₁. . . α_n ∈ L(AutP(A, B))such that α=α₁◦. . .◦α_n.

Given an annotation α = (u, S, v) and variables x, y /∈ FV(S), we write α(x, y) for the formulaV

C∈S(C|^xy_uv). If S is empty, the notation is only defined ifx=y and stands for>(x).

With this, we can generalise the definition of the Datalog programP_efor a regular expressioneover annotations (see§4.2) as follows:

P_∅ = ∅

Pα = {α(x, y)→Nα(x, y)}

Pe1e2 = Pe1∪ Pe2∪ {Ne1(x, y)∧Ne2(y, z)→Ne1e2(x, z)}

Pe₁+e₂ = Pe₁∪ Pe₂∪ {Ne₁(x, y)→Ne₁+e₂(x, y), Ne₂(x, y)→Ne₁+e₂(x, y)}

Pe^∗ = Pe∪ {>(x)→N_e^∗(x, x), N_e(x, y)∧N_e^∗(y, z)→N_e^∗(x, z)}

Proposition B.9. Let P be a linear EL-program, let ebe a regular expression over annotations in P, and let L(e)be the language represented by e. Then for every sequenceα1, . . . , αn (n≥1) of annotations in P:

α₁. . . α_n∈ L(e)iff (α₁◦. . .◦α_n)(x, y)→N_e(x, y)∈Clo_R(P_e)