Worst-Case Optimal Querying of Very Expressive Description Logics with Path Expressions and Succinct Counting

(1)

Worst-Case Optimal Querying of Very Expressive Description Logics with Path Expressions and Succinct Counting

Bartosz Bednarczyk^1,2?and Sebastian Rudolph^2??

1 Institute of Computer Science, University of Wrocław, Poland

2 Computational Logic Group, TU Dresden, Germany firstname.lastname@tu-dresden.de

Abstract. Among the most expressive knowledge representation formalisms are the description logics of theZfamily. For well-behaved fragments ofZOIQ, entailment of positive two-way regular path queries is well known to be 2EXPTIME- complete under the proviso of unary encoding of numbers in cardinality con- straints. We show that this assumption can be dropped without an increase in complexity and EXPTIME-completeness can be achieved when bounding the number of query atoms, using a novel reduction from query entailment to knowledge base satisfiability. These findings allow to strengthen other results regarding query entailment and query containment problems in very expressive description logics.

Our results also carry over toGC², the two-variable guarded fragment of first- order logic with counting quantifiers, for which hitherto only conjunctive query entailment has been investigated.

1 Introduction

Recent years have seen a convergence of the fields of logic-based knowledge representation (KR) and databases, with querying over knowledge bases (KBs, aka ontologies) as the archetypical inferencing task considered. Thereby, the query languages of interest have evolved from plain conjunctive queries to more expressive formalisms with path- navigational components such as thepositive two-way regular path queries(P2RPQs) considered here. One popular and practically used way of specifying the to-be-queried knowledge bases is viadescription logics(DLs), whose most expressive exemplars fea- ture advanced constructors for roles and path expressions. Certainly, the DLs from the Zfamily [6] are among the most powerful KR formalisms on the verge of decidability.

For its most expressive proponent,ZOIQ, featuring nominals (O), role inverses (I), and number restrictions (Q), querying is undecidable [8,13] and even decidability of KB satisfiability is open, owing to the intricate interplay of the three mentioned features, but restricting the interaction ofO,I, andQ(or excluding one of the features altogether) leads to beneficial model-theoretic properties, which give rise to upper bounds of EXP- TIME for KB satisfiability and 2EXPTIME for querying, established using elaborate automata techniques [6]. While the first bound holds under the assumption of binary

?Supported by “Diamentowy Grant” no. DI2017 006447

??Supported by ERC Consolidator Grant 771779 (DeciGUT)

(2)

encoding of number restrictions, the proof for the second one requires unary encoding.

While unary encoding of numbers is often silently accepted in the DL community, it is considered “cheating” in others.

In our paper, we overcome this restriction. Leveraging said model-theoretic property of the considered DLs, we provide a novel reduction from the query entailment problem K |=qto the problem of checking unsatisfiability of some other knowledge baseK¬q. Thereby, the size of the latter is exponential in the total size ofqandK(even when assuming binary encoding), leading to a 2EXPTIME-complete complexity. Our technique also yields the new insight that the complexity drops to EXPTIME, when the number of atoms inqis bounded by a constant. The obtained results allow to correspondingly im- prove a number of previous results on query containment and can be transferred to DLs from theSRfamily. Reaching out to the community researching decidable fragments of first-order logic, we show that our results also extend to entailment ofP2RPQs by GC²databases, i.e. sets of unary and binary ground facts in the presence of a sentence of the guarded two-variable fragment with counting as defined by Pratt-Hartmann [10].

2 Preliminaries

Description Logics. We will very briefly recap the very expressive DLZOIQand its relevant sublogics following Calvaneseet al.[6]. We assume a fixed signature consisting of setsN_Iofindividual names,N_Cofconcept names, andN_Rofrole names. We letN_Ccontain the special concept names>and⊥, andN_Rthe special role names>

and^⊥. The following EBNF grammar definesatomic conceptsB,conceptsC,atomic rolesR,simple rolesSandrolesT, witho, a, b∈NI,A∈NC,P ∈NR\ {>}:

B::=A| {o}

C::=B| ¬C|CuC|CtC| ∀T.C| ∃T.C|>nS.C|6nS.C| ∃S.Self R::=P |P⁻

S::=R|S∩S|S∪S|S\S

T ::=>|S|T∪T |T◦T |T^∗|id(C)

Anassertionis of the formC(a),R(a, b),a≈b, ora6≈b, ageneral concept inclusion (GCI) has the formC₁vC₂. We useC₁≡C₂as abbreviation forC₁vC₂,C₂vC₁. AZOIQknowledge base(short: KB)K = (A,T)consists of a finite setA(called ABox) of assertions and a finite setT (calledTBox) of GCIs.

The semantics ofZOIQis defined via interpretationsI = (∆Î,·Î)composed of a non-empty set ∆Î called thedomain of I and a function·Î mapping individual names to elements of∆Î, concept names to subsets of∆Î, and role names to subsets of∆Î×∆Î. This mapping is extended to concepts, simple roles, and roles (cf. Table 1).

Then, a GCIC₁ vC₂issatisfiedbyIifC₁^I ⊆C₂^I, satisfaction of ABox assertions is defined in the obvious way. We say that an interpretationIsatisfiesa KBK = (A,T) (or I is amodelof K, written:I |= K) if it satisfies all axioms of A andT.K is calledsatisfiableif it has a model andunsatisfiableotherwise. FromZOIQ, we obtain ZIQby disallowing nominal concepts{o},ZOQ, by disallowing role inverses(.)⁻ andZOI by disallowing number restrictions.

(3)

Table 1: Semantics of concepts (left) and (simple) roles (right) inZOIQ. Name Syntax Semantics

top > ∆^I

bottom ⊥ ∅

nominal {o} {o^I}

negation ¬C ∆Î\CÎ intersection C1uC2 CÎ₁∩C₂Î union C1tC2 CÎ₁∪C₂Î

univ. restriction ∀T.C {x| ∀y.(x, y)∈TÎ→y∈CÎ} exist. restriction ∃T.C {x| ∃y.(x, y)∈TÎ∧y∈CÎ} qualified number 6nS.C {x|#{y∈CÎ|(x, y)∈SÎ} ≤n}

restriction >nS.C {x|#{y∈C^I|(x, y)∈S^I} ≥n}

Selfconcept ∃S.Self {x|(x, x)∈S^I}

Name Syntax Semantics universal role > ∆^I×∆^I bottom role ^⊥ ∅

inverse role P⁻ {(x, y)|(y, x)∈PÎ} intersection S1∩S2 SÎ1∩S2Î

union S1∪S2 SÎ₁∪S₂Î difference S1\S2 SÎ₁\S₂Î concatenation T1◦T2 T1Î◦T2Î

Kleene star T^∗ S

i≥0(T^I)ⁱ concept test id(C) {(x, x)|x∈C^I}

Queries. In queries, we use variables from a countably infinite set V. A Boolean positive two-way regular path query(P2RPQ) is an expressionq=∃x.ϕ, whereϕis a positive formula (i.e., one using∧and∨) over expressions of the formT(s, t)orC(t), whereT is a role,C is concept, andsandtare elements ofx∪N_I. AP2RPQis a conjunctive two-way regular path query(C2RPQ) if it does not use disjunction. It is a union of conjunctive two-way regular path queries(UC2RPQ) if it is a disjunction of C2RPQs.³Avariable assignmentZforIis a mappingV→∆Î. Forx∈V, we set xÎ,Z := Z(x); forc ∈N_I, we setcÎ,Z :=cÎ.T(s, t)evaluates to true underZand Iif(sÎ,Z, tÎ,Z)∈TÎ.C(t)evaluates to true underZandIiftÎ,Z∈CÎ. AP2RPQ q=∃x.ϕissatisfiedbyI(written:I |=q) if there is a variable assignmentZ(called match) such thatϕevaluates to true underIandZ. AP2RPQqisentailedby a KBK (written:K |=q) if every model ofKsatisfiesq. Ifquses only signature elements ofK, we call it a queryoverK.

Simplifications. AP2RPQissimplifiedif all its atoms are of the formT(x, y)where xandyare variables andT is built from role names using∪,◦, and^∗. For any KBK andP2RPQq, we can construct in polynomial time a KBK⁰ and simplifiedP2RPQq⁰ such thatK |=qiffK⁰|=q⁰. Hence from here on, we will focus on simplified queries.

Automata. For each atomT(x, y)of a simplifiedP2RPQ,T is a regular expression overN_R. Every such expression can be represented by a nondeterministic finite au- tomatonA= (Σ,Q,I,F,T)with an alphabetΣ⊆ {P, P⁻|P∈N_R} of possibly inverted role names, a finite set Qof states, initial statesI ⊆ Q, final statesF ⊆ Q, and transitionsT ⊆ Q×Σ×Q.⁴ The size ofAis polynomially bounded byT. In the following, we assume queries in automaton form, where the atomsT(x, y)have been replaced by the correspondingA(x, y). GivenA = (Σ,Q,I,F,T), let A⁻ = (Σ⁻,Q,F,I,T⁻)withΣ⁻={P⁻|P ∈Σ} be the corresponding reverse automaton⁵with initial and final states swapped and the state transitions flipped:(q⁰, R⁻,q)∈

3As∃x.W

iϕi≡W

i∃x.ϕi, the order of disjunction and existential quantification is irrelevant.

4The correspondence is such that(δ, δ⁰) ∈ T^I iff there is a role pathδ−^P−→ · · ·¹ −^P−→^k δ⁰ inI such thatAacceptsP1. . . Pk.

5For convenience, we consider(P⁻)⁻identical toP.

(4)

T⁻for every(q, R,q⁰)∈T. Moreover, we obtain the “state-parameterized” automaton A_q,q⁰ = (Σ,Q,{q},{q⁰},T)fromAby setting the initial state toqand the final toq⁰.

3 A Little Bit of Model Theory

The “well-behavedness” of certain sublogics of ZOIQcan be conveniently charac- terised in model-theoretic terms.

Definition 1 (quasi-forest model, [5]).LetKbe a KB. A modelIofKis aquasi-forest modelif:

– the domain∆^IofI is a forest, i.e., a prefix-closed subset ofRoots×N^∗for some finite setRoots,

– Roots={o^I|ois an individual name inK}, and

– for everyδ, δ⁰∈∆^Iwith(δ, δ⁰)∈P^Ifor some role nameP ∈NR\ {>}, either (i) {δ, δ⁰} ∩Roots6=∅, or (ii)δ=δ⁰,or (iii)δis a child ofδ⁰, or (iv)δ⁰is a child ofδ.

A KBKhas thequasi-forest model property(short: qfmp) ifKis either unsatisfiable or it has a quasi-forest model. A DLLhas the qfmp if everyLKBKhas the qfmp.

For such well-behaved KBs, tight and quite decent complexity bounds for satisfiability checking have been established.

Theorem 1 ([6]).Satisfiability checking ofZOIQKBs having the qfmp isEXPTIME- complete even with binary encoding of number restrictions.

When it comes to query answering, a useful notion for relating interpretations to each other is via homomorphisms. For two interpretationsI= (∆Î,·Î)andJ = (∆^J,·^J), a functionh:∆Î →∆^J is ahomomorphismfromItoJ if for allδ, δ⁰ ∈∆Î holds δ∈ AÎ ⇒h(δ)∈A^J for all concept namesA, and(δ, δ⁰)∈PÎ ⇒(h(δ), h(δ⁰))∈ P^J for all role namesP. If a homomorphism fromItoJ exists, we writeIBJ. We note thatI |=qandIBJ implyJ |=qfor every simplifiedP2RPQq.

Definition 2 (qfhcp, [3]).A KBKhas thequasi-forest homomorphism-cover property (short: qfhcp) if for every modelIofK, there exists a quasi-forest modelJ ofKwith J BI. A description logicLhas the qfhcp if everyLKBKhas the qfhcp. We lettame ZOIQdenote the set of allZOIQKBs having the qfhcp.

Obviously, the qfhcp implies the qfmp. Note that unrestrictedZOIQdoes not even have the qfmp. The most expressive syntactically defined sublogics of tameZOIQ known today areZIQ,ZOQ, andZOI[6,2]. From the fact that the set of models for anyP2RPQis closed under homomorphisms now follows that for any tameZOIQKB withK 6|=q, there must exist a quasi-forest modelI, satisfyingI |=KandI 6|=q.

4 Annotating (Partial) Query Matches

In the following, we consider a tameZOIQKBK and a simplified P2RPQq over K in automaton form. We note thatq can be rewritten into an equivalent disjunction

(5)

W

1≤i≤nq_iofn(also simplified)C2RPQs, wherenmay be exponential in the size ofq, but every singleq_i’s size is polynomial. Let[q]denote{q₁, . . . q_n}. Then, testingK 6|=q amounts to determining ifKhas a modelIwhich is a simultaneous countermodel (i.e., I 6|=qi) for allqi∈[q].

We now give an intuitive description of our technique to detect (or refute) matches ofq. Given some modelIofK, we iteratively, deterministically annotate all its domain elementsδwith fresh concept namesQMthat indicate which “parts”M of some query qimatch intoIand howδparticipates in these partial matches. To this end, we employ descriptionsM of query parts which contain information about (i) the query variables matched (ii) (the existence of) paths realising certain state transitions in the query’s automata and (iii) optionally, the “position” of the currentδin the query match by use of a marker•acting like an additional, distinguished query variable. In the annotation process, theQMs will be assigned to domain elementsδbased on their local environ- ment and known annotationsQ_M⁰ for “smaller” partial queries to the same element δor its direct (role) neighborhood. As an exception, non-localised query matchesM not containing•will be “broadcast”, i.e., uniformly attached to all domain elements.

This way, in the annotation process,Q_Ms for larger and larger partial queries are assigned, until finally (after reaching the unique fixpoint) also the full matches for anyq_i are recognised by annotationsQq_i. This process will accurately detect partial and full query matches, ifIis a quasi-forest model (which is sufficient for our purposes asK has the qfhcp). The annotation process is realised by virtue of a KBKq, the size of which is exponential inq, but only polynomial inK. In the following, we will stepwise introduce the KB, interleaved with some necessary definitions.

Definition 3 (partial query).Consider aC2RPQqi={A1(x1, y1),...,Am(xm, ym)}.⁶ Apartial queryforqiand a setX ⊆ {x1, y1, . . . , xm, ym}of variables fromqiis a set M consisting of

– all atomsA(x, y)fromq_ifor which{x, y} ⊆X,

– for everyA(x, y)fromqiwithx∈ X buty 6∈ X, one of the atomsAq,q⁰(x,•)or Aq,q⁰(x, o)whereqis initial inA,

– for everyA(x, y)fromqi withx 6∈ X buty ∈ X, one of the atomsA⁻_q,q0(y,•)or A⁻_q,q0(y, o)whereqis final inA,

– in case X = ∅, exactly one atom of the form Aq,q⁰(•, o) or A⁻_q,q0(•, o) (called nominal-anchored path) orAq,q⁰(•,•) orA⁻_q,q0(•,•) (called round trip) for some Afromq_i.

A partial query is called monodic, if |X| = 1. A partial query is called localif it contains•andglobalotherwise.

Nominal-Anchored Paths. First, we want to detect role paths that start in the to-be- annotated individual, end in named individualo^I, and realise state transitions in one of the query’s automata. Letobe any individual name inK, letA^±be eitherAorA⁻for any automatonAoccurring inqwith statesq,q⁰,q⁰⁰. Then letKqcontain the axiom

Q_{A±

q,q(•,o)}(o) (1)

6In cases, we may equate aC2RPQqiwith the set of its atoms.

(6)

and wheneverA^±has a transition(q, R,q⁰):

∃R.Q_{A^±

q0,q00(•,o)}vQ_{A±

q,q00(•,o)} (2)

Round Trips. Next, we are concerned with paths which start and end in the to-be- annotated individual. Assuming an automatonAwith statesq,q⁰,q⁰⁰,q⁰⁰⁰and transition setTas well as an individual nameo, we add the axioms:

> vQ_{A_q,q_(•,•)} (3) Q_{A

q,q0(•,•)}uQ_{A

q0,q00(•,•)}vQ_{A

q,q00(•,•)} (4)

Q_{A

q,q0(•,o)}uQ_{A⁻

q00,q0(•,o)}vQ_{A

q,q00(•,•)} (5)

for(q, R,q⁰)∈T: ∃R.SelfvQ_{A

q,q0(•,•)} (6)

for{(q, R,q⁰),(q⁰⁰, R⁰,q⁰⁰⁰)} ⊆T:

∃(R∩R⁰⁻).Q_{A_q₀_,q₀₀_(•,•)}vQ_{A_q,q₀₀₀_(•,•)} (7) Initialising monodic partial queries. Next we determine domain elements to which separate query variables could possibly be mapped in a match.

Definition 4. We call a monodic partial queryM original, wheneverq = q⁰ for ev- eryA_q,q⁰(x,•) ∈M and everyA⁻_q,q0(x,•)∈M. ForM an original monodic partial query forqand{x}, the setPrec_M of precondition conceptsconsists of:

– Q_{A

q,q0(•,o)}for each atomA_q,q⁰(x, o)inM, – Q_{A−

q,q0(•,o)}for each atomA⁻_q,q0(x, o)inM, – F

qinitial q0final

Q_{A_q,q₀_(•,•)}for each atomA(x, x)inM.

Now we can initialise all original monodic partial queriesM by adding the following axioms toKq:

lPrec_M vQ_M (8) Updating partial queries. Partial queries can be updated by roundtrips: Let M be a partial query containing someA^±_q,q0(x,•). Then we realise the update by adding the following axioms toKq:

QMuQ_{A^±

q0,q00(•,•)}vQ_M_\{A^±

q,q0(x,•)}∪{A^±_q,q₀₀(x,•)} (9) Furthermore, a partial query can “progress” when making a “step” in the model, moving from the consideredδto some role neighbour. In such a step, all the “unready” paths (corresponding to atoms with•) must be updated in a synchronous manner. Given a partial queryM forqi and nonemptyX as well as a set {R1, . . . , Rk} of (possibly inverted) role names, assumeM⁰ can be obtained fromM by replacing each atom of the formA^±_q,q0(x,•)by an atomA^±_q,q00(x,•)such thatA^± has a transition(q⁰, R,q⁰⁰) for for anyR∈ {R1, . . . , Rk}. Then add the following axiom to the KBKq:

∃(R⁻₁ ∩R⁻₂ ∩. . .∩R⁻_m).QM vQM⁰ (10)

(7)

Joining partial queries. When two partial queries corresponding to the same query q_i“meet” in a domain elementδ, they can – under certain circumstances – be “glued together” into a “bigger” partial query ofq_i.

Definition 5 (joinable, join).LetM₁be a partial query forq_i andX₁and letM₂be a partial query forq_iandX₂. We callM₁andM₂joinableif

– X16=∅,X26=∅,X1∩X2=∅, and

– for eachA(x, y)∈q_i\(M₁∪M₂)with{x, y} ⊆X₁∪X₂, there are statesq,q⁰,q⁰⁰of Awithqinitial andq⁰⁰final, such that either{A_q,q⁰(x,•),A⁻_q00,q⁰(y,•)} ∈M₁∪M₂ or{Aq,q⁰(x, o),A⁻_q00,q⁰(y, o)} ∈M1∪M2for someo∈NI.

For joinableM1andM2, theirjoin, denotedM1 ./ M2, is the partial query obtained fromM1∪M2by replacing any pair of atomsAq,q⁰(x,∗)andA⁻_q00,q⁰(y,∗)byA(x, y), where∗is either an individual nameoor•.

We implement the join operation for every pairM₁,M₂of joinable partial queries for aqi∈[q]by extendingKq with:

QM₁uQM₂ vQM₁./M₂ (11) Broadcasting of global partial queries. Whenever a partial queryM does not have any occurrences of•, which would tie it to a specific element, it will be “broadcast” to all domain elements:

∃>.QM vQM (12)

This concludes the definition ofKq. Revisiting our initial intuition ofKq’s purpose of deterministically annotating a modelI of K, the following lemma formalises this by singling out one unique annotated modelI^∗for everyI.

Lemma 1. LetKbe aZOIQKB andqa simplifiedP2RPQ. For every modelI ofK there exists a unique modelI^∗ofK ∪ Kq which extendsIand satisfiesQ^I_M^∗⊆Q^J_Mfor everyQM inKqand for every modelJ ofK ∪ Kqthat extendsI.

We will call modelsI^∗from Lemma 1Q-minimal. Note that for every modelI ofK, any concept membershipδ∈Q^I_M^∗holding in the correspondingQ-minimal modelI^∗ can be derived from concept and role memberships inI through a finite sequence of

“applications” of axioms fromK_q.⁷

5 Reduction to Satisfiability

Now we establish technical results relating the presence ofQ_M-annotations in models and the actual semantic satisfaction of the corresponding partial queryM. Note that any partial queryM can be seen as a (set representation of a)C2RPQin automaton form, assuming•is just an ordinary variable.

7To see this, it may help to realise that all axioms ofKq can be easily expressed in monadic Datalog.

(8)

Definition 6 ((tight) satisfaction of a partial query).Given an interpretation I, a domain elementδ ∈ ∆^I, and a partial queryM, we sayM is satisfiedor holdsin δ, writtenI, δ|=M, if there is a matchZ forM inI withZ(•) =δ.M issatisfied or holds tightlyinδ, writtenI, δ|=|=M, ifZ is such that everyA^±_q₁_,q₂(x,•) ∈ M is realized by a pathZ(x);δnot containingo^Ifor any individual nameo.

Note that|=and|=|=coincide wheneverM is global or does not contain variables. Note also that, as a consequence of this definition, ifM is global, thenQ_M holds (tightly) everywhere or nowhere throughout the domain. With this notion in place, the next two lemmas can be seen as soundness and completeness results regarding the deduction calculus for (partial) query matches embodied byKq.

Lemma 2 (soundness). LetIbe aQ-minimal model ofK ∪ K_qand letδ∈∆Î. Let M be any partial query for anyC2RPQq_i∈[q]. Thenδ∈QÎ_M impliesI, δ|=M. Lemma 3 (completeness).LetI be a quasi-forest model ofK ∪ Kq and letδ∈ ∆Î. LetM be any partial query for anyC2RPQq_i ∈[q]. ThenI, δ|=|=M impliesδ∈QÎ_M. Soundness is proven by induction over the length of the derivation forδ∈QÎ_M. Com- pleteness is shown by induction over thespreadof the match forM, i.e., the number of variables (except•) plus the sum of the lengths of all paths realising the query atoms.

Thanks to these correspondences, we can essentially rule out models with matches of anyq1, . . . , qnby forcing the extensions ofQq₁, . . . , Qq_nto be empty. Therefore, let K_¬q =K ∪ Kq∪ {Qq_i v ⊥ |qi∈[q]}. (13) We establish some syntactic and semantic properties ofK¬q: Some easy calculations yield size bounds forK¬q, tameness is not affected by the reduction, and unsatisfiability ofK¬qindeed coincides with query entailment.

Fact 4 (size ofK¬q) The size ofK¬q is single exponential in the total size ofKandq.

It is polynomial, if the number of atoms inqis bounded by a constant.

Lemma 5. IfKis in tameZOIQthen so isK_¬q.

Theorem 2. LetKbe a tameZOIQKB and letqbe a simplifiedP2RPQoverK. Then K |=qif and only ifK¬q is unsatisfiable.

Proof. We show the equivalent statement:K 6|=qif and only ifK_¬qsatisfiable.

“only if”: AssumeK 6|=q, that is there exists anIwithI |=KandI 6|=q. Consider I^∗, theQ-minimal model ofK ∪ Kq extendingI. If someQ^I_q_i^∗were nonempty, then there would be a query match ofqintoI by Lemma 2, contradicting our assumption.

Therefore,I^∗|=Q_q_iv ⊥for allq_i∈[q]and hence,I^∗|=K¬q.

“if”: AssumeK¬qis satisfiable and hence has a modelI. AsK¬qhas the qfmp thanks to Lemma 5, we can assumeIto be a quasi-forest model.I |=Q_q_iv ⊥implies empti- ness ofQ^I_q

i for every q_i ∈ [q]. If I |= q held true, witnessed byI |= q_i for some qi ∈ [q], Lemma 3 would requireδ ∈Q^I_q_i for allδ∈ ∆^I, leading to a contradiction.

HenceI 6|=q. On the other hand,I |=K_¬qimpliesI |=K, thereforeK 6|=q.

(9)

Based on this theorem, we can now prove our central result:

Theorem 3. Checking entailment of simplifiedP2RPQs from tameZOIQKBs with binary number encoding is2EXPTIME-complete. It isEXPTIME-complete if the number of atoms in the query is bounded.

Proof. By Theorem 2, entailment of a simplifiedC2RPQqfrom a tameZOIQKBK can be reduced to checking unsatisfiability ofK¬q, which can be computed in output- polynomial time and the size of which is exponential (polynomial, if number of atoms in qis bounded). By Theorem 1, (un)satisfiability ofK_¬q can be checked in EXPTIMEin the size ofK_¬q, hence we obtain the desired 2EXPTIMEand EXPTIMEupper bounds.

The matching lower bounds are well-known even for much weaker logics [9].

6 Querying GC

²

Databases

We now consider the problem ofP2RPQentailment fromGC² databases, i.e., sets of unary and binary ground facts in the presence of a sentence in the guarded two-variable fragment with counting quantifiers as defined by Pratt-Hartmann ([10]). We will show that there is a polytime query-entailment-preserving transformation fromGC²databases toZIQ KBs. This allows us to transfer the results from Theorem 3. Matching lower bounds follow from known results for much weaker logics and query formalisms (such asCQs overALCI[7]).

Definition 7 (GC², [10]).For a signatureSof nullary (N0), unary (NC), and binary (NR∪ {≈}) predicates,⁸letGC²be the smallest set of formulae that contains all atoms over Susing only variables from{x, y}, that is closed under Boolean combinations, and that contains

– all formulae∃u.ϕand∀u.ϕwithu∈ {x, y}wheneverϕis aGC²formula with at most one free variable, and

– all formulae∀u.(γ → ϕ)as well as Q

u.(γ∧ϕ)and Q

u.(γ)whereγ is a binary atom (called “guard”) containing bothxandy,ϕis aGC²formula, and Q

is any of

∃,∃₆n,∃=n, or∃_>n.

AGC²database DBis a theory{Φ} ∪ Aconsisting of aGC²sentenceΦand a finite set Aof unary and binary ground facts over constants fromNI.⁹ We denote byGC²_qef the set ofGC²formulae which are quantifier-free and do not use≈.

AP2RPQover aGC²database is aP2RPQwhere for every atom of the formC(t)holds C∈N_Cand for every atom of the formT(s, t), every subexpression of the formid(C) satisfiesC ∈N_C.P2RPQentailment fromGC²databases is defined in the same way as for DLs.

For convenience, we make use of a special normal form forGC²introduced in the following.

8As unary and binary predicates correspond to concept and role names, respectively, we use the same set symbols to denote them.

9Note that ABoxes, as defined before, are syntactically and semantically identical to finite sets of unary and binary ground facts, so we consider the notions the same and use the same symbol A. Note also that by definition,Φdoes not contain constants.

(10)

Lemma 6 (normal form, [11]).For anyGC²sentenceΦ, one can compute in polynomial time aGC²sentenceNF(Φ)of the form

∀xα∧V

1≤h≤`∀x∀y(Eh(x, y)→βh∨x≈y))

∧V

1≤i≤m∀x∃=niy(F_i(x, y)∧x6≈y), (14) withα∈ GC²_qef not usingy,{E1, ... , E_`, F₁, ... , F_m} ⊆N_R,{β1, ... , β_`} ⊆ GC²_qef, and{n1, ... , n_m} ⊆N, such that (i)NF(Φ)|=Φand (ii) every model ofΦwith at least max_hn_helements can be extended to a model ofNF(Φ).

Given aGC²database DB={Φ} ∪ A, we letNF(DB) ={NF(Φ)} ∪ A. Thanks to the previous lemma and the known property that models ofGC²sentences are closed under disjoint self-union, the normal form can be shown to be query-entailment-preserving.

Lemma 7. For everyP2RPQqover aGC²databaseDBholdsDB|=qif and only if NF(DB)|=q.

We next provide a query-entailment-preserving polynomial translation ofGC²databases intoZIQKBs.

Definition 8. LetΨ be aGC² sentence in normal form as in Lemma 6. We define the ZIQTBoxT_Ψ as follows:

T_Ψ =T_≈∪ T_ps∪ T_α∪ [

1≤h≤l

T_E_h_,β_h∪ [

1≤i≤m

T_F_i_,n_i (15)

Thereby (introducing fresh concept namesP rop_P and role namesR_≈andR_E_h_,at):

– T≈={> v61R≈.>, > v ∃R≈.Self}

– Tps={P ropP ≡ ∃>.P rop_P |P ∈N₀} – Tα={> vtr(α)}, where

tr(P) =P ropP

tr(P(x)) =P tr(P(x, x)) =∃P.Self

tr(¬ϕ) =¬tr(ϕ) tr(ϕ∧ψ) =tr(ϕ)utr(ψ) tr(ϕ∨ψ) =tr(ϕ)ttr(ψ) – TE_h,β_h=S

at∈atoms(βh)(axE_h(at)∪ {> v ∀(RE_h,at\Eh).⊥})

∪ {> v ∀(Eh\(R_≈∪trE_h(βh))).⊥}

whereaxE_hmaps atoms inβhto sets of GCIs as follows:

P 7→ { ∃RE_h,P.> vP ropP, ∃(Eh\RE_h,P).> v ¬P ropP} P(x) 7→ { ∃R_E_h_,P_(x).> vP, ∃(Eh\R_E_h_,P(x)).> v ¬P } P(x, x)7→ { ∃R_E_h_,P_(x,x).> v ∃P.Self, ∃(Eh\R_E_h_,P_(x,x)).> v ¬∃P.Self}

P(y) 7→ { ∃R_E⁻

h,P(y).> vP, ∃(E_h⁻\R⁻_E

h,P(y)).> v ¬P } P(y, y)7→ { ∃R_E⁻

h,P(y,y).> v ∃P.Self, ∃(E_h⁻\R⁻_E

h,P(y,y)).> v ¬∃P.Self}

P(x, y)7→ { ∃(R_E_h_,P_(x,y)\P).> v ⊥, ∃(Eh∩P\R_E_h_,P(x,y)).> v ⊥ } P(y, x)7→ { ∃(R⁻_E

h,P(y,x)\P).> v ⊥, ∃(E_h⁻∩P\R⁻_E

h,P(y,x)).> v ⊥ } andtrE_h mapsGC²_qef formulae to simple roles:

tr_E_h(at) =R_E_h_,at trE_h(¬ρ) =Eh\tr(ρ)

tr_E_h(ρ∧ρ⁰) =tr_E_h(ρ)∩tr_E_h(ρ⁰) trE_h(ρ∨ρ⁰) =trE_h(ρ)∪trE_h(ρ⁰)

(11)

– T_F_i_,n_i ={ > v 6n_i(F_i\R_≈).> u >n_i(F_i\R_≈).> }

Given aGC²databaseDB={Φ} ∪ A, we letKDBdenote theZIQKB(TNF(Φ),A).

Intuitively,T_≈axiomatises equality,Tpsintroduces synchronised concept namesP ropP

for every propositional symbolPfrom DB, sinceZIQdoes not allow for nullary predicates.Tα,TE_h,β_h, andTF_i,n_iimplement the conjuncts fromNF(Φ).

We can now observe thatKDBcan be computed in polytime wrt. the size of DB and that it has the qfhcp (since it is aZIQKB). Moreover we can show that any model of KDBcan be extended to one ofNF(DB)and, conversely, any model ofNF(DB)can be extended to one ofK_DB. With the help of these correspondences and Lemma 7, we can then argue that for anyP2RPQqover DB holds DB|=qiffK_DB|=q. These findings allow us to apply Theorem 3 to establish the following result.

Theorem 4. CheckingP2RPQentailment fromGC²databases is2EXPTIME-complete.

It isEXPTIME-complete if the number of query atoms is bounded by a constant.

7 Conclusion

We have established tight complexity bounds for expressive querying in very expressive DLs under the assumption of succinct (i.e. binary) encoding of number restrictions. Ar- guing along the lines of Calvaneseet al.[6], we can leverage our findings to strengthen their results on query containment as well as theSRfamily as follows:¹⁰

Theorem 5 (query containment).Testing query entailmentK |=q⁰ ⊆qis in2EXP- TIME with respect to the total size ofq⁰,q, andK (with binary encoding of number restrictions) if (i)Kis inZOQorZOI andq⁰andqareP2RPQs overK, or (ii)Kis inZIQ,q⁰is a conjunctive query andqis aP2RPQoverK. The complexity drops to EXPTIMEif the number of atoms occurring inqis bounded by a constant.

Theorem 6 (SRfamily).DecidingK |= qandK |=q⁰ ⊆qis in3EXPTIMEin the total size ofq⁰,q,K(with binary encoding of numbers) – and in2EXPTIMEifK’s RBox is given by defining regular expressions for the non-simple roles – if (i)Kis inSROQ or SROI and q⁰,qare P2RPQs overK, or (ii)Kis inSRIQ,q⁰ is a conjunctive query andqis aP2RPQoverK. The complexities are in2EXPTIMEand inEXPTIME, respectively, if the number of atoms occurring inqis bounded by a constant.

There are plenty of open questions left for future work. First, the results established here hold for arbitrary models, however, so far, very little is known about finite (model) query answering in DLs from theZ family and the methods applied here do not seem to easily carry over to the finite model setting. Second, while the extension of our query formalism by nestings in the sense of Bienvenuet al.[1] seems straightforward without impacting complexities, our technique seems not readily extendable to capture more elaborate query languages [14,4,12].

10For space reasons, we assume the reader to be familiar with the notions used in these theorems.

(12)

References

1. Meghyn Bienvenu, Diego Calvanese, Magdalena Ortiz, and Mantas Simkus. Nested regular path queries in description logics. In Chitta Baral, Giuseppe De Giacomo, and Thomas Eiter, editors, Proc. 14th Int. Conf. on Principles of Knowledge Representation and Reasoning (KR’14), pages 218–227. AAAI Press, 2014.

2. Piero A. Bonatti, Carsten Lutz, Aniello Murano, and Moshe Y. Vardi. The complexity of enriched mu-calculi.Logical Methods in Computer Science, 4(3:11):1–27, 2008.

3. Pierre Bourhis, Markus Krötzsch, and Sebastian Rudolph. How to best nest regular path queries. InProc. 27th Int. Workshop on Description Logics (DL’14), volume 1193 ofCEUR Workshop Proceedings, pages 404–415. CEUR-WS.org, 2014.

4. Pierre Bourhis, Markus Krötzsch, and Sebastian Rudolph. Reasonable highly expressive query languages. In Qiang Yang and Michael Wooldridge, editors, Proc. 24th Int. Joint Conf. on Artificial Intelligence (IJCAI’15), pages 2826–2832. AAAI Press, 2015.

5. Diego Calvanese, Thomas Eiter, and Magdalena Ortiz. Answering regular path queries in expressive description logics: An automata-theoretic approach. InProc. 22nd AAAI Conf. on Artificial Intelligence (AAAI’07), pages 391–396. AAAI Press, 2007.

6. Diego Calvanese, Thomas Eiter, and Magdalena Ortiz. Regular path queries in expressive description logics with nominals. In Craig Boutilier, editor,Proc. 21st Int. Joint Conf. on Artificial Intelligence (IJCAI’09), pages 714–720. IJCAI, 2009.

7. Carsten Lutz. The complexity of conjunctive query answering in expressive description logics. In Alessandro Armando, Peter Baumgartner, and Gilles Dowek, editors,Proc. 4th Int. Joint Conf. on Automated Reasoning (IJCAR’08), number 5195 in LNAI, pages 179–

193. Springer, 2008.

8. Magdalena Ortiz, Sebastian Rudolph, and Mantas Simkus. Query answering is undecidable in DLs with regular expressions, inverses, nominals, and counting. INFSYS Research Report 1843-10-03, TU Vienna, 2010.

9. Magdalena Ortiz and Mantas Simkus. Reasoning and query answering in description logics.

In Thomas Eiter and Thomas Krennwallner, editors,Proc. 8th Int. Reasoning Web Summer School (RW’12), volume 7487 ofLecture Notes in Computer Science, pages 1–53. Springer, 2012.

10. Ian Pratt-Hartmann. Complexity of the guarded two-variable fragment with counting quantifiers.J. Log. Comput., 17(1):133–155, 2007.

11. Ian Pratt-Hartmann. Data-complexity of the two-variable fragment with counting quantifiers.

Inf. Comput., 207(8):867–888, 2009.

12. Juan L. Reutter, Miguel Romero, and Moshe Y. Vardi. Regular queries on graph databases.

Theor. Comp. Sys., 61(1):31–83, July 2017.

13. Sebastian Rudolph. Undecidability Results for Database-Inspired Reasoning Problems in Very Expressive Description Logics. In Chitta Baral, James P. Delgrande, and Frank Wolter, editors,Proc. 15th Int. Conf. on Knowledge Representation and Reasoning (KR’16), pages 247–257. AAAI Press, 2016.

14. Sebastian Rudolph and Markus Krötzsch. Flag & check: Data access with monadically defined queries. In Richard Hull and Wenfei Fan, editors,Proc. 32nd Symposium on Principles of Database Systems (PODS’13), pages 151–162. ACM, 2013.