• Aucun résultat trouvé

Answering Counting Queries over DL-Lite Ontologies

N/A
N/A
Protected

Academic year: 2021

Partager "Answering Counting Queries over DL-Lite Ontologies"

Copied!
30
0
0

Texte intégral

(1)

HAL Id: hal-02927913

https://hal.inria.fr/hal-02927913

Submitted on 2 Sep 2020

HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

Answering Counting Queries over DL-Lite Ontologies

Meghyn Bienvenu, Quentin Manière, Michaël Thomazo

To cite this version:

Meghyn Bienvenu, Quentin Manière, Michaël Thomazo. Answering Counting Queries over DL-Lite

Ontologies. IJCAI 2020 - Twenty-Ninth International Joint Conference on Artificial Intelligence, Jul

2020, Yokohama, Japan. �hal-02927913�

(2)

Answering Counting Queries over DL-Lite Ontologies

Meghyn Bienvenu

1

, Quentin Mani`ere

1

and Micha¨el Thomazo

2

1

University of Bordeaux, CNRS, Bordeaux INP, LaBRI, Talence, France

2

Inria, DI ENS, ENS, CNRS, University PSL, Paris, France

{meghyn.bienvenu, quentin.maniere}@u-bordeaux.fr, michael.thomazo@inria.fr

Abstract

Ontology-mediated query answering (OMQA) is a promising approach to data access and integra- tion that has been actively studied in the knowl- edge representation and database communities for more than a decade. The vast majority of work on OMQA focuses on conjunctive queries, whereas more expressive queries that feature counting or other forms of aggregation remain largely unex- plored. In this paper, we introduce a general form of counting query, relate it to previous proposals, and study the complexity of answering such queries in the presence of DL-Lite ontologies. As it fol- lows from existing work that query answering is in- tractable and often of high complexity, we consider some practically relevant restrictions, for which we establish improved complexity bounds.

1 Introduction

Ontology-mediated query answering (OMQA) utilizes on- tologies to provide a convenient vocabulary for query for- mulation and to capture domain knowledge that is exploited during the querying process to obtain more complete sets of answers [Poggi et al., 2008; Bienvenu and Ortiz, 2015;

Xiaoet al., 2018]. Much of the work on OMQA considers ontologies formulated using description logics (DLs), a fam- ily of knowledge representation languages that provide the logical foundations of the OWL web ontology language. Par- ticular attention has been to the DL-Lite family of DLs [Cal- vaneseet al., 2007], which were developed with OMQA in mind and enjoy favorable computational properties.

The vast majority of work on OMQA supposes that user queries are given as conjunctive queries (CQs). However, there are many other kinds of database queries, beyond plain CQs, that are relevant in practice. This motivates research into the feasibility of adopting other database query lan- guages for OMQA. While enriching CQs with either negated atoms or inequalities has been shown to lead to undecidabil- ity even in very restricted settings [Guti´errez-Basultoet al., 2015], the situation is more positive for navigational queries (like regular path queries), which can be adopted without los- ing decidability, sometimes even retaining tractable data com- plexity [Bienvenuet al., 2015b].

Aggregate queries, which use numeric operators (e.g.

count, sum, max) to summarize selected parts of a dataset, constitute another prominent class of database queries. Al- though such queries are widely used for data analysis, they have been little explored in context of OMQA. This may be partly due to the fact that it is not at all obvious how to define the semantics of such queries in the OMQA setting.

A first exploration of aggregate queries in OMQA was con- ducted by Calvaneseet al.(2008). They argued that the most straightforward adaptation of classical certain answer seman- tics to aggregate queries was unsatisfactory, as often values would differ from model to model, leading to no certain an- swers. For this reason, an epistemic semantics was proposed, in which variables involved in the aggregation are required to match to data constants. However, as discussed in [Kostylev and Reutter, 2015], this semantics can also give unintuitive results by ignoring ways of mapping aggregate variables to anonymous elements inferred due the ontology axioms. For instance, if no children ofalexare listed in the data, then a query that asks to return the number of children will yield 0 under epistemic semantics, even if it can be inferred (e.g.

due to a family tax benefit) that there must be at least 3 chil- dren. This led Kostylev and Reutter to define an alternative semantics for two kinds of counting queries (inspired by the COUNT and COUNT DISTINCT in SQL) which adopts a form of certain answer semantics but considers lower and up- per bounds on the count value across different models. For the two considered logics (DL-Litecore and DL-LiteR), only the lower bounds on the count value are non-trivial, and a complexity analysis shows that they are challenging to iden- tify: coNP-data complexity for both logics, and Πp2-hard (resp.coNEXP-hard) in combined complexity for DL-Litecore (resp. DL-LiteR). Several question were left unanswered by their work, including the exact combined complexity, the dif- ficulty of recognizing the optimal lower bound, and the im- pact of allowing multiple aggregation variables.

This paper returns to the issue of handling counting queries in OMQA and makes several important contributions:

1. We propose a new notion of counting CQ that general- izes the two forms of queries from [Kostylev and Reutter, 2015] and allows arbitrarily many counting variables.

2. We show that existing complexity results for DL-Litecore

and DL-LiteRKBs continue to hold for our more general notion of counting CQ, and provide an improvedcoNEXP

(3)

upper bound for the relevant case of finite-depth TBoxes.

3. We consider the impact of restricting the query structure, focusing on the class of rooted queries, in which every query variable must be connected to an answer variable or individual in the query graph. A recent result, obtained as part of a study of bag semantics for OMQA, identified a case in which rootedness leads to tractable data com- plexity for counting queries [Nikolaouet al., 2019]. This motivates us to perform a more thorough investigation of rooted counting queries, which yields several improve- ments upon existing complexity bounds.

4. We prove that the problem of identifying the best certain interval isDP-complete in data complexity.

Our results close some questions that were left open by the work of Kostylev and Reutter and pave the way for further study of counting and aggregate queries in the OMQA setting.

An appendix with full proofs can be found in the long ver- sion of this paper, available on arXiv.

2 Preliminaries

We recall the basics of description logics (DLs), focusing on DL-Lite, see [Baaderet al., 2017] for more details.

Syntax and Semantics. A description logic vocabulary consists of a setNCofatomic concepts(unary predicates), a setNRofatomic roles(binary predicates), and a setNIofindi- vidual names(constants). Byrole, we mean either an atomic roleP ∈NRor aninverse roleP(whereP ∈NR). We let N±R denote the setNR∪ {P|P ∈NR}of roles and use the notationRto meanPifR=P ∈NRandPifR=P.

A DLknowledge base (KB) is a pairK = (T,A), con- sisting of an ABoxAthat contains facts about particular in- dividuals and a TBox T that expresses general knowledge about the domain. Formally, anABoxis a finite set ofcon- cept assertions A(b), with A ∈ NC andb ∈ NI, androle assertionsP(a, b), withP ∈ NR and a, b ∈ NI. We use Ind(A) to denote the set of individuals in A. A TBoxis a finite set of axioms, whose syntax depends on the particular DL. In DL-Litecore, axioms take the form ofconcept inclu- sionsB1 v(¬)B2, where eachBiis eitherA(forA∈NC) or∃R(withR∈N±R). DL-LiteRTBoxes additionally allow role inclusionsR1v(¬)R2, whereR1, R2∈N±R.

Example 1. Our example KB talks about leading (LeadIn) and supporting actors (SuppIn) in movies:

Aact={ActsIn(doona,cloud),SuppIn(berry,cloud), SuppIn(hanks,cloud),SuppIn(hanks,catch)}

Tact={LeadInvActsIn,SuppInvActsIn,

∃SuppIn v ∃LeadIn}

An interpretation takes the formI = (∆II), where∆I is a non-empty set (thedomainofI), and.Iis a function that maps eachA ∈ NC to a subsetAI ⊆ ∆I, eachP ∈ NR to a binary relationPI ⊆ ∆I ×∆I, and eacha ∈ NI to an elementaI of ∆I. We make theunique names assumption (UNA) by requiring thataI 6= bI for every a, b ∈ NI with a6=b. The function·Inaturally extends to complex concepts and roles: (∃R)I = {d| ∃d0 : (d, d0) ∈ RI}, (P)I =

{(d1, d2)|(d2, d1) ∈ PI},(¬B)I = ∆I\BI,(¬R)I =

I×∆I

\RI. A (concept or role) inclusionF vGis satisfied inIifFI ⊆GI; assertionA(b)is satisfied inI if bI∈AI;P(a, b)is satisfied inIif(aI, bI)∈PI. We callI amodelofK, writtenI |=K, if it satisfies all inclusions and assertions inK. A KB issatisfiableif has at least one model.

Queries. We recall that aconjunctive query(CQ) takes the form∃yψ(x,y), wherexandyare tuples of variables drawn from an infinite set of variablesV, andψis a conjunction of atoms, which can be eitherconcept atomsA(t1)orrole atoms P(t1, t2), whereA ∈ NC,P ∈ NR, andtermsti are drawn fromNI∪x∪y. Consider an interpretationIand CQq=

∃yψ(x,y)with|x| =n. A tupleα ∈ ∆In

is ananswer toqinI, writtenI |=q(α), if there is ahomomorphism ofq intoI, i.e., a functionσthat maps the terms ofqto elements of∆Isuch that (i)σ(a) =aI fora∈NI, (ii)σ(t)∈AI for every atomA(t)ofq, and (iii)(σ(t1), σ(t2))∈PIfor every atomP(t1, t2)ofq. A tuplea∈Ind(A)nis acertain answer toqw.r.t. the KBKiffI |=q(aI)for every modelIofK.

Canonical Model. We recall the definition of the canonical modelCKof a DL-LiteR KBK = (T,A). The domain of CKconsists ofInd(A)and all words of the formaR1. . . Rn, witha∈Ind(A),Ri∈N±R, andn>1, such that:

• K |=∃R1(a)and there is noR1(a, b)∈ A;

• for1≤i < n,T |=∃Ri v ∃Ri+1andRi 6=Ri+1. We interpret individuals as themselves (aCK =a) and atomic concepts and roles as follows:

ACK={a∈Ind(A)| K |=A(a)}

∪ {aR1. . . Rn ∈∆CK\Ind(A)| T |=∃Rn vA}

PCK={(a, b)|P(a, b)∈ A} ∪

{(e1, e2)|e2=e1RandT |=RvP} ∪ {(e2, e1)| e2=e1RandT |=RvP}

The term ‘canonical model’ is motivated by the following well-known property ofCK(see e.g. [Calvaneseet al., 2007]):

Lemma 1. LetKbe a satisfiable DL-LiteRKB. ThenCK |= Kand ifI |=K, there is a homomorphism ofCKintoI. A useful corollary is that thecertain answers to a CQqw.r.t.

Kare the tuples fromInd(A)that areanswers toqinCK. Note that CK may be infinite. Thedepthof a TBoxT is defined as the maximal length of any word that appears in the domain ofCKfor any KBKwhose TBox isT. If this number is finite, we say thatT is afinite-depth TBox; such TBoxes can be identified in polynomial time [Bienvenuet al., 2015a].

3 Counting Queries

We now introduce our formalization of counting queries. In addition to the setV of (classical) variables, we assume a second infinite set of counting variablesVc, disjoint fromV.

Definition 1. A counting conjunctive query (CCQ)qtakes the formq(x) =∃y∃zψ(x,y,z), wherex∪y⊆V,z ⊆Vc, andψis a conjunction of concept and role atoms whose terms are drawn fromNI∪x∪y∪z. We callx(resp.y, resp.z) theanswer(resp.existential, resp.counting) variables ofq.

(4)

We first define the semantics of counting queries on a sin- gle interpretation I, by considering those pairs(a, n)such thatnis the number of possible ways to mapzintoIwhen xis mapped toa. Such pairs are called theanswerstoqinI.

Definition 2. Amatchof a CCQq(x) =∃y∃zψ(x,y,z)in Iis a homomorphism1fromqintoI. If a matchσmapsxto a, then the restriction ofσtozis called acounting match (c- match)ofq(a)inI. The set of answers toqinI, denotedqI, contains all pairs(a, qaI), whereqaIis the number of distinct c-matches ofq(a)inI.

As has been previously noted (see e.g. [Kostylev and Reut- ter, 2015]), the exact count values of the answers in qI are usually too specific to hold across models. Considering boundson the exact value provides more insight, while still allowing unnamed elements to be counted. This motivates the following notion of answer interval.

Definition 3. The set[q]Iofanswer intervals for a CCQqin I contains all pairs(a,[m, M])witha ∈ Ind|x|andm, M integers such that m ≤ qIa ≤ M. The set[q]K of certain (counting) answers toq w.r.t. KBK is obtained by consid- ering those answer intervals that hold in all models of K:

[q]K=T

I|=K[q]I.

Note that (a,[m, M]) ∈ [q]K does not imply that for any n∈[m, M]there exists a modelIin which(a, n)∈qI.

Definition 1 is a proper generalization of the two forms of counting query considered by Kostylev and Reutter.

Reusing their notations, a Cntd()-query q(x, Cntd(z)) =

∃y ψ(x,y, z)corresponds to the CCQq(x) = ψ(x,y, z), while a Count()-query q(x, Count()) = ∃yψ(x,y) cor- responds to the CCQq(x) = ψ(x,∅,y)ˆ (withyˆ a tuple of variables fromVcin bijection withy). We will use the term exhaustiveto refer to the latter CCQs, i.e. those in which ev- ery non-answer variable is a counting variable.

Example 2. Reconsider the KBKact = (Tact,Aact). We can use CCQs to count the pairs of actors (leading role, support- ing role) having acted together (q1), return movies together with a count of their supporting actors (q2), and count the number of actors having acted with Tom Hanks (q3):

q1=∃y∃z1∃z2LeadIn(z1, y)∧SuppIn(z2, y) q2(x) =∃zSuppIn(z, x)

q3=∃y∃zActsIn(hanks, y)∧ActsIn(z, y) According to our semantics, we have:

• (∅,[2,+∞]) ∈[q1]Kact, sincez2can be mapped to either berryor hanks, andz1 mapped to the lead actor (which must exist due toTact). As the lead actors of the two films could be the same,(∅,[3,+∞])6∈[q1]Kact.

• (cloud,[2,+∞])∈[q2]Kact, mappingztoberryandhanks.

• (∅,5) ∈ qC3Kact, since inCKact, we can mapz to a named actor or the two elements standing in for the lead actors.

• (∅,[5,+∞])∈/ [q3]Kact, since the lead actors could possi- bly be the same or one of the named actors.

The latter two points show that the canonical model does not yield the minimal number of matches.

1The notion of homomorphism of a CCQ is defined in the same way as for CQs, simply treating variables fromVclike those inV.

Data Combined

DL-Litecore coNP-c Πp2-h,PP-h& incoNEXP DL-LiteR coNP-c coNEXP-h& incoN2EXP

coNEXP-c (T of finite depth) Table 1: Data and combined complexity of CCQ answering

4 General Counting CQs

We shall consider the following CCQ answering decision problem: given a KBK = (T,A), CCQ q, and candidate answer(a,[m, M]), decide whether(a,[m, M])∈[q]K.

As ontology languages, we will consider DL-LiteR(which underlies OWL 2 QL) and its sublogic DL-Litecore. We know from [Kostylev and Reutter, 2015] that in these DLs, the least upper boundMcan take one of three values (0, 1, or +∞) and is easily computed. The argument2transfers to our more gen- eral notion of CCQ. We can thereforerestrict our attention to identifying certain answers of the form(a,[m,+∞]).

We will consider the two usual complexity measures:com- bined complexitywhich is in terms of the size of the whole input (T,A,q,a,m), anddata complexitywhich is only in terms of the size ofAandm(T andqare treated as fixed).

We will assume thatmis given in binary.

4.1 General Case

Table 1 displays complexity results for answering general CCQs over DL-Litecore and DL-LiteR TBoxes (we use ‘-c’

and ‘-h’ as abbreviations for ‘-complete’ and ‘-hard’).

With the exception of thePP-hardness result (discussed in Section 6.1), the lower bounds are inherited from [Kostylev and Reutter, 2015]. We will thus concentrate on the upper bounds from Table 1, which are obtained by generalizing and clarifying the constructions of Kostylev and Reutter. We give an overview of the proof both to give the flavor of the tech- niques involved and to enable us to discuss the necessary adaptations used to prove later results.

The proof constructs a decision procedure for the comple- mentary problem of deciding whether(a,[m,+∞]) 6∈[q]K. The latter holds iff there exists acountermodel, i.e., a model ofKwith fewer thanmc-matches ofq(a). The main ingre- dient of the proof is the following theorem, which shows that it is sufficient to consider countermodels of bounded size.

Theorem 1. For every DL-LiteR(resp. DL-Litecore) KBK= (T,A) and CCQ q, if there is a model of K with fewer than m c-matches of q(a), then there exists one of size3 O(|A||T ||q|+1)(resp.O(|A||q|)).

With Theorem 1 in hand, we can easily define non- deterministic procedures that witness the complexity upper bounds from Table 1: simply guess an interpretation of poly- nomial / exponential / double-exponential size (depending on the case) and verify whether it is a countermodel.

The proof of Theorem 1 starts with an arbitrary counter- modelI and modifies it in order to make it smaller, being

2Briefly, the upper bound is 0 if the tuple is not a certain answer;

otherwise, it is either 1 ifz=∅, else+∞.

3As usual,|T |(resp.|A|,|q|) denotes the size ofT (resp.A, q).

(5)

careful not to introduce any new c-matches ofq(a). We first identify a relevant subset∆of the domain ofI, consisting of the interpretations of all individual names fromAand the images of all c-matches ofq(a). We then define a new in- terpretation that intuitively preserves∆and replaces the rest ofI with parts of the canonical model, to introduce a more regular structure. Formally, we fix a homomorphismfofCK into I (see Lemma 1) and consider the following mapping f0: ∆CK→∆∪∆CKfrom [Kostylev and Reutter, 2015]:

f0(d) =

f(d) iff(d)∈∆ d otherwise

We define theinterleaving4I0ofIas the image ofCKbyf0, i.e., with domainf0(∆CK)and interpretation functionf0◦·CK. It is not difficult to prove that the interleavingI0is a model ofK. Moreover, by exhibiting a homomorphismρfromI0to I, we can translate matches ofI0 into matches inI. As the images of c-matches ofq(a)are contained in ∆, which is left unchanged inI0, the homomorphismρis in fact a one-to- one mapping of c-matches ofq(a)inI0to those inI. This shows thatI0is also a countermodel.

The interleavingI0 may be arbitrarily large, even infinite.

To reduce its size, an equivalence relation is introduced, and elements from∆I0\∆that belong to the same equivalence class are merged (elements from∆are retained). In the case of DL-LiteR, there can be double-exponentially many equiv- alence classes, as elements are grouped based upon the prop- erties of their|q|-neighborhoods, while for DL-Litecore, we can use a more refined relation with only exponentially many classes. This means that the resulting models are either of single- or double-exponential size w.r.t. combined complex- ity, depending on the chosen DL.

A crucial final step is to show that the merging of elements does not introduce any new c-matches ofq(a), so the result- ing model is still a countermodel. This part of the argument, only sketched in [Kostylev and Reutter, 2015], requires a de- tailed and technical analysis of the construction to ensure that this property holds for our more general class of CCQs. We show that this is indeed the case, which answers a question left open by Kostylev and Reutter about counting CQs with both existential variables and multiple counting variables.

4.2 Case of Finite-Depth TBoxes

We give an improved upper bound for finite-depth TBoxes (which arguably cover many practical ontologies [Grauet al., 2013]), pinpointing the exact combined complexity.

Theorem 2. For finite-depth DL-LiteR TBoxes, CCQ an- swering iscoNEXP-complete w.r.t. combined complexity.

Proof sketch. Fix a KBK = (T,A). If T has finite depth, thenCKcontains at most|Ind(A)| × |T ||T |elements, which implies that, for every modelI of K, the interleaving ofI is finite and of single exponential size in|K|. Since the inter- leaving of a countermodel is itself a countermodel, this shows that the smallest countermodel is of single-exponential size, from which derives the improvedcoNEXPupper bound.

4We have slightly modified the definition of interleaving to cor- rect a small bug in the definition from [Kostylev and Reutter, 2015].

We note that the proofs of thecoNPandΠp2lower bounds listed in Table 1 already use finite-depth TBoxes.

5 Rooted Counting CQs

We next explore whether structural restrictions on CCQs al- low us to obtain lower complexity. As the lower bounds from [Kostylev and Reutter, 2015] use disconnected counting vari- ables, a natural idea is to consider the subclass of rooted queries that were introduced in [Bienvenuet al., 2012] and are believed to capture a large portion of real-world CQs.

Rooted CCQs can be defined analogously to rooted CQs.

The definition utilizes the notion of a Gaifman graph of a CCQ, whose vertices are the query terms, and which has an undirected edge{t1, t2}ifft1, t2co-occur in a role atom.

Definition 4. A CCQq(x) := ∃y∃zψ(x,y,z)isrootedif every connected component of the Gaifman graph ofqcon- tains at least one answer variable or individual name.

Example queriesq2andq3are rooted, whileq1is not.

Rootedness has been shown to lower the complexity of rea- soning in several settings. Most relevant to us is a recent re- sult by Nikolaouet al.(2019) that rooted CQ answering under bag semantics5 has tractable data complexity in DL-Litecore, and furthermore, the same holds for rooted versions of the Count()-queries of Kostylev and Reutter under suitable re- strictions on the TBox. These techniques can be adapted to show tractability for arbitrary DL-LitecoreTBoxes:

Theorem 3. (Implicit in [Nikolaouet al., 2019; Cimaet al., 2019]) In DL-Litecore, exhaustive rooted CCQ answering is TC0-complete6w.r.t. data complexity.

Proof sketch. Nikolaou et al. prove that answering rooted CQs under bag semantics can be done via a rewriting to BCALC, whose evaluation problem is known to be in TC0 due to [Libkin, 2001], see [Cimaet al., 2019] for discussion.

Moreover, they further show that for a syntactically restricted class of DL-LitecoreTBoxes, it is possible to reduce exhaus- tive rooted CCQ answering to rooted CQ answering under bag semantics. To obtainTC0 membership for unrestricted TBoxes, the BCALC rewriting can be adapted to set-based rather than bag interpretations. In the long version, we pro- vide an alternative self-contained proof which directly con- structs a family of TC0 circuits. A matching lower bound has not been stated, but can be shown by a simple reduction (using an empty TBox) from theTC0-complete problem that asks, given a binary stringsand numberk, whether the num- ber of1-bits insexceedsk[Aehliget al., 2007].

The preceding result naturally leads us to ask whether root- edness also bring benefits for general CCQs. Unfortunately, we show that restricting to rooted CCQs (without exhaustive- ness) does not allow us to escape existing hardness results:

5Bag semantics, which underly practical database systems, inter- prets relations using multisets rather than sets [Albert, 1991].

6We recall thatTC0 is a circuit complexity class defined simi- larly toAC0but additionally allowing threshhold gates. It is known thatAC0(TC0⊆NC1⊆LogSpace⊆PTime.

(6)

Theorem 4. In DL-Litecore, rooted CCQ answering iscoNP- complete w.r.t. data complexity.

Proof sketch. The proof borrows some ideas from the proofs of Lemmas 12 and 16 from [Kostylev and Reutter, 2015]. It proceeds by reduction from the well-knowncoNP-complete 3COLproblem: given an undirected graphG = (V,E), re- turn yes iff G hasno3-coloring, i.e., a mapping fromV to {red,green,blue}such that adjacent vertices map to different colors (equivalently: there is no monochromatic edge).

The reduction uses atomic rolesEdgeandVertexto en- code the graph andHasColto assign colors. The TBoxTcol has a single axiom: ∃Vertex v ∃HasCol. The ABoxAG

contains an individual vfor each vertex v ∈ V and an as- sertionEdge(u,v) for each edge{u, v} ∈ E. All vertices are connected to a special root individual a: Vertex(a,u), for each u ∈ V. The three colors are represented by in- dividuals r,g andb. To ensure that the query has matches in every model, we include a ‘dummy’ vertex individualav and the following assertions: Vertex(a,av), Edge(av,av), HasCol(av,r),HasCol(av,g), andHasCol(av,b).

The queryqis the conjunction of the two subqueries:

qedge=∃yc∃z1∃z2Vertex(a, z1)∧Vertex(a, z2)∧

Edge(z1, z2)∧HasCol(z1, yc)∧HasCol(z2, yc) qcol=∃y∃zVertex(a, y)∧HasCol(y, z)

serving respectively to detect monochromatic edges and to check whether any additional colors have been introduced.

By construction, there are at least 3 c-matches forq(∅)in any model of the KBKcol = (Tcol,AG). Moreover, it can be verified that(∅,[4,+∞])is a certain answer toqw.r.t.Kcol iffGis not 3-colorable.

Theorem 5. In DL-LiteR, rooted CCQ answering is coNEXP-hard w.r.t. combined complexity.

Proof sketch. The proof adapts a reduction from the expo- nential grid tiling problem (Lemma 18 from [Kostylev and Reutter, 2015]), the key difference being the use of existential query variables to access (and count) the colors and bits.

6 Exhaustive Rooted Counting CQs

We have seen in Section 5 that the rootedness restriction is not by itself sufficient to lower the complexity of CCQ an- swering, whereas imposing both rootedness and exhaustive- ness can sometimes yield better results. This motivates us to take a closer look at the case of exhaustive rooted CCQs. The emerging complexity landscape is summarized in Table 2.

Note that exhaustive CCQs constitute a very natural form of counting query, which ask for the number of different query matches for a given answer tuple. The queryq2from Example 2 is an exhaustive rooted CCQ.

6.1 Exhaustive Rooted CCQs in DL-Lite

core

We first consider DL-Litecore KBs and pinpoint the precise combined complexity, which had not yet been considered.

An essential ingredient is the following result that shows that it is possible to focus on query matches in the canonical

Data Combined

DL-Litecore TC0-c PP-c

DL-LiteR coNP-c Πp2-h,PP-h& incoNEXP Table 2: Complexity results for exhaustive rooted CCQs

model. It can be obtained by adapting a similar result about canonical bag interpretations [Nikolaouet al., 2019].

Theorem 6. For every DL-Litecore KB K and exhaustive rooted CCQq, it holds that[q]K= [q]CK.

Proof sketch. Exploiting the structure of DL-Litecorecanoni- cal models, one can show that ifσ1, σ2are distinct matches of an exhaustive rooted CCQqinCK, then there exists a vari- ablevsuch thatσ1(v)6=σ2(v)andσ1(v), σ2(v)∈Ind(A).

It follows that if we take an arbitrary modelI ofK, and let f be a homomorphism ofCKintoI, thenf injectively maps query matches inCKto query matches inI.

We will also use the next lemma, implicit in [Bienvenuet al., 2013], constraining the possible images of matches inCK: Lemma 2. For every DL-LitecoreTBoxT and CCQq, we can construct in polynomial time a set of wordsΓq,T such that for every KBK = (T,A), matchσofqinCK, and variablev ofq:σ(v) =awfor somea∈Ind(A)andw∈Γq,T.

We are now ready to show that the problem isPP-complete in combined complexity, and hence inPSpace.

Theorem 7. In DL-Litecore, exhaustive rooted CCQ answer- ing isPP-complete w.r.t. combined complexity.

Proof sketch. The classPPcontains all decision problems for which there exists a non-deterministic Turing machine (TM) such that, when the input is a ‘yes’ instance, then at least half of the computation paths accept, while on ‘no’ instances, less than half of the computation paths accept.

The lower bound is obtained by a reduction from the fol- lowing PP-complete problem [Bailey et al., 2007]: given a propositional formula ψ in CNF and number n, decide whetherψhas at leastnsatisfying assignments.

We sketch the TM used to show PPmembership, which takes as input a DL-LitecoreKBK = (T,A), an exhaustive rooted CCQq(x), and candidate answer(a,[m,+∞]):

Phase 1. The TM constructs the setΓq,T from Lemma 2.

Phase 2. The TM guesses a mappingσof the variables inq to elements from{aw|a∈Ind(A), w∈Γq,T}. It then com- paresmwith the numberC=|Γq,T||q|of possible mappings and proceeds accordingly:

• ifm≥ C2 + 1, the TM guesses an integeriwith0≤i≤ 2m−3and accepts iffσis a c-match ofq(a)andi < C;

• if m < C2 + 1, the TM guesses an integer i with0 ≤ i≤2C−2m+ 1and accepts iffσis c-match forq(a)or i < C−2m+ 2.

The guessed integer and comparisons ensure a suitable num- ber of accepting paths. It can be verified that at least half of the paths are accepting iff(a,[m,+∞])∈[q]CK.

(7)

6.2 Exhaustive Rooted CCQs in DL-Lite

R

We now turn to DL-LiteR KBs. Our first result is negative:

exhaustive rooted CCQs do not enjoy lower data complex- ity. This is shown by another reduction from3COL which involves ideas from our proof of Theorem 4 and the proof of Lemma 16 from [Kostylev and Reutter, 2015].

Theorem 8. In DL-LiteR, exhaustive rooted CCQ answering iscoNP-complete w.r.t. data complexity.

More positively, we can show an improvedcoNEXPupper bound in combined complexity for exhaustive rooted CCQs.

We briefly sketch the proof, which involves highly non-trivial modifications to the argument used for general CCQs.

We first introduce a more refined notion of interleaving, which replaces the mappingf0by the following mappingf:

f(a) = f(a) f(ωR) =

f(ωR) iff(ω), f(ωR)∈∆ f(ω)R otherwise

It is possible to prove that when q is an exhaustive rooted CCQ, this modified interleaving yields a countermodel.

Moreover, it has a very particular structure, essentially cor- responding to the canonical model of the restriction off(CK) to∆(viewed as an ABox). Importantly, this means that in- stead of guessing a whole countermodel, it suffices to guess an initial, exponential-size portion (the|q|-neighborhood of

), providing the basis for acoNEXPdecision procedure.

Theorem 9. In DL-LiteR, exhaustive rooted CCQ answering is incoNEXPw.r.t. combined complexity.

7 Best Certain Answers

The definition of certain answers implies that if (a,[m, M])∈[q]K, then we also have(a,[m0, M0])∈[q]K for everym0 ≤ mandM0 ≥ M. It is naturally of interest to focus on certain answers providing the best bounds, i.e., those of the form(a,[minI|=KqaI,maxI|=KqIa]).

In this section, we show that the problem of identifying the best lower bound (minI|=KqaI) is DP-complete in data complexity. It is easily seen that checking whethermis such an optimal bound can be done inDP, by making a call to a coNPoracle (is(a,[m,+∞])∈[q]K?) and anNPoracle (is (a,[m+1,+∞])∈/[q]K?). TheDP-hardness of this problem was left as an open question by Kostylev and Reutter.

Theorem 10. The following problem isDP-hard in data com- plexity: given a DL-LitecoreKBK= (T,A), rooted CCQq, tuplea, and numberm, decide whetherm= minI|=KqaI. Proof sketch. We give a reduction from the following prob- lem (DP-complete due to [Gareyet al., 1976]): givenplanar graphsG1andG2, decide ifG1∈3COLandG2∈/ 3COL.

Let the TBoxTcoland ABoxesAG1,AG2 be defined as in the proof of Theorem 4. Rename the individuals to ensure Ind(AG1)∩Ind(AG2) = ∅, then setK= (Tcol,AG1 ∪ AG2).

Letq1color andq1edge (resp. q2color andq2edge) be defined as before, but using disjoint variables and the root individual from theAG1(resp.AG2). The challenge is to make sure that we can determine the 3-colorability status of the two graphs solely by looking at the number of c-matches of the query. To

be able to distinguishG1fromG2, we introduce an asymme- try by duplicating the color counter query forG1, i.e., create a copyq0colorofq1colorthat uses fresh variables but the same root individual. We then take the query

q() :=qcolor0 ∧ qcolor1 ∧ q1edge ∧ q2color ∧ q2edge. We claim(a,[36,+∞])∈[q]KiffG1∈3COLandG26∈

3COL. This is proven by a case analysis, summarized here:

G1∈3COL G1∈/3COL G2∈3COL 27 (= 3×3×3) 48 (= 4×4×3) G2∈/ 3COL 36 (= 3×3×4) 64 (= 4×4×4)

Each of the four cells displays the least value ofmsuch that (a,[m,+∞])∈[q]K, under different assumptions on the 3- colorability ofG1andG2. To establish these values, one must first prove that every model has at least this many c-matches, and then exhibit a model that realizes the exact number. For the latter, we utilize our assumption that the graphs are planar, hence 4-colorable [Gonthier, 2008], which we use to show that the minimal number of c-matches is realized in a model that encodes proper 3- or 4-colorings of the graphs.

The preceding reduction can be adapted to show DP- hardness also for the two kinds of CCQs from [Kostylev and Reutter, 2015], but without the rootedness restriction.

8 Conclusion & Future Work

We have revisited the issue of counting queries in OMQA and advanced our understanding of the complexity landscape, both by extending existing results to a more general notion of counting CQ and by exploring when structural restrictions on the ontology and query can lead to improved complexity.

There are several natural avenues for future study. A first challenging problem is to provide a full classification of the data complexity of ontology-mediated queries (i.e. query- ontology pairs), in order to identify further tractable cases. It would also be relevant to extend the complexity study to DLs with functional roles or quantified number restrictions, which would allow for non-trivial upper bounds on the number of matches. Tackling general CCQs for such DLs will likely require wholly different techniques from the model manipu- lations used in Section 4. However, a recent result by Cima et al. (2019) shows that the canonical model property (The- orem 6) holds also for DL-LiteF(which extends DL-Litecore

with functional roles), and hence bothTC0data complexity (Theorem 3) and ourPP-completeness result (Theorem 7) for exhaustive rooted CCQs transfer to DL-LiteF.

Much remains to be explored for queries involving other kinds of aggregate functions (min, max, sum, average), which manipulate data values. Recent studies of bag semantics for OMQA [Nikolaou et al., 2019; Cima et al., 2019] and databases with incomplete information [Hernich and Kolaitis, 2017; Consoleet al., 2017] provide important formal founda- tions for supporting such queries.

Acknowledgements

This work was partially supported by ANR project CQFD (ANR-18-CE23-0003).

(8)

References

[Aehliget al., 2007] Klaus Aehlig, Stephen Cook, and Phuong Nguyen. Relativizing Small Complexity Classes and Their Theories. Springer Berlin Heidelberg, Berlin, Heidelberg, 2007.

[Albert, 1991] Joseph Albert. Algebraic properties of bag data types. InProceedings of the 17th International Con- ference on Very Large Data Bases (VLDB), pages 211–

219, 1991.

[Baaderet al., 2017] Franz Baader, Ian Horrocks, Carsten Lutz, and Ulrike Sattler. An Introduction to Description Logic. Cambridge University Press, 2017.

[Baileyet al., 2007] Delbert D. Bailey, V´ıctor Dalmau, and Phokion G. Kolaitis. Phase transitions of PP-complete satisfiability problems. Discrete Applied Mathematics, 155(12):1627 – 1639, 2007.

[Bienvenu and Ortiz, 2015] Meghyn Bienvenu and Mag- dalena Ortiz. Ontology-mediated query answering with data-tractable description logics. InTutorial Lectures of the 11th Reasoning Web International Summer School, pages 218–307, 2015.

[Bienvenuet al., 2012] Meghyn Bienvenu, Carsten Lutz, and Frank Wolter. Query containment in description log- ics reconsidered. InProceedings of the 13th International Conference on the Principles of Knowledge Representa- tion and Reasoning (KR), 2012.

[Bienvenuet al., 2013] Meghyn Bienvenu, Magdalena Ortiz, Mantas Simkus, and Guohui Xiao. Tractable queries for lightweight description logics. InProceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI), pages 768–774, 2013.

[Bienvenuet al., 2015a] Meghyn Bienvenu, Stanislav Kikot, and Vladimir V. Podolskii. Tree-like queries in OWL 2 QL: Succinctness and complexity results. InProceedings of the 30th Annual ACM/IEEE Symposium on Logic in Computer Science (LICS), pages 317–328, 2015.

[Bienvenuet al., 2015b] Meghyn Bienvenu, Magdalena Or- tiz, and Mantas Simkus. Regular path queries in lightweight description logics: Complexity and algo- rithms. Journal of Artificial Intelligence Research (JAIR), 53:315–374, 2015.

[Calvaneseet al., 2007] Diego Calvanese, Giuseppe De Gi- acomo, Domenico Lembo, Maurizio Lenzerini, and Ric- cardo Rosati. Tractable reasoning and efficient query an- swering in description logics: The DL-Lite family.Journal of Automated Reasoning (JAR), 39(3):385–429, 2007.

[Calvaneseet al., 2008] Diego Calvanese, Evgeny Khar- lamov, Werner Nutt, and Camilo Thorne. Aggregate queries over ontologies. InProceedings of the 2nd Interna- tional Workshop on Ontologies and Information Systems for the Semantic Web (ONISW), pages 97–104, 2008.

[Cimaet al., 2019] Gianluca Cima, Charalampos Nikolaou, Egor V. Kostylev, Mark Kaminski, Bernardo Cuenca Grau,

and Ian Horrocks. Bag semantics of dl-lite with function- ality axioms. InProceedings of the 18th International Se- mantic Web Conference (ISWC), pages 128–144, 2019.

[Consoleet al., 2017] Marco Console, Paolo Guagliardo, and Leonid Libkin. On querying incomplete information in databases under bag semantics. In Carles Sierra, editor, Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI), pages 993–999, 2017.

[Gareyet al., 1976] M.R. Garey, D.S. Johnson, and L. Stockmeyer. Some simplified NP-complete graph problems. Theoretical Computer Science, 1(3):237 – 267, 1976.

[Gonthier, 2008] Georges Gonthier. Formal proof – The four-color theorem. Notices of the American Mathemat- ical Society, 55(11):1382–1393, 2008.

[Grauet al., 2013] Bernardo Cuenca Grau, Ian Horrocks, Markus Kr¨otzsch, Clemens Kupke, Despoina Magka, Boris Motik, and Zhe Wang. Acyclicity notions for ex- istential rules and their application to query answering in ontologies. Journal of Artificial Intelligence Research (JAIR), 47:741–808, 2013.

[Guti´errez-Basultoet al., 2015] V´ıctor Guti´errez-Basulto, Yazmin Ang´elica Ib´a˜nez-Garc´ıa, Roman Kontchakov, and Egor V. Kostylev. Queries with negation and inequalities over lightweight ontologies. Journal of Web Semantics (JWS), 35:184–202, 2015.

[Hernich and Kolaitis, 2017] Andr´e Hernich and Phokion G.

Kolaitis. Foundations of information integration under bag semantics. InProceedings of the 32nd Annual ACM/IEEE Symposium on Logic in Computer Science (LICS), pages 1–12, 2017.

[Kostylev and Reutter, 2015] Egor V. Kostylev and Juan L.

Reutter. Complexity of answering counting aggregate queries over DL-Lite. Journal of Web Semantics (JWS), 33(1):94–111, 2015.

[Libkin, 2001] Leonid Libkin. Expressive power of SQL.

In Proceedings of the 8th International Conference on Database Theory (ICDT), pages 1–21, 2001.

[Nikolaouet al., 2019] Charalampos Nikolaou, Egor V.

Kostylev, George Konstantinidis, Mark Kaminski, Bernardo Cuenca Grau, and Ian Horrocks. Foundations of ontology-based data access under bag semantics.Artificial Intelligence (AIJ), 274:91 – 132, 2019.

[Poggiet al., 2008] Antonella Poggi, Domenico Lembo, Diego Calvanese, Giuseppe De Giacomo, Maurizio Lenz- erini, and Riccardo Rosati. Linking data to ontologies.

Journal of Data Semantics, 10:133–173, 2008.

[Xiaoet al., 2018] Guohui Xiao, Diego Calvanese, Roman Kontchakov, Domenico Lembo, Antonella Poggi, Ric- cardo Rosati, and Michael Zakharyaschev. Ontology- based data access: A survey. InProceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI), pages 5511–5519, 2018.

(9)

A Proofs for Section 4 (General Counting CQs)

Theorem 1. For every DL-LiteR(resp. DL-Litecore) KBK = (T,A)and CCQq, if there is a model ofKwith fewer thanm c-matches ofq(a), then there exists one of size7O(|A||T ||q|+1)(resp.O(|A||q|)).

We recall the construction of the interleaving and some of its basic properties. We start with an arbitrary countermodelI, and consider the subdomain∆consisting of individual names fromAand the images of all c-matches ofq(a):

={aI|a∈Ind(A)} ∪ [

σmatch forq(a)inI

σ(z).

We fix a homomorphismfofCKintoI(see Lemma 1) and consider the following mapping from [Kostylev and Reutter, 2015]:

f0: ∆CK → ∆∪∆CK d 7→

f(d) iff(d)∈∆ d otherwise

Theinterleaving8I0ofI is defined as the image ofCKbyf0. More precisely,I0 has domainf0(∆CK) ={f0(d)|d∈∆CK} and interpretation functionf0◦ ·CK, i.e.,AI0 ={f0(d)|d∈ACK}andRI0 ={(f0(d1), f0(d2))|(d1, d2)∈RCK}.

It will be useful to exhibit a homomorphism from the interleaving into the original model, which will embed matches.

Lemma 3. The following mapping is a homomorphism fromI0toI.

ρ: ∆I0 → ∆I f0(d) 7→ f(d).

Proof. We first check that the definition is well-founded, that is: ρ(f0(d))does not depend on the choice ofd. To this end, considerd1,d2such thatf0(d1) =f0(d2). Sincef0maps to∆∪∆CK, we have two cases to examine:

• iff0(d1) =f0(d2)∈∆, that meansf0(d1) =f(d1)andf0(d2) =f(d2), thus ensuringf(d1) =f(d2).

• iff0(d1) =f0(d2)∈∆CK, that meansf0(d1) =d1andf0(d2) =d2, thus ensuring againf(d1) =f(d2).

In both cases, we obtainf(d1) =f(d2), so the function is well-founded.

To show thatρis a homomorphism ofI0intoI, we use the definition off0 and fact thatf is a homomorphism ofCKtoI.

Suppose first thatf0(d) ∈AI. Thend∈ ACK, and sincef is a homomorphism,ρ(f0(d)) =f(d)∈ AI. Suppose next that (f0(d1), f0(d2))∈RI. Then there exist(d01, d02)∈RCKsuch thatf0(d1) =f0(d01)andf0(d2) =f0(d02). Using again the fact thatfis a homomorphism, we obtain(f(d01), f(d02))∈RI0. By the preceding paragraph, this means(f(d1), f(d2))∈RI0. Lemma 4. The interleavingI0is a model ofK.

Proof. We check that all axioms and assertions from the KB are satisfied.

• All ABox assertions fromA are satisfied in CK. Since f0 = f when restricted to the ABox individuals, and f is a homomorphism ofCKinI, it follows that these ABox assertions must also be satisfied inI.

• SinceI0maps homomorphically intoI (by Lemma 3), any violation of an axiom from the TBox inI0implies a similar violation inI. SinceIis a model ofK, this cannot occur.

Theorem 11. The interleavingI0is a countermodel, and every c-matchσinI0satisfies thatσ(z)⊆∆.

Proof. Assume we have a c-match in the interleavingσ: z → ∆I0, which has an associated matchσforq(a). Sinceρis a homomorphism,ρ◦σis a match forq(a)in the original modelI, and its restriction toz, that is,ρ◦σ, is a c-match inI. Hence, it follows from the definition of∆that(ρ◦σ)(z)⊆∆. Asρ−1(∆) = ∆, this impliesσ(z)⊆∆.

Moreover, sinceρ|∆ =id, we in fact have ρ◦σ =σ. We have thus shown that every c-match forq(a)inI0 is also a c-match forq(a)inI, which means the number of c-matches inI0cannot exceed the number of c-matches. AsIwas assumed to be a countermodel (i.e. having less thanmc-matches), it follows that the same holds forI0.

In general, the interleaving has an unbounded size. To reduce the size, we will merge some domain elements, while paying attention not to introduce any new matches. To decide which elements can be merged, we will look at their local properties, which will be formalized using the following notions of chains and neighbourhoods (as in [Kostylev and Reutter, 2015]).

Definition 5 (k-chains with respect to a subdomain). Ak-chain in a modelMwith respect to a subdomainD ⊆∆Mis a sequence(d0, . . . dk)withdiin∆M, such that for all0≤i < k, we have (i)di ∈ D, and (ii) there exists a positive role/ Ri

such that(di, di+1)∈RMi . Note that the final elementdk might belong toD.

7As usual,|T |(resp.|A|,|q|) denotes the size ofT (resp.A, q).

8We have slightly modified the definition of interleaving to correct a small bug in the definition from [Kostylev and Reutter, 2015].

(10)

Definition 6 (n-neighbourhood with respect to a subdomain). Consider a model M and an elementd ∈ ∆M. Itsn- neighbourhoodNn(d,M,D)w.r.t. a subdomainDis the set of elementsd0∈∆Msuch that there exists ak-chain(d0, . . . dk) inMwith respect toDsuch thatk≤n,d0=d, anddk =d0.

Recall that the definition of∆I0ensures that anyd∈∆I0\∆is actually an element of∆CKand therefore we haved=aw for some individual nameaand wordw. The tree-shaped structure of∆CK ensures that there exists a unique prefixrn,dofaw such that :

• f0(rn,d)∈ Nn(d,I0,∆);

• for anyd0∈ Nn(d,I0,∆), there exists a unique wordwdn,d0 such thatd0 =f0(rn,dwdn,d0 ).

We denote byΩnthe set of words over the alphabet of role names occuring in the TBoxT and with length less or equal to2n.

Local properties arounddare then captured by the following function.

χn,d : Ωn → ∆∪ {∅}

w 7→

f0(rn,dw) ifrn,dw∈∆CK andf0(rn,dw)∈∆

∅ otherwise .

The next definition groups together elements having the same local properties.

Definition 7 (Equivalent elements in the interleaving). The equivalence relation∼non∆I0 is defined as follows:

• ford∈∆I0\∆, we haved∼neiffwn,dd =wen,en,dn,e, and|d|=|e| mod 2|q|+ 3,

• ford∈∆,d∼n eiffd=e.

Remark 1. Notice that ifd∼ne, thend∼mefor anym≤n. This property will be used several times without mention.

We can now define a smaller countermodel for our CCQqby merging elements with respect to∼|q|+1. We will usedfor the equivalence class ofdw.r.t.∼|q|+1, and we denote byπthe canonical projection, which maps elements to their respective equivalence classes:

π: ∆I0 →∆I0/∼|q|+1 d7→d

Definition 8 (Reduced interleaving). Thereduced interleavingJ is the interpretation with domain∆I0/ ∼|q|+1and inter- pretation of individual names, atomic concepts and roles given by·J :=π◦ ·I0.

Once again, it follows from the definition thatπ: I0 → J is a homomorphism and thatJ is a model ofK. Since we are considering a quotient, we will not be able to build a general homomorphism fromJ toI0as in Lemma 3. However, local solutions are possible. To improve the readability of the following theorem and later material, we introduce the notation∆for the set{σ|σ∈∆}.

Theorem 12. For anyd∈∆I0, there exists a homomorphismρd:N|q|(d,J,∆)→ N|q|(d,I0,∆)satisfying that : 1. ife∈∆, thenρd(e) =e;

2. ρ−1d (∆) = ∆.

Let us first explain how this will conclude our proof, through the following consequence.

Corollary 1. IfIis a countermodel, then its reduced interleavingJ is a countermodel.

Proof. Assume we have a matchσinJ. Consider a minimal covering ofσ(x∪y∪z)by|q|-neighbourhoods inJ : that is a family of neighbourhoods(N|q|(d1,J), . . .N|q|(dl,J)), withlbeing minimal and such thatσ(x∪y∪z)⊆Sl

k=1N|q|(dk,J).

In particular, the minimality ensures that the only possible overlapping elements between two different neighbourhoods are elements of∆. Along with condition1from Theorem 12, this ensures that the following mapping is well defined:

σ0:x∪y∪z → I0

v 7→ ρdk(σ(v)) ifσ(v)∈ N|q|(dk,J)

Furthermore, as the mappingsρdkprovided by Theorem 12 are homomorphisms, it follows thatσ0is a match inI0. Hence, by Theorem 11, we haveσ0(z)⊆∆. However, condition2from Theorem 12 ensures(ρdk)−1(∆) = ∆, henceσ(z)⊆∆. Therefore, for eachz∈z, we haveσ(z) ={ez}withez∈∆. From condition1, it follows thatσ0(z) =ez. In particular, the mappingσ|z7→σ|z0 is injective, so there are at most as many counting matches inJ than inI0. Hence, ifI0is a countermodel, thenJ also is.

Recalling Theorem 11, we obtain that ifIis a countermodel, thenJ also is.

(11)

To conclude the proof of Theorem 1, notice that an equivalence classdis fully characterized by :

• |d| mod 2|q|+ 3, that is one equivalent class among2|q|+ 3possible classes,

• wd|q|+1,d, that is a word over an alphabet with at most|T |symbols and a length at most|q|+ 1,

• χ|q|+1,d, that is a function from words over an alphabet with at most|T |symbols and length at most2(|q|+ 1), to a set with size at most|∆|+ 1.

Therefore, the amount of possibly different equivalence classes, that is|∆J|, is at most:

(2|q|+ 3)× |T ||q|+2×(|∆|+ 1)|T |2|q|+3.

Since|∆| ≤ |Ind|+n0|q|(recalln0is the amount of c-matches inI, and that we can assumen0≤(|Ind|+|T |)|q|), we have the claimed bounds for the size ofJ, which proves Theorem 1.

Coming back to the proof of Theorem 12, we start by building the mappingsρd. To do so, we need to transformk-chains in the reduced interleaving intok-chains in the interleaving.

Definition 9. (primary role, core of ak-chain) Given a couple(d1, d2)of elements in∆I0 \∆(resp(d1, d2)in∆J \∆), if there exists a positive roleRconnecting these two elements, then we call theprimary role of the edge(d1, d2)the roleSNR

such that eitherd2=d1Sord1=d2S(respwd|q|+1,d2

2 =w|q|,dd1

1Sorw|q|+1,dd1

1 =w|q|,dd2

2S). Notice thatT |= SvR.

The action of a roleRon a wordw, denotedRw, is eitherw0ifw=w0R, orwRotherwise.

Given ak-chainC = (d0, . . . dk)inI0 (resp inJ), we consider itscore, denotedC, being thee k-sequence of role names such thatCeiis the primary role of(di−1, di).

Given the coreCeof ak-chainC, we define its action on a wordw, denotedCew, byCek · · · Ce1w.

Remark 2. For any couple(d1, d2)of elements in∆I0 \∆, the roleSis the primary role of the edge(d1, d2)(inI0) iffSis the primary role of the edge(d1, d2)(inJ).

Lemma 5. For anyk-chain inJ fromdtoe, we havewe|q|+1−k,e=wCde

|q|+1−k,Cde . Proof. We proceed by a straightforward induction on the length ofC.

• IfCis a0-chain, that ise=d, thenCeis empty and thusCed=d.

• Otherwise, C is a (k + 1)-chain (d, d1, . . . dk, e), then consider the primary role S of the edge (dk, e). If w(Ce1...Ck)d

|q|+1−k,(Ce1...fCk)d=w0S, then we have : wCde

|q|+1−(k+1),Cde = Sw(eC1,...Cek)dk

(q+1−k,(Ce1,...Cek)dk

= Sw|q|+1−k,ddk

k

=we|q|+1−(k+1),e.

Otherwise, we have :

w|q|+1−(k+1),Cde Cde = Sw(Ce1,...Cek)dk

(q+1−(k+2),(eC1,...Cek)dk

= Swd|q|+1−(k+2),dk k

=w|q|+1−(k+1),ee .

Lemma 6. For anyk≤ |q|, anyd, e∈∆I0\∆, and any twok-chainsCandC0fromdtoeinJ, we haveCed=Cf0d.

Proof. SinceCandC0have the same endpointsdande, we haveCew|q|+1,dd =fC0w|q|+1,dd .

If|w|q|+1,dd |=|q|+ 1, thenCed=Ce(r|q+1|,dwd|q|+1,d) =r|q+1|,d(Cewd|q|+1,d) =r|q+1|,d(fC0wd|q|+1,d) =Cf0d.

Otherwise,|w|q|+1,dd |<|q|+ 1, we haver|q|+1,d∈∆. Therefore the action ofCe(respCf0) cannot empty the initial word wd|q|+1,d, since it would lead, by Lemma 6, to an element alongC(respC0) being in∆. Thus, we still haveCed=Cf0d.

Références

Documents relatifs

We establish tight complexity bounds for answering an extension of conjunctive 2-way regular path queries over EL and DL-Lite knowledge bases..

In this paper, we consider (temporally ordered) data represented in Description Logics from the popular DL- Lite family and Temporal Query Language, based on the combination of LTL

showed that the satisfiability of PDL formulas extended with arbitrary path negation (such a logic is denoted PDL ¬ ) is undecidable [20], which already implies the undecidability

Then, using these techniques we develop an approach to query answering which can decide at run-time whether Q can be evaluated using ans or a fully-fledged SROIQ reasoner needs to

It is a classic result in database theory that conjunctive query (CQ) answering, which is NP-complete in general, is feasible in polynomial time when restricted to acyclic

The DL-Lite family consists of various DLs that are tailored towards conceptual modeling and allow to realize query answering using classical database techniques.. We only

This gives rise to the notion of aggregate certain answers, which can be explained as follows: a number is an aggregate certain answer to a counting query over a knowledge base if

In particular, [8] propose algorithms for answering temporal queries (w.r.t. TKBs) that generalize TCQs in that they combine queries of a generic atemporal query language Q