
Bag Equivalence of Bounded Symmetry-Degree Conjunctive Queries with Inequalities


Academic year: 2022


Mingmin Chen and Todd J. Green

Department of Computer Science, University of California, Davis
Davis, CA 95616, USA
{chenmi,green}@cs.ucdavis.edu

Abstract. We consider the problem of checking equivalence of conjunctive queries with inequalities under bag (multiset) semantics. The problem is known to be decidable in PSPACE and as hard as graph isomorphism, but its exact complexity remains open. We introduce a natural restriction based on what we call the symmetry degree of queries, and show that when the symmetry degree is bounded by a fixed polynomial in the size of the query, the equivalence problem is in $\Pi_2^p$. We also show that for asymmetric queries (those of symmetry degree 1), checking bag-equivalence is polynomial-time Turing equivalent to the graph automorphism problem. These results can be interpreted as first steps in a more general program of finding “islands of tractability” for testing equivalence of conjunctive queries with inequalities under bag (rather than set) semantics.

1 Introduction

We study the problem of checking equivalence of conjunctive queries (CQs) with built-in inequality predicates over a densely-ordered domain under bag (multiset) semantics. This problem is well understood under the classical set semantics, where it is known that the complexity jumps from NP-complete for ordinary CQs [5] to $\Pi_2^p$-complete when inequalities are allowed [23, 29]. For bag semantics (implemented by most DBMSs, and hence of practical as well as theoretical interest), the problem is comparatively less well understood: it is known to be as hard as graph isomorphism [6] and decidable in PSPACE [27], but the exact complexity is unknown (and seems to be difficult to resolve).

In this paper, we introduce a natural restriction on the structure of CQs called bounded symmetry degree (a bound on the number of query automorphisms) and use this notion to study the complexity of the bag equivalence problem.

The notion is adapted from the well-known concept of graph symmetry studied extensively in the graph theory literature [17, 21, 25, 18]. We show that if the symmetry degree of CQs with inequalities is bounded, then bag equivalence can be decided in $\Pi_2^p$ (and we also show that checking for bounded symmetry degree can be done in $\Pi_2^p$). In fact, we show this holds even when we relax the restriction to allow queries of polynomially bounded symmetry degree (see Section 3.1). We also consider several stronger forms of symmetry degree bounds, and establish that for asymmetric CQs, bag equivalence of CQs with inequalities reduces to checking graph automorphism, while a bound on the number of self-joins in the queries yields a coNP upper bound. We also compare the notion of bounded symmetry degree with the standard notion of bounded hypertree width, which turns out to be orthogonal and not obviously applicable to the setting of bag semantics. These results may be interpreted as first steps in a more general program of finding “islands of (relative) tractability” for testing equivalence of CQs with inequalities under bag (rather than set) semantics.

Outline. Section 2 contains preliminaries and a discussion of related work. Section 3 defines the notion of bounded symmetry degree, establishes some basic results, and compares it with bounded hypertree width. Section 4 presents the main results on equivalence, for bag and bag-set semantics. We conclude in Section 5. The Appendix contains formal proofs of our results.

2 Preliminaries and Related Work

Conjunctive queries with inequalities. Following the standard terminology [1], a conjunctive query (CQ) is a safe, nonrecursive datalog rule whose body is a conjunction of relational atoms. A conjunctive query with inequalities (CQ<) is a conjunctive query extended with a conjunction of inequality atoms using $<$, $\le$, and $\ne$. We use uppercase letters for predicate symbols, $x, y, z, \ldots$ for variables, $a, b, c, \ldots$ for constants, and $t, u, v, \ldots$ for variable or constant terms, with overbars $\bar{x}, \bar{a}, \bar{t}$, etc. indicating lists. We write a CQ< $Q$ as $Q(\bar{x}) \text{ :– } R, C$ where $R$, the relational part, is a conjunction of relational atoms and $C$, the conditions, is a conjunction of inequalities $x \,\theta\, t$, $\theta \in \{<, \le, \ne\}$. We denote by $Q^R$ the CQ obtained from $Q$ by deleting $C$. For example, if $Q$ is $Q(x, y) \text{ :– } T(x, y), S(x, y), x > y, y \ne 2$, then $Q^R$ is $Q^R(x, y) \text{ :– } T(x, y), S(x, y)$.

Semantics of queries. We fix a densely-ordered domain of constants which, for concreteness, we will assume is that of the rationals $\mathbb{Q}$. A set relation of arity $k$ is a finite set $R$ of tuples from $\mathbb{Q}^k$. A bag (multiset) relation of arity $k$ is a mapping $R : \mathbb{Q}^k \to \mathbb{N}$ of tuples to their associated multiplicities, where the mapping is non-zero on only finitely many tuples. A set (resp., bag) database instance $I$ is an assignment of set (resp., bag) relations $R^I$ to predicate names $R$.

The semantics of CQ<s on set and bag relations is based on the notion of valuations. A valuation is a mapping $\nu$ of variables to constants, extended to map constants to themselves. Valuations operate pointwise on lists/tuples and atoms in the expected way. Let $Q$ be a CQ< $Q(\bar{t}) \text{ :– } R_1(\bar{t}_1), \ldots, R_n(\bar{t}_n), C$ and let $I$ be a set instance. The result of evaluating $Q$ on $I$ under set semantics is the set relation $[\![Q]\!]^I$ defined

$$[\![Q]\!]^I = \{\nu(\bar{t}) \mid \nu \models C \text{ and } \nu(\bar{t}_1) \in R_1^I \wedge \cdots \wedge \nu(\bar{t}_n) \in R_n^I\}$$

Now suppose $I$ is a bag instance. The result of evaluating $Q$ on $I$ under bag semantics is the bag relation $[\![Q]\!]^I_b$ defined

$$[\![Q]\!]^I_b(\bar{a}) = \sum_{\nu} \bigl(R_1^I(\nu(\bar{t}_1)) \times \cdots \times R_n^I(\nu(\bar{t}_n))\bigr)$$

where the sum is over all valuations $\nu$ such that $\nu(\bar{t}) = \bar{a}$ and $\nu \models C$. Finally, suppose $I$ is a set instance. The result of evaluating $Q$ on $I$ under bag-set semantics is the bag relation $[\![Q]\!]^I_{bs}$ obtained by viewing $I$ as a (duplicate-free) bag relation, and evaluating $Q$ on $I$ under bag semantics.
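The bag and bag-set definitions above can be illustrated with a small brute-force evaluator. This is only a sketch under stated assumptions: the encoding (strings for variables, numbers for constants, `Counter` objects for bag relations) and the names `eval_bag` and `as_set_instance` are hypothetical choices made here, and restricting valuations to the active domain relies on the query being safe.

```python
from collections import Counter
from itertools import product
import operator

OPS = {"<": operator.lt, "<=": operator.le, "!=": operator.ne}

def eval_bag(head, atoms, conds, instance):
    """Evaluate a CQ< under bag semantics by brute force over the active domain."""
    variables = sorted({t for _, args in atoms for t in args if isinstance(t, str)})
    domain = sorted({c for rel in instance.values() for tup in rel for c in tup})
    result = Counter()
    for values in product(domain, repeat=len(variables)):
        nu = dict(zip(variables, values))
        val = lambda t: nu.get(t, t)  # constants map to themselves
        if not all(OPS[op](val(a), val(b)) for a, op, b in conds):
            continue  # the valuation must satisfy the conditions C
        mult = 1
        for pred, args in atoms:
            # multiply the multiplicities of the images of the body atoms
            mult *= instance.get(pred, Counter())[tuple(val(t) for t in args)]
        if mult:
            result[tuple(val(t) for t in head)] += mult
    return result

def as_set_instance(instance):
    """View a bag instance as duplicate-free (the input used by bag-set semantics)."""
    return {p: Counter({t: 1 for t, m in rel.items() if m > 0})
            for p, rel in instance.items()}

# Q(x, y) :- T(x, y), S(x, y), x > y, y != 2   (x > y is encoded as y < x)
head = ("x", "y")
atoms = [("T", ("x", "y")), ("S", ("x", "y"))]
conds = [("y", "<", "x"), ("y", "!=", 2)]
I = {"T": Counter({(3, 1): 2, (1, 3): 1}), "S": Counter({(3, 1): 3, (1, 3): 1})}

bag = eval_bag(head, atoms, conds, I)                       # tuple (3, 1) with multiplicity 6
bag_set = eval_bag(head, atoms, conds, as_set_instance(I))  # tuple (3, 1) with multiplicity 1
```

The set-semantics answer is the support of either result, here the single tuple (3, 1).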

A CQ< $Q$ is satisfiable (or consistent [23]) if there exists a set (or, equivalently, a bag) database instance $I$ such that $[\![Q]\!]^I \ne \emptyset$. Every CQ is satisfiable, and a CQ< is satisfiable iff its comparison part is satisfiable. The latter can be checked in linear time using an algorithm in [4]. Here, we focus only on satisfiable queries, and we do not allow equality atoms $x = t$ (since, for satisfiable queries, such atoms can always be eliminated by replacing each occurrence of $x$ with $t$).

Containment mappings and isomorphisms. Let $Q(\bar{x}) \text{ :– } R, C$ and $Q'(\bar{x}') \text{ :– } R', C'$ be two CQ<s, and denote by $V(Q)$ and $V(Q')$ the sets of variables and constants that occur in $Q$ and $Q'$, respectively. A containment mapping [5] (or homomorphism) from $Q'$ to $Q$ is a mapping $h : V(Q') \to V(Q)$, lifted to operate on atoms and conjunctions of atoms in the obvious way, such that:

1. $h(\bar{x}') = \bar{x}$.

2. $h$ is the identity mapping on constants.

3. $h(R') \subseteq R$.

4. $C \models h(C')$, i.e., $C \models h(a) \,\theta\, h(b)$ if $a \,\theta\, b$ is in $C'$, where $\theta \in \{<, \le, \ne\}$.

A homomorphism $h$ is called an isomorphism if $h$ is bijective and its inverse is also a homomorphism. Note that $Q, Q'$ are isomorphic via mapping $h$ iff $Q^R$ and $Q'^R$ are isomorphic and $C$ and $C'$ are logically equivalent modulo $h$. Given a candidate isomorphism mapping, the latter condition can be checked by comparing the transitive closures of $C$ and $C'$ (see Appendix for details).

An endomorphism (resp., automorphism) is a homomorphism (resp., isomorphism) from a CQ< to itself. For a CQ< $Q$, the set of automorphisms of $Q^R$, denoted by $\mathrm{Aut}(Q^R)$, forms a subgroup of the symmetric group on $V(Q)$. The identity element in this group is the identity mapping. Similarly, the set of endomorphisms of $Q^R$, denoted by $\mathrm{End}(Q^R)$, forms a monoid.

Linearizations and linear expansions. We use the notions of linearizations and linear expansions of CQ<s due to Nutt et al. [27], which are based on the standard notion of linear extensions for partially ordered sets. Let $C$ be a set of inequalities ($<$, $\le$, $\ne$) over a set $T$ of variables and constants. A linearization of $C$ is a total order $L$ over $T$ that is compatible with $C$. (Note that $L$ may equate distinct variables, when this is allowed by $C$.)

If $L$ is a linearization of $C$, and $C$ is the condition of a CQ< $Q$, then we denote by $Q_L$ the CQ< obtained from $Q$ by replacing $C$ with $L$, and we say that $Q_L$ is a linearization of $Q$. By default, we do not eliminate equalities in linearizations unless explicitly pointed out. (Thus, the condition for a linearization is a conjunction of inequalities of the form $x \,\theta\, t$, $\theta \in \{<, =\}$.) The linear expansion of $Q$ is the bag $\mathrm{lin}(Q)$ of all linearizations of $Q$ (it can be viewed as a union of CQ<s). We say that linear expansions $\mathrm{lin}(Q)$ and $\mathrm{lin}(Q')$ are isomorphic if there is a bijection $f : \mathrm{lin}(Q) \to \mathrm{lin}(Q')$ such that $Q_L$ is isomorphic to $f(Q_L)$ for every $Q_L \in \mathrm{lin}(Q)$.
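To make the notion of a linearization concrete, the following sketch enumerates all linearizations of a small set of inequalities by brute force. The encoding is an assumption made here for illustration: a linearization is represented as a tuple of blocks in increasing order, where terms in the same block are equated, and the function name `linearizations` is hypothetical. The enumeration is exponential in the number of terms and is meant only for toy inputs.

```python
from itertools import product
import operator

OPS = {"<": operator.lt, "<=": operator.le, "!=": operator.ne}

def linearizations(terms, conds):
    """Yield every total order with ties on `terms` that is compatible with `conds`.

    Each order is a tuple of blocks (frozensets), smallest block first; terms in
    the same block are equated.  Strings are variables, numbers are constants.
    """
    terms = list(terms)
    consts = [t for t in terms if not isinstance(t, str)]
    for ranks in product(range(len(terms)), repeat=len(terms)):
        top = max(ranks, default=-1)
        if set(ranks) != set(range(top + 1)):
            continue  # skip rank vectors with gaps so each order is produced once
        rank = dict(zip(terms, ranks))
        # Distinct constants must keep their order in the underlying domain.
        if any((rank[a] < rank[b]) != (a < b) for a in consts for b in consts if a != b):
            continue
        if all(OPS[op](rank[a], rank[b]) for a, op, b in conds):
            yield tuple(frozenset(t for t in terms if rank[t] == r)
                        for r in range(top + 1))

# x < z, y < z, x != y: the two strict orders x < y < z and y < x < z
lin_q = list(linearizations("xyz", [("x", "<", "z"), ("y", "<", "z"), ("x", "!=", "y")]))
# Dropping x != y additionally allows the order x = y < z
lin_ties = list(linearizations("xyz", [("x", "<", "z"), ("y", "<", "z")]))
# Conditions of Q(x, y) :- T(x, y), S(x, y), x > y, y != 2, over the terms x, y, 2
lin_const = list(linearizations(["x", "y", 2], [("y", "<", "x"), ("y", "!=", 2)]))
```

The three calls produce 2, 3, and 4 linearizations respectively; in the last case the constant 2 may sit below y, strictly between y and x, level with x, or above x.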

Known results on equivalence. The equivalence problem for conjunctive queries with or without inequalities under set semantics, bag semantics, and bag-set semantics has been studied extensively, beginning with the seminal paper by Chandra and Merlin [5], which established the NP-completeness of checking containment/equivalence of CQs under set semantics using a characterization in terms of containment mappings. Containment/equivalence of CQ< queries was shown to be in $\Pi_2^p$ by Klug [23] and $\Pi_2^p$-hard by van der Meyden [29].

The papers by Chaudhuri and Vardi [6] and Ioannidis and Ramakrishnan [20] initiated the study of query containment and equivalence under bag/bag-set semantics. The former paper showed that two CQs are bag-equivalent iff they are isomorphic, which yields a LOGSPACE equivalence of the problem with the graph isomorphism problem (GI). The latter paper showed that containment of unions of CQs is undecidable via a reduction from Hilbert’s Tenth Problem. The decidability of bag containment of CQs remains an open problem. (A recent paper by Afrati et al. [2] gives positive decidability results for certain classes of CQs.) Bag containment of CQ<s, on the other hand, was shown to be undecidable by Jayram et al. [22].

Checking bag-set equivalence of CQ<s was shown to be decidable by Nutt et al. [27] (Cohen [8] extends it to bag semantics), a result we shall refer to again in the sequel:

Theorem 1 ([27, 8]). Two CQ<s $Q, Q'$ are bag (bag-set) equivalent if and only if they have isomorphic linear expansions, which can be checked in PSPACE.

This theorem also holds for bag-set equivalence for CQ<s. The PSPACE upper bound seems unlikely to be tight: in fact, using their characterization of equivalence in terms of linear expansions, it is not hard to show that the upper bound can be sharpened somewhat to $\mathrm{coNP}^{\#P}$. ($\mathrm{coNP}^{\#P}$ is contained in PSPACE, but it is not known whether the containment is strict.) However, the exact complexity of the problem remains open.

The following simple variation on an example given in [27] shows that non-isomorphic CQ<s may indeed have isomorphic linear expansions, hence isomorphism is not a necessary condition for bag equivalence of CQ<s:

Example 1. Let $\varphi(x, y, z)$ be the conjunction of relational atoms $R(x, y, z)$, $R(x, z, y)$, $R(y, x, z)$, $R(y, z, x)$, $R(z, x, y)$, $R(z, y, x)$. Now consider two Boolean CQ<s $Q, Q'$ defined

$$\begin{aligned}
Q() &\text{ :– } \varphi(x, y, z),\ x < z,\ y < z,\ x \ne y \\
Q'() &\text{ :– } \varphi(x, y, z),\ x < y,\ x < z,\ y \ne z
\end{aligned}$$

The linear expansions of $Q$ and $Q'$ are

$$\begin{aligned}
Q_1() &\text{ :– } \varphi(x, y, z),\ x < z,\ y < z,\ x > y \\
Q_2() &\text{ :– } \varphi(x, y, z),\ x < z,\ y < z,\ x < y \\
Q'_1() &\text{ :– } \varphi(x, y, z),\ x < y,\ x < z,\ y > z \\
Q'_2() &\text{ :– } \varphi(x, y, z),\ x < y,\ x < z,\ y < z
\end{aligned}$$

Observe that $Q_1$ is isomorphic to $Q'_1$ and $Q_2$ is isomorphic to $Q'_2$. By Theorem 1, $Q$ and $Q'$ are bag-equivalent since they have isomorphic linear expansions. However, $Q$ and $Q'$ are not isomorphic.

3 Symmetry

3.1 Motivation and basic definitions

A CQ< is linear [9, 27] if its body contains no two occurrences of the same relational predicate symbol (in other words, self-joins are not allowed). For linear CQ<, an easy corollary of Theorem 1 (pointed out in [27]) is that bag equivalence can be checked in polynomial time.

In this section, we introduce a generalization of this restriction based on what we term the symmetry degree of a query. This is a natural idea derived from the well-studied notion of the symmetry degree of a graph (see, e.g., [17, 21, 25, 18]), but it does not seem to have been applied in database theory before. An automorphism of a directed graph $G = (V, E)$ is a permutation $\sigma$ of the vertex set $V$, such that for any edge $e = (u, v)$, $\sigma(e) = (\sigma(u), \sigma(v))$ is also an edge. The automorphism group of $G$ contains all these automorphisms. The order of the automorphism group of $G$ is called its symmetry degree, denoted $\mathrm{sym}(G)$. For queries, the notion is exactly analogous: we define the symmetry degree of a CQ< $Q$, $\mathrm{sym}(Q)$, to be the order of its automorphism group $\mathrm{Aut}(Q^R)$.

Example 2. To illustrate, consider the Boolean CQs below:

$$\begin{aligned}
Q_1() &\text{ :– } P(x_1, x_2), P(x_2, x_3), \ldots, P(x_{n-1}, x_n) \\
Q_2() &\text{ :– } S(u, v), S(u, w) \\
Q_3() &\text{ :– } R(y_1), \ldots, R(y_n)
\end{aligned}$$

$Q_1$ has symmetry degree 1 (the variables must all stay put), $Q_2$ has symmetry degree 2 ($v$ and $w$ can be swapped), and $Q_3$ has symmetry degree $n!$ (any permutation of the variables is an automorphism).
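The symmetry degrees claimed in this example can be checked mechanically on small instances. The sketch below enumerates automorphisms of the relational part by trying every permutation of the variables, which is exponential and intended only for illustration; the encoding of atoms as `(predicate, args)` pairs and the names `automorphisms` and `sym` are assumptions of this sketch.

```python
from itertools import permutations

def automorphisms(head, atoms):
    """Yield the automorphisms of the relational part as dicts on variables."""
    variables = sorted({t for _, args in atoms for t in args if isinstance(t, str)})
    body = set(atoms)
    for image in permutations(variables):
        h = dict(zip(variables, image))
        if any(h.get(t, t) != t for t in head):
            continue  # the head must be mapped to itself
        mapped = {(p, tuple(h.get(t, t) for t in args)) for p, args in atoms}
        if mapped == body:  # a bijection on variables with h(R) = R
            yield h

def sym(head, atoms):
    """Symmetry degree: the number of automorphisms of the relational part."""
    return sum(1 for _ in automorphisms(head, atoms))

n = 4
q1 = [("P", (f"x{i}", f"x{i + 1}")) for i in range(1, n)]  # P(x1,x2), ..., P(x3,x4)
q2 = [("S", ("u", "v")), ("S", ("u", "w"))]
q3 = [("R", (f"y{i}",)) for i in range(1, n + 1)]          # R(y1), ..., R(y4)

degrees = (sym((), q1), sym((), q2), sym((), q3))          # (1, 2, 24) for n = 4
```

Placing a variable in the head removes symmetry: with head `("v",)`, the query `q2` has only the identity automorphism.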

We will examine the complexity of computing $\mathrm{sym}(Q)$ in Section 3.2. In the meantime, we derive an easy bound based on counting self-joins.

Proposition 1. Suppose predicates $p_1, \ldots, p_k$ occur $n_1, \ldots, n_k$ times, respectively, in the body of a CQ $Q$. Define the number of self-joins of $Q$ to be $n = n_1 + \cdots + n_k - k$. Then

$$\mathrm{sym}(Q) \le \prod_{i=1}^{k} (n_i!) \le (n + 1)!.$$

In particular, the proposition implies that linear queries have symmetry degree one. On the other hand, queries may have low symmetry degree but many self-joins, e.g., $Q_1$ in Example 2.
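As a worked check of Proposition 1, the bound can be instantiated on the three queries of Example 2. To avoid a clash with the proposition's $n$ (the number of self-joins), the parameter called $n$ in Example 2 is written $m$ here; this renaming is only a convention of this illustration.

```latex
\begin{align*}
Q_1 &: k = 1,\ n_1 = m - 1,\ n = m - 2, &
  \mathrm{sym}(Q_1) = 1 &\le (m-1)! \le (n+1)! = (m-1)!, \\
Q_2 &: k = 1,\ n_1 = 2,\ n = 1, &
  \mathrm{sym}(Q_2) = 2 &\le 2! \le (n+1)! = 2!, \\
Q_3 &: k = 1,\ n_1 = m,\ n = m - 1, &
  \mathrm{sym}(Q_3) = m! &\le m! \le (n+1)! = m!.
\end{align*}
```

The bound is tight for $Q_2$ and $Q_3$ and far from tight for $Q_1$, which is the sense in which a query can have many self-joins but low symmetry degree.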

We introduce some additional terminology to use in the remainder of the paper. $Q$ is called symmetric if $\mathrm{Aut}(Q^R)$ is a non-trivial subgroup of the symmetric group on $V(Q)$, i.e., $\mathrm{sym}(Q) > 1$. $Q$ is called asymmetric if $\mathrm{Aut}(Q^R)$ is trivial, that is, contains only the identity, i.e., $\mathrm{sym}(Q) = 1$. $Q$ is called rigid if $\mathrm{End}(Q^R)$ is trivial.

Incidence graphs. A CQ has a natural interpretation as an edge-labeled directed hypergraph. However, to apply results of interest from graph theory, we may need to transform this hypergraph into an ordinary directed graph. Given a CQ $Q$, we construct its incidence graph [7] $G(Q) = (V, E, \lambda)$, a labeled directed graph, as follows: $V$ has a vertex labeled $x$ for each variable $x$ in $Q$, a vertex labeled $c$ for each constant $c$ in $Q$, and a vertex labeled $P$ for each occurrence of predicate $P$ in $Q$; and $E$ contains an edge labeled $n$ from vertex $t$ to predicate vertex $P$ whenever $t$ occurs as the $n$th argument in $P$. For example, the incidence graph of CQ $Q(x) \text{ :– } T(x, y), S(x, z), S(z, 2)$ is shown below:

[Figure: incidence graph of Q(x) :– T(x, y), S(x, z), S(z, 2). Vertices for the terms x, y, z, 2 and for the atoms Q(x), T(x, y), S(x, z), S(z, 2); vertex labels v, Q, T, S, 2; edge labels 1 and 2 giving argument positions.]

We introduce labels in the definition of incidence graph to ensure that the construction preserves the symmetry degree¹ of $Q$:

Proposition 2. For any CQ $Q$, we have $\mathrm{sym}(Q) = \mathrm{sym}(G(Q))$.
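The incidence-graph construction can be sketched as a small data-structure builder. Several choices below are assumptions of this sketch rather than details fixed by the text: vertices are encoded as tagged tuples, all variable vertices receive the common label "v" (which is what allows a label-preserving automorphism to permute variables), and only the body atoms are represented, so the head atom is omitted.

```python
def incidence_graph(atoms):
    """Return (labels, edges) for the incidence graph of a CQ body.

    labels maps each vertex to its label; edges is a set of
    (term_vertex, atom_vertex, argument_position) triples.
    """
    labels, edges = {}, set()
    for i, (pred, args) in enumerate(atoms):
        atom_vertex = ("atom", i)          # one vertex per occurrence of a predicate
        labels[atom_vertex] = pred
        for pos, t in enumerate(args, start=1):
            if isinstance(t, str):
                term_vertex = ("var", t)
                labels[term_vertex] = "v"  # assumption: variables share one label
            else:
                term_vertex = ("const", t)
                labels[term_vertex] = t    # each constant is labeled by itself
            edges.add((term_vertex, atom_vertex, pos))
    return labels, edges

# Body of Q(x) :- T(x, y), S(x, z), S(z, 2)
labels, edges = incidence_graph([("T", ("x", "y")), ("S", ("x", "z")), ("S", ("z", 2))])
```

For this body the graph has seven vertices (three atoms, three variables, one constant) and six labeled edges, one per argument position.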

3.2 Complexity of Computing Symmetry Degrees

We next turn to the question of computing the symmetry degree of a CQ, starting with the simpler problem of deciding whether a query is symmetric. Not surprisingly, the latter problem reduces easily to the graph automorphism problem (GA), which is the problem of checking whether a given graph has a non-trivial automorphism.²

1 The automorphism group of a labeled directed graph $G = (V, E, \lambda)$ is the subgroup of the automorphism group of $G = (V, E)$ which respects the labeling $\lambda$.

2 As with GI, GA is in NP but not known or believed to be either NP-complete or in PTIME. It is no harder than GI in the sense that the problem is polynomial-time many-one reducible to GI; however, the converse relationship is at present unknown.


Proposition 3. Checking whether a CQ< (CQ) is symmetric is polynomial-time many-one equivalent to the graph automorphism problem.

This leads, using results in [24], to a characterization of the complexity of computing the symmetry degree:

Corollary 1. Computing the symmetry degree of a CQ< (or CQ) is polynomial-time Turing equivalent to the graph isomorphism problem.

The following is a direct corollary of a result in [3].

Corollary 2 (of Theorem 15 in [3]). If the number of self-joins in a CQ< $Q$ is bounded by some constant $k$, then there is an $n^{O(k)}$ time algorithm that computes $\mathrm{Aut}(Q^R)$ as a generating set in the symmetric group on $V(Q)$.

Finally, we consider testing for rigidity. Again our strategy is to reduce the problem on CQ< queries to the analogous problem on graphs, where a result of Goralcik et al. [11] shows as a special case that graph rigidity is coNP-complete.

Proposition 4. The problem of checking whether a CQ< (CQ) is rigid is coNP-complete.

3.3 Symmetry Degree versus Hypertree Width

Gottlob et al. [12, 13] introduced the notion of hypertree width as a generalization of the classical notion of acyclic queries, and proved that Boolean conjunctive queries of bounded hypertree width can be evaluated efficiently (and that, as a consequence, query containment under set semantics is also tractable).

Symmetry degree and hypertree width seem to be essentially orthogonal concepts: intuitively, hypertree width is in some sense a measure of cyclicity, rather than symmetry. Moreover, a query with high symmetry degree may have small hypertree width, such as the following CQ that computes a “star self-join” on the first column of a relation $R$:

$$Q_1() \text{ :– } R(x, y_1), R(x, y_2), \ldots, R(x, y_k)$$

$Q_1$ has symmetry degree $k!$ but is acyclic, hence has hypertree width 1. At the same time, a query with a high hypertree width may have a small symmetry degree, such as the CQ below:

$$Q_2() \text{ :– } \bigwedge_{i=1}^{k} \; \bigwedge_{j=1,\, j \ne i}^{k} R_{ik-k+j}(x_i, x_j)$$

($Q_2$ intuitively corresponds to a complete directed graph without self-loops on $k$ vertices where each edge is labeled with a distinct predicate name.) $Q_2$ has symmetry degree 1, but we can show that it has hypertree width $\lceil k/2 \rceil$.

We are not aware of any work studying the interaction of hypertree width and bag equivalence of queries. We also do not know of any work studying whether bounded hypertree width leads to tractable query containment or equivalence checks for CQ<s under the classical set semantics.

A recent paper by Pichler et al. [28] showed that the query complexity of evaluating CQs under bag-set semantics becomes tractable when the CQs have bounded treewidth. A natural question (which we leave open here) is whether bounded symmetry degree also makes query evaluation under bag or bag-set semantics tractable.

4 Main Results

We are now ready to present our results on the complexity of bag equivalence of CQ<s having bounded symmetry degree (Section 4.1). We also discuss how these bounds transfer to bag-set semantics (Section 4.2), and present an initial result that applies these ideas to the query equivalence problem under classical set semantics (Section 4.3).

4.1 Bag Equivalence

As already mentioned, Nutt et al. [27] showed that checking bag-equivalence of CQ<s can be done in PSPACE, but the exact complexity of the problem remains unclear. Here, we show that if the symmetry degree of the queries is bounded, then the upper bound can be lowered to $\Pi_2^p$. In fact, we use a rather liberal notion of “bounded” here where we fix a univariate polynomial $P$ ahead of time and say that a CQ< $Q$ of length $n$ has polynomially bounded symmetry degree if $\mathrm{sym}(Q) \le P(n)$.

Theorem 2. Checking bag equivalence of CQ<s of polynomially bounded symmetry degree is in $\mathrm{coNP}^{\mathrm{GI}}$, hence in $\Pi_2^p$.

The proof of the theorem relies on the following technical lemma:

Lemma 1. Let $Q$ be a CQ< of polynomially bounded symmetry degree, and let $Q_L$ be one of its linearizations. Then the number of linearizations in $\mathrm{lin}(Q)$ that are isomorphic to $Q_L$ can be computed in polynomial time with access to a GI oracle.

Proof. The basic idea of the proof is as follows: first, we use the GI oracle to compute $\mathrm{Aut}(Q^R)$; second, we use $\mathrm{Aut}(Q^R)$ to compute the number of linearizations in $\mathrm{lin}(Q)$ that are isomorphic to $Q_L$.

To compute $\mathrm{Aut}(Q^R)$, we first compute the generating set of $\mathrm{Aut}(Q^R)$. This can be done in polynomial time using the GI oracle, since it is well-known that the problem of computing the generating set of the automorphism group of a labeled graph is polynomial-time Turing equivalent to GI [24, 19]. Now, since we have assumed $\mathrm{sym}(Q) \le P(n)$, and we have the generating set for $\mathrm{Aut}(Q^R)$, it is clear that we can enumerate all the elements of $\mathrm{Aut}(Q^R)$ in polynomial time.

Next, let $S$ denote the set of all linearizations in $\mathrm{lin}(Q)$ that are isomorphic to $Q_L$. Our goal is to compute $|S|$. We observe that a linearization $Q_{L'} \in S$ iff $h(L) = L'$ for some $h \in \mathrm{Aut}(Q^R)$ and $L' \models C$, where $C$ denotes the condition part of $Q$. In other words, $S = \{h(Q_L) \mid h \in \mathrm{Aut}(Q^R) \text{ and } h(L) \models C\}$. Since we have already computed $\mathrm{Aut}(Q^R)$, we can therefore use it to compute $S$ in polynomial time, and return $|S|$ as the final answer. □
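The counting step of this proof can be sketched on Example 1. This is an illustration under stated assumptions rather than the algorithm of the lemma: the GI oracle is replaced by brute-force enumeration of variable permutations (feasible only for tiny queries), a linearization is encoded as a tuple of blocks in increasing order, and the names `automorphisms`, `satisfies`, and `count_isomorphic` are hypothetical.

```python
from itertools import permutations
import operator

OPS = {"<": operator.lt, "<=": operator.le, "!=": operator.ne}

def automorphisms(atoms):
    """Brute-force Aut(Q^R) of a Boolean CQ (a stand-in for the GI-oracle step)."""
    variables = sorted({t for _, args in atoms for t in args if isinstance(t, str)})
    body = set(atoms)
    for image in permutations(variables):
        h = dict(zip(variables, image))
        if {(p, tuple(h.get(t, t) for t in args)) for p, args in atoms} == body:
            yield h

def satisfies(order, conds):
    """Check that a linear order (tuple of blocks, smallest first) entails conds."""
    rank = {t: i for i, block in enumerate(order) for t in block}
    return all(OPS[op](rank[a], rank[b]) for a, op, b in conds)

def count_isomorphic(atoms, conds, order):
    """|S| for S = { h(L) : h in Aut(Q^R) and h(L) satisfies C }."""
    images = set()
    for h in automorphisms(atoms):
        image = tuple(frozenset(h.get(t, t) for t in block) for block in order)
        if satisfies(image, conds):
            images.add(image)  # distinct automorphisms may give the same h(L)
    return len(images)

# Example 1: phi contains R applied to every permutation of (x, y, z)
phi = [("R", p) for p in permutations(("x", "y", "z"))]
c_q = [("x", "<", "z"), ("y", "<", "z"), ("x", "!=", "y")]
c_q_prime = [("x", "<", "y"), ("x", "<", "z"), ("y", "!=", "z")]
L = (frozenset("x"), frozenset("y"), frozenset("z"))  # the order x < y < z

counts = (count_isomorphic(phi, c_q, L), count_isomorphic(phi, c_q_prime, L))
```

Both counts equal 2, matching the two linearizations on each side in Example 1; a mismatch for some linearization would witness that the queries are not bag equivalent.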

Now we are ready to prove Theorem 2.

Proof. By Lemma 1, given a GI oracle and a linearization $Q_L$ of $Q$, we can compute in polynomial time the numbers of linearizations of $Q$ and $Q'$, respectively, which are isomorphic to $Q_L$. If these two numbers are different, then it follows by Theorem 1 that $Q$ and $Q'$ are not bag-equivalent. Thus we can guess a non-membership witness and verify it in polynomial time given a GI oracle, which means this problem is in $\mathrm{coNP}^{\mathrm{GI}}$. Since GI $\le_T^p$ NP, it follows that the problem is also in $\Pi_2^p$. □

We note that the pre-condition of Theorem 2 can certainly be checked in $\Pi_2^p$, by Corollary 1. Also, we note that it is not hard to show that Lemma 1 and Theorem 2 can be generalized to show a $\Pi_2^p$ upper bound for checking bag equivalence of unions of CQ<s (UCQ<s). In Section 4.2, we will also generalize it to apply to bag-set equivalence.

Next we show some other (relatively) tractable subcases of checking bag equivalence of CQ<s. Here, we show that if the number of self-joins is bounded by some constant, we can push the upper bound down to coNP.

Theorem 3. If the numbers of self-joins in CQ<s $Q, Q'$ are bounded by some constant $k$, then the problem of checking whether $Q$ and $Q'$ are bag equivalent is in coNP.

Proof. Without loss of generality, we assume that $Q$ and $Q'$ have the same relational part, since their relational parts are isomorphic by Theorem 1. As stated in Proposition 1, the symmetry degree is bounded by $(k + 1)!$ and we can enumerate the automorphism group in constant time. To check bag equivalence of $Q$ and $Q'$, we guess a linearization $Q_L$ and initialize two sets $S_Q$ and $S_{Q'}$ to be empty. We then apply the automorphisms to $Q_L$ one by one, check whether the resulting comparison part is compatible with the comparison parts of $Q$ and $Q'$, respectively, and if so add the result to the corresponding set $S_Q$ or $S_{Q'}$. This can be done in polynomial time. If the cardinalities of these two sets are different, then $Q$ and $Q'$ are not bag-equivalent, since they have different numbers of linearizations that are isomorphic to $Q_L$. In other words, we can guess a non-membership witness and verify it in polynomial time. Hence, this problem is in coNP. □

By imposing various restrictions on the symmetry degree of queries, we can establish other upper bounds on the complexity of the problem. The following result is one example.

Theorem 4. Asymmetric CQ<s $Q$ and $Q'$ are bag equivalent iff there is an isomorphism $h$ from $Q'^R$ to $Q^R$ such that $h(C') \equiv C$ (that is, $h(C')$ and $C$ entail each other). Moreover, bag equivalence of asymmetric CQ< queries is polynomial-time Turing equivalent to the graph automorphism problem.


Proof. Suppose $Q$ and $Q'$ are bag equivalent. By Theorem 1, $Q, Q'$ have isomorphic linear expansions. In other words, there is a bijective mapping $\psi : \mathrm{lin}(Q) \to \mathrm{lin}(Q')$ such that $Q_L$ is isomorphic to $\psi(Q_L)$ for all $Q_L \in \mathrm{lin}(Q)$. Denote by $\varphi_i$ the isomorphism mapping from $Q_{L_i}$ to $\psi(Q_{L_i})$. Since $Q, Q'$ are asymmetric, there is only one isomorphism mapping from $Q^R$ to $Q'^R$. (Suppose there are two, $\varphi_1$ and $\varphi_2$; then $\varphi_1 \circ \varphi_2^{-1}$ is a nontrivial automorphism of $Q^R$, which is a contradiction.) So the $\varphi_i$ are all the same for any $Q_{L_i} \in \mathrm{lin}(Q)$. The isomorphism of $Q$ and $Q'$ follows. The only remaining problem is to find this unique isomorphism of their relational parts and check whether their comparisons are equivalent under it. By [24], asymmetric graph isomorphism is polynomial-time Turing equivalent to the graph automorphism problem. Hence, finding this isomorphism mapping from $Q^R$ to $Q'^R$ is polynomial-time Turing equivalent to the graph automorphism problem, and so is the bag equivalence problem. □

4.2 Bag-Set Equivalence

Next we examine bag-set equivalence of CQ<s. Theorem 4 holds not just for bag semantics, but also for bag-set semantics. Let us now look at extending Theorem 2 as well.

We have the following lemma which is analogous to Lemma 1, but uses an NP (rather than GI) oracle. Intuitively, we use this extra power to deal with the fact that for bag-set semantics, when manipulating linearizations, we are forced to reduce equalities (since this can have the effect of revealing that an atom in the body is actually a duplicate copy and should be removed).

Lemma 2. Let $Q$ be a CQ< of polynomially bounded symmetry degree, and let $Q_L$ be one of its linearizations (with equalities reduced). The number of linearizations in $\mathrm{lin}(Q)$ that are isomorphic to $Q_L$ can be computed in polynomial time with access to an NP oracle.

Using this lemma, we can establish the analog of Theorem 2 for bag-set semantics.

Theorem 5. Checking bag-set equivalence of CQ<s of polynomially bounded symmetry degree is in $\mathrm{coNP}^{\mathrm{NP}} = \Pi_2^p$.

4.3 Set Equivalence

We have already seen in the preceding sections that bounded symmetry degree makes the bag (bag-set) equivalence problem for CQ<s easier to tackle. We show in this section a preliminary application of bounded symmetry to checking equivalence under the classical set semantics. As mentioned earlier, Klug [23] and van der Meyden [29] showed that the containment/equivalence problem is $\Pi_2^p$-complete for CQ<s, hence equivalence is in $\Pi_2^p$. Here we show that the complexity is lower than $\Pi_2^p$ if the CQ<s are rigid.

Theorem 6. Checking set equivalence of rigid CQ<s is polynomial-time Turing reducible to the graph automorphism problem.


5 Conclusions and Future Work

We note that the equivalence results on bag semantics of CQ<s presented here have wider applicability to any semantics in which CQ< equivalence is characterized by isomorphism of linearizations. This includes, for example, equivalence of CQ<s on databases annotated with certain kinds of provenance information [15], as well as equivalence under the recently proposed $\mathbb{Z}$-semantics [16].

We conjecture that if only $<$, $\le$ are allowed as inequalities, two CQ<s $Q, Q'$ are bag equivalent if and only if there is an isomorphism from $Q'^R$ to $Q^R$ under which the condition of $Q'$ is equivalent to that of $Q$.

One possible future direction involves applying bounded symmetry degree to study query containment. Although the decidability of bag containment of CQs remains a longstanding open problem (Chaudhuri and Vardi [6] showed $\Pi_2^p$-hardness of this problem), bag containment of CQ<s or of unions of conjunctive queries is known to be undecidable [20, 22]. A natural question is whether these undecidability results hold even for queries of bounded symmetry degree. In fact, the reduction from Hilbert’s Tenth Problem in [20] constructs a pair of UCQs each of which is composed of asymmetric CQs, hence bag containment remains undecidable even for unions of asymmetric CQs. The reduction of [22], on the other hand, uses CQ<s of arbitrary symmetry degree, hence it remains open whether containment of CQ<s of bounded symmetry degree is decidable.

Finally, we would like to pin down the exact complexity of the general case of bag equivalence of CQ< queries, which has proved stubborn to resolve. The problem is reminiscent of another open issue in the area, that of identifying the exact complexity of checking bag equivalence of positive relational algebra queries with inequalities. This class of queries is expressively equivalent to UCQ<, but exponentially more concise, and here again the known lower and upper bounds on the complexity are GI and PSPACE, respectively.

Acknowledgements. We thank the anonymous reviewers for their helpful comments. This work was supported by NSF CAREER award IIS-1055107.

References

1. S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases. Addison-Wesley, 1995.

2. F. N. Afrati, M. Damigos, and M. Gergatsoulis. Query containment under bag and bag-set semantics. Inf. Process. Lett., 110:360–369, April 2010.

3. V. Arvind and J. Köbler. On Hypergraph and Graph Isomorphism with Bounded Color Classes. STACS 2006, pages 384–395, 2006.

4. N. Brisaboa, H. Hernández, J. Paramá, and M. Penabad. Containment of conjunctive queries with built-in predicates with variables and constants over any ordered domain. In Advances in Databases and Information Systems. Springer, 1998.

5. A. K. Chandra and P. M. Merlin. Optimal implementation of conjunctive queries in relational data bases. In STOC, 1977.

6. S. Chaudhuri and M. Y. Vardi. Optimization of real conjunctive queries. In PODS ’93, pages 59–70, New York, NY, USA, 1993. ACM.

7. C. Chekuri and A. Rajaraman. Conjunctive query containment revisited. Theor. Comput. Sci., 239:211–229, May 2000.

8. S. Cohen. Equivalence of queries that are sensitive to multiplicities. The VLDB Journal, 18(3):765–785, 2009.

9. M. P. Consens and A. O. Mendelzon. Graphlog: a visual formalism for real life recursion. In PODS ’90, pages 404–416, New York, NY, USA, 1990. ACM.

10. V. Dalmau and P. Jonsson. The complexity of counting homomorphisms seen from the other side. TCS, 329(1–3):315–323, 2004.

11. P. Goralcik and V. Koubek. Verifying nonrigidity. Information Processing Letters, 22(2):91–95, 1986.

12. G. Gottlob, N. Leone, and F. Scarcello. Hypertree decompositions and tractable queries. In PODS ’99, pages 21–32, New York, NY, USA, 1999. ACM.

13. G. Gottlob, N. Leone, and F. Scarcello. Hypertree Decompositions: A Survey. In FOCS, 2001.

14. G. Gottlob, N. Leone, and F. Scarcello. Robbers, marshals, and guards: game theoretic and logical characterizations of hypertree width. JCSS, 66(4):775–808, 2003.

15. T. J. Green. Containment of conjunctive queries on annotated relations. In ICDT, 2009.

16. T. J. Green, Z. G. Ives, and V. Tannen. Reconcilable differences. In ICDT, 2009.

17. Z. Hedrlín and A. Pultr. Symmetric relations (undirected graphs) with given semigroups. Monatsh. Math., 69(4):318–322, 1965.

18. P. Hell and J. Nešetřil. Graphs and homomorphisms. Oxford University Press, USA, 2004.

19. C. Hoffmann. Group-theoretic algorithms and graph isomorphism. Lecture notes in computer science. Springer, 1982.

20. Y. E. Ioannidis and R. Ramakrishnan. Containment of conjunctive queries: beyond relations as sets. ACM Trans. Database Syst., 20(3):288–324, 1995.

21. S. Janson, T. Łuczak, and A. Ruciński. Random graphs. Wiley-Interscience Series in Discrete Mathematics and Optimization, 2000.

22. T. S. Jayram, P. G. Kolaitis, and E. Vee. The containment problem for real conjunctive queries with inequalities. In PODS, 2006.

23. A. Klug. On conjunctive queries containing inequalities. J. ACM, 35(1):146–160, 1988.

24. J. Köbler, U. Schöning, and J. Torán. The graph isomorphism problem: its structural complexity. Springer, 1993.

25. J. Lauri and R. Scapellato. Topics in graph automorphisms and reconstruction. Cambridge Univ Pr, 2003.

26. G. McNulty and C. Shallon. Inherently nonfinitely based finite algebras. In R. Freese and O. Garcia, editors, Universal Algebra and Lattice Theory. 1983.

27. W. Nutt, Y. Sagiv, and S. Shurin. Deciding equivalences among aggregate queries. In PODS ’98, pages 214–223, New York, NY, USA, 1998. ACM.

28. R. Pichler and S. Skritek. The complexity of evaluating tuple generating dependencies. In ICDT, 2011.

29. R. van der Meyden. The complexity of querying indefinite data about linearly ordered domains. In PODS ’92, pages 331–345, New York, NY, USA, 1992. ACM.


A Transitive closure of inequality set

Consider a satisfiable set $C$ of inequalities, and denote by $V(C)$ the set of variables and constants occurring in $C$. The transitive closure of $C$, denoted by $\overline{C}$, is the conjunction of all inequalities over $V(C)$ entailed by $C$. For example, if $C = \{x < y, y \le 2\}$, its transitive closure is $\overline{C} = \{x < y, y \le 2, x < 2, x \le y, x \le 2, x \ne y, x \ne 2\}$. (Notice that $y \ne 2$ is not included, since $C$ allows $y = 2$. Notice also that $C \models x < 3$, but $x < 3$ is not contained in $\overline{C}$ since 3 does not occur in $C$.) The transitive closure of a set of inequalities can be computed in polynomial time.

Proposition 5. For sets $C_1$ and $C_2$ of inequalities, the following are equivalent:

1. $C_1$ and $C_2$ entail each other, i.e., $C_1 \models C_2$ and $C_2 \models C_1$.

2. $C_1$ and $C_2$ have the same sets of linearizations.

3. $C_1$ and $C_2$ have the same transitive closures.
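Item 2 of Proposition 5 can be checked by brute force on small inputs. The sketch below assumes a definition that is not restated in this appendix: a linearization is taken to be a total preorder on a fixed list of terms (terms of equal rank are equal) that satisfies every inequality and orders constants by value. The function name `linearizations` and the rank-tuple encoding are hypothetical, and the enumeration is exponential, so this is for illustration only.

```python
from itertools import product


def linearizations(atoms, terms):
    """Enumerate total preorders on `terms` that satisfy `atoms`.

    A preorder is encoded as a tuple of ranks aligned with `terms`; ranks
    form an initial segment 0..m so each preorder appears exactly once.
    Atoms are (s, op, t) with op in {'<', '<=', '!='}; strings are
    variables and numbers are constants (an assumed encoding).
    """
    consts = [u for u in terms if not isinstance(u, str)]
    # Constants must appear in their numeric order.
    atoms = list(atoms) + [(c, "<", d) for c in consts for d in consts if c < d]
    holds = {
        "<": lambda a, b: a < b,
        "<=": lambda a, b: a <= b,
        "!=": lambda a, b: a != b,
    }
    n = len(terms)
    found = set()
    for ranks in product(range(n), repeat=n):
        # Keep only canonical rank tuples (no gaps in the ranks used).
        if set(ranks) != set(range(max(ranks, default=-1) + 1)):
            continue
        rank = dict(zip(terms, ranks))
        if all(holds[op](rank[s], rank[t]) for s, op, t in atoms):
            found.add(ranks)
    return found
```

Two sets of inequalities over the same terms can then be compared with `linearizations(c1, terms) == linearizations(c2, terms)`.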

B Proofs

Proof. (of Proposition 3) One direction is clear: the graph automorphism problem reduces trivially to checking symmetry of a Boolean CQ Q over a single binary relation E, where E(x, y) and E(y, x) are both subgoals in Q exactly when vertices x and y are adjacent in the graph G. In the other direction, given a CQ Q, we construct its incidence graph G(Q). By Proposition 2, Q is symmetric iff G(Q) is symmetric. G(Q) is a labeled graph, but it is easy to verify that labeled graph automorphism is many-one reducible to (ordinary) graph automorphism, which completes the proof. ⊓⊔
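The first direction of this proof can be sketched in code. The sketch assumes that an automorphism of a Boolean CQ is a permutation of its variables that maps the set of subgoals onto itself, and that the symmetry degree is the number of such permutations; the names `graph_to_cq` and `automorphisms` and the tuple encoding of subgoals are hypothetical. The search is a brute-force enumeration, not an efficient algorithm.

```python
from itertools import permutations


def graph_to_cq(edges):
    """Encode an undirected graph as subgoals E(x, y) and E(y, x) per edge.

    Isolated vertices do not appear in any subgoal and are ignored here.
    """
    return {("E", u, v) for u, v in edges} | {("E", v, u) for u, v in edges}


def automorphisms(subgoals):
    """List the variable permutations that map the subgoal set onto itself."""
    subgoals = set(subgoals)
    variables = sorted({v for _, *args in subgoals for v in args})
    found = []
    for image in permutations(variables):
        h = dict(zip(variables, image))
        mapped = {(r, *(h[a] for a in args)) for r, *args in subgoals}
        if mapped == subgoals:
            found.append(h)
    return found
```

Under these assumptions the query is symmetric when `len(automorphisms(q)) > 1`, and the count agrees with the number of automorphisms of the encoded graph, which reflects the parsimonious nature of the reduction.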

Proof. (of Corollary 1) The reductions used in Proposition 3 are parsimonious, so computing the symmetry degree of a CQ< is polynomial-time many-one equivalent to #GA (footnote 3). By results in [24], #GA is polynomial-time Turing equivalent to the graph isomorphism problem. ⊓⊔

3 #GA is the problem of computing the order of the automorphism group of a given graph. It is polynomial-time Turing equivalent to #GI and GI [24], where #GI is the problem of counting the graph isomorphisms from one graph to another.

Proof. (of Corollary 2) Since the number of self-joins is bounded, the number of times each label is used in the associated hypergraph is also bounded, which corresponds exactly to bounded color classes. Theorem 15 in [3] states: “Given a hypergraph X = (V, E) with color classes of size bounded by k, there is an n^{O(k)} time algorithm that computes Aut(X) as a generating set in Sym(V). In particular, HGI and HGA are in ptime.” This completes the proof. ⊓⊔

Proof. (of Proposition 4) For CQ< (and hence CQ as well) it is easy to see that this problem is in coNP: we can guess a nontrivial endomorphism candidate and check in polynomial time whether it is an endomorphism. If so, the query is not rigid. Goralcik et al. [11] showed that deciding whether a finite algebra A is nonrigid is NP-complete as soon as the type of A has either one binary or two unary symbols. This result can be applied in particular to the graph algebra associated with a directed graph [26] to show that checking rigidity of directed graphs is coNP-complete. Since a directed graph can be transformed into a Boolean CQ using a single binary predicate, such that the graph is rigid iff the associated query is rigid, it follows that checking rigidity of CQs (and hence also of CQ<s) is coNP-complete. ⊓⊔
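The coNP membership argument (guess a candidate map, then verify it) can be illustrated by a brute-force sketch for plain CQs without inequalities. It assumes that an endomorphism is a map on the variables sending every subgoal to a subgoal, and that a query is rigid when the identity is its only endomorphism; the names `endomorphisms` and `is_rigid` and the tuple encoding are hypothetical. Enumerating all candidate maps takes exponential time, in line with the hardness result.

```python
from itertools import product


def endomorphisms(subgoals):
    """List all variable maps h such that h applied to each subgoal is a subgoal."""
    subgoals = set(subgoals)
    variables = sorted({v for _, *args in subgoals for v in args})
    found = []
    for image in product(variables, repeat=len(variables)):
        h = dict(zip(variables, image))
        # Polynomial-time verification of one guessed candidate.
        if all((r, *(h[a] for a in args)) in subgoals for r, *args in subgoals):
            found.append(h)
    return found


def is_rigid(subgoals):
    """A query is rigid when the identity map is its only endomorphism."""
    return len(endomorphisms(subgoals)) == 1
```

For a directed graph, each edge (u, v) becomes the subgoal `("E", u, v)`, following the transformation mentioned at the end of the proof.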

Proof. (of Lemma 2) We can find a homomorphism h from QR to QRL by using this NP oracle. We can also enumerate all the automorphisms of Q with this NP oracle in polynomial time, since Q has polynomial symmetry degree. Using an argument similar to that of Lemma 1, we compose h with the elements of Aut(QR) one by one, apply each composition to L, and check whether the result is compatible with C. This yields the number of linearizations in lin(Q) that are isomorphic to QL. ⊓⊔

Proof. (of Theorem 5) Use the same argument as in Theorem 2, but instead of a GI oracle, use an NP oracle as Lemma 2 suggests. This yields a $\mathrm{coNP}^{\mathrm{NP}} = \Pi_2^p$ upper bound. ⊓⊔

Proof. (of Theorem 6) Assume that two rigid queries Q and Q′ are set equivalent. Then there is exactly one homomorphism ψ from Q′R to QR and exactly one homomorphism ϕ from QR to Q′R; in particular, QR and Q′R are homomorphically equivalent. Since Q and Q′ are rigid, they are cores, and two homomorphically equivalent cores are isomorphic. Hence ψ is the unique isomorphism from Q′R to QR. It remains to find this isomorphism and to check that the conditions of the two queries are compatible under it; the check takes polynomial time.

By [24], asymmetric graph isomorphism is polynomial-time Turing equivalent to the graph automorphism problem. Since rigid graphs are a subclass of asymmetric graphs, the set equivalence problem is polynomial-time Turing reducible to the graph automorphism problem. ⊓⊔
