Uniform Reliability of Self-Join-Free Conjunctive Queries


Antoine Amarilli¹, Benny Kimelfeld²
March 23, 2021

¹ Télécom Paris
² Technion

Uniform reliability

We study the uniform reliability (UR) problem for relational databases:

• Fix a Boolean Conjunctive Query (CQ) Q
• Define the counting problem UR(Q):
  • Input: a database instance I
  • Output: how many subinstances of I (subsets of facts) satisfy Q

Class
Class   Room   Date
CS 1    101    2021-03-26
CS 2    101    2021-04-02

Lockdown
Date
2021-03-26
2021-04-02

Consider the query:

Q: ∃c r d Class(c,r,d) ∧ Lockdown(d)

The number of subinstances satisfying Q is: 0+2+2+3 = 7
(grouping by the Class facts that are kept: none, only CS 1, only CS 2, or both)

• We can always solve UR(Q) by looking at all subinstances (exponential)

→ When can we achieve a better complexity?
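The exponential baseline can be sketched in a few lines of Python. This is a minimal illustration on the example instance, not an algorithm from the talk; the tuple encoding of facts and the helper names `satisfies_q` and `uniform_reliability` are assumptions made for the example.

```python
from itertools import combinations

# Example instance: each fact is a tuple tagged with its relation name.
CLASS = [("Class", "CS 1", "101", "2021-03-26"),
         ("Class", "CS 2", "101", "2021-04-02")]
LOCKDOWN = [("Lockdown", "2021-03-26"), ("Lockdown", "2021-04-02")]
INSTANCE = CLASS + LOCKDOWN


def satisfies_q(facts):
    """Q: exists c, r, d such that Class(c, r, d) and Lockdown(d)."""
    lockdown_dates = {f[1] for f in facts if f[0] == "Lockdown"}
    return any(f[0] == "Class" and f[3] in lockdown_dates for f in facts)


def uniform_reliability(instance, query):
    """Count the subinstances satisfying the query by enumerating all of them."""
    count = 0
    for k in range(len(instance) + 1):
        for subinstance in combinations(instance, k):
            if query(subinstance):
                count += 1
    return count


print(uniform_reliability(INSTANCE, satisfies_q))  # 7
```

The loop visits all 2^4 = 16 subinstances, which is exactly the exponential cost that the rest of the talk asks whether we can avoid.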


Motivation and connections

• We think that uniform reliability is a natural question
• It also connects to other works:
  • Query explanations using measures from computational social choice:
    • the Shapley value [Livshits et al., 2020]
    • the causal effect [Salimi, 2016]
  • Probabilistic query evaluation on tuple-independent databases [Suciu et al., 2011]

→ Let's review the line of work on probabilistic query evaluation


Uncertain data and tuple-independent databases (TID)

• We consider data in the relational model on which we have uncertainty
• Tuple-independent databases (TID): the simplest probabilistic model

Class
Class   Room   Date         Probability
CS 1    101    2021-03-26   80%
CS 2    101    2021-04-02   70%

Lockdown
Date         Probability
2021-03-26   20%
2021-04-02   40%

• Semantics:
  • Every tuple is annotated with a probability
  • We assume that all tuples are independent
  • A TID concisely represents a probability distribution over the subinstances

→ Uniform reliability amounts to a TID where all facts have probability 1/2: the count is 2^|I| times the probability that Q holds
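The TID semantics can be made concrete with a short sketch. It enumerates the subinstances (possible worlds) of the example TID, checks that their probabilities sum to 1, and shows that with all probabilities set to 1/2 every world is equally likely. The dictionary encoding and the names `world_probability` and `worlds` are assumptions for illustration.

```python
from itertools import combinations
from math import prod

# Example TID: each fact maps to its marginal probability.
TID = {
    ("Class", "CS 1", "101", "2021-03-26"): 0.8,
    ("Class", "CS 2", "101", "2021-04-02"): 0.7,
    ("Lockdown", "2021-03-26"): 0.2,
    ("Lockdown", "2021-04-02"): 0.4,
}


def world_probability(tid, world):
    """Probability of one subinstance: kept facts contribute p, dropped facts 1 - p."""
    return prod(p if fact in world else 1 - p for fact, p in tid.items())


def worlds(tid):
    """Enumerate all subinstances (possible worlds) of the TID."""
    facts = list(tid)
    for k in range(len(facts) + 1):
        for kept in combinations(facts, k):
            yield frozenset(kept)


total = sum(world_probability(TID, w) for w in worlds(TID))
print(round(total, 10))  # 1.0

# With every probability set to 1/2, each of the 2^4 worlds has probability 1/16.
uniform = {fact: 0.5 for fact in TID}
print({world_probability(uniform, w) for w in worlds(uniform)})  # {0.0625}
```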


Probabilistic query evaluation (PQE)

• We consider again Boolean Conjunctive Queries (CQs)
  • e.g., Q: ∃c r d Class(c,r,d) ∧ Lockdown(d)
• Semantics of Q on a TID: compute the probability that Q is true
• Formally, problem PQE(Q) for a fixed CQ Q:
  • Input: a TID I
  • Output: the total probability of the subinstances of I where Q is true
• Again we can solve PQE(Q) by looking at all subinstances (exponential)
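As a sketch of the exponential approach to PQE(Q), the following code sums the probabilities of the subinstances where the example query holds. The encoding of the TID and the function names `satisfies_q` and `pqe` are assumptions for the example; the last lines check, on this instance only, that probabilities 1/2 recover the uniform reliability count after multiplying by 2^|I|.

```python
from itertools import product

# Example TID: each fact maps to its marginal probability.
TID = {
    ("Class", "CS 1", "101", "2021-03-26"): 0.8,
    ("Class", "CS 2", "101", "2021-04-02"): 0.7,
    ("Lockdown", "2021-03-26"): 0.2,
    ("Lockdown", "2021-04-02"): 0.4,
}


def satisfies_q(facts):
    """Q: exists c, r, d such that Class(c, r, d) and Lockdown(d)."""
    lockdown_dates = {f[1] for f in facts if f[0] == "Lockdown"}
    return any(f[0] == "Class" and f[3] in lockdown_dates for f in facts)


def pqe(tid, query):
    """Total probability of the subinstances where the query is true (brute force)."""
    facts = list(tid)
    total = 0.0
    for keep in product([False, True], repeat=len(facts)):
        world = [f for f, kept in zip(facts, keep) if kept]
        if query(world):
            weight = 1.0
            for f, kept in zip(facts, keep):
                weight *= tid[f] if kept else 1 - tid[f]
            total += weight
    return total


print(round(pqe(TID, satisfies_q), 4))  # 0.3952

# All probabilities 1/2: scaling by 2^|I| gives back the UR count of the example.
uniform = dict.fromkeys(TID, 0.5)
print(pqe(uniform, satisfies_q) * 2 ** len(TID))  # 7.0
```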


Existing results

• Complexity of PQE shown in [Dalvi and Suciu, 2007] for self-join-free CQs (SJFCQs)
  • A CQ is self-join-free if no relation symbol is repeated
• Later extended to unions of conjunctive queries [Dalvi and Suciu, 2012]

In this work we stick to the result on SJFCQs:

Theorem [Dalvi and Suciu, 2007]
Let Q be a SJFCQ. Then:
• Either Q is hierarchical and PQE(Q) is in PTIME
• Or Q is not hierarchical and PQE(Q) is #P-hard

What is this class of hierarchical CQs?


Hierarchical CQs

For a CQ Q, write atoms(x) for the set of atoms where x appears

• A CQ is hierarchical if for every pair of variables x and y:
  • Either atoms(x) and atoms(y) are disjoint
  • Or one is included in the other
• A CQ is non-hierarchical if there are two variables x and y such that:
  • Some atom contains both x and y
  • Some atom contains x but not y
  • Some atom contains y but not x

→ Simplest example: the R-S-T query: Q₁: ∃x y R(x), S(x,y), T(y)

• Intuition for arity-2 queries: the hierarchical CQs are unions of star-shaped patterns

Exercise: Is our example CQ hierarchical? ∃c r d Class(c,r,d) ∧ Lockdown(d) ... Yes!
(atoms(c) = atoms(r) = {Class} is included in atoms(d) = {Class, Lockdown})
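The definition translates directly into a pairwise test on the sets atoms(x). The sketch below assumes a query is given as a list of atoms, each a pair of a relation name and a tuple of variables; this representation and the function names are illustrative choices, not notation from the talk.

```python
from itertools import combinations


def atoms_of(query):
    """Map each variable to the set of atom indices in which it appears."""
    occurrences = {}
    for index, (_relation, variables) in enumerate(query):
        for var in variables:
            occurrences.setdefault(var, set()).add(index)
    return occurrences


def is_hierarchical(query):
    """Check that atoms(x) and atoms(y) are disjoint or nested for every pair x, y."""
    occurrences = atoms_of(query)
    for x, y in combinations(occurrences, 2):
        ax, ay = occurrences[x], occurrences[y]
        if not (ax.isdisjoint(ay) or ax <= ay or ay <= ax):
            return False
    return True


# A query is a list of atoms: (relation name, tuple of variables).
example = [("Class", ("c", "r", "d")), ("Lockdown", ("d",))]
rst = [("R", ("x",)), ("S", ("x", "y")), ("T", ("y",))]

print(is_hierarchical(example))  # True
print(is_hierarchical(rst))      # False
```

For the R-S-T query, atoms(x) = {R, S} and atoms(y) = {S, T} overlap on S without either containing the other, which is the non-hierarchical pattern listed above.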


Our results on uniform reliability

Let us return to our problem of uniform reliability (UR):

• We only study the problem on SJFCQs (see future work)
• For hierarchical SJFCQs, UR is tractable because PQE is tractable
• For non-hierarchical SJFCQs, the complexity of UR is unknown
  • The hardness proof of PQE (see later) crucially uses probabilities 1/2 and 1

We settle the complexity of UR for SJFCQs by showing:

Theorem
Let Q be a non-hierarchical SJFCQ. Then UR(Q) is #P-hard.

Rest of the talk: proof sketch of this result


Reducing toR-S-T-type queries

• An R-S-T-type query is a non-hierarchical SJFCQ of the form:
  R₁(x), ..., R_r(x), S₁(x,y), ..., S_s(x,y), T₁(y), ..., T_t(y)
  for some integers r, s, t > 0
• Lemma: for any non-hierarchical SJFCQ Q, there is an R-S-T-type query Q′ such that UR(Q′) reduces to UR(Q)

[Figure: three groups of atoms labeled atoms(x), atoms(x) ∩ atoms(y), atoms(y), of sizes r, s, t]

• So it suffices to show that UR(Q′) is #P-hard for the R-S-T-type queries Q′
• In this talk: we focus for simplicity on Q₁: ∃x y R(x), S(x,y), T(y)


Hard problem: counting independent sets of bipartite graphs

[Figure: bipartite graph with left vertices u₁, u₂, u₃ and right vertices v₁, v₂]

• Independent set of a bipartite graph: subset of its vertices that contains no edge
  • Example: {u₂, v₁}
• It is #P-hard, given a bipartite graph, to count its independent sets

This easily shows the #P-hardness of PQE (but not UR!) for Q₁: ∃x y R(x), S(x,y), T(y):

[Figure: the same graph as a TID, with R(u₁), R(u₂), R(u₃), T(v₁), T(v₂) each of probability 1/2 and the S-facts on the edges of probability 1]

We will show how to reduce from counting independent sets to UR(Q₁)
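The link between independent sets and PQE(Q₁) can be checked by brute force on a small graph. The edge set below is an assumption made for illustration (all pairs except (u₂, v₁), so that {u₂, v₁} is independent); the helper names are also hypothetical.

```python
from itertools import combinations

U = ["u1", "u2", "u3"]
V = ["v1", "v2"]
# Assumed edge set (illustrative): every left-right pair except (u2, v1).
E = [("u1", "v1"), ("u1", "v2"), ("u2", "v2"), ("u3", "v1"), ("u3", "v2")]


def subsets(items):
    """Enumerate all subsets of a list as Python sets."""
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            yield set(combo)


def count_independent_sets(vertices, edges):
    """Count the vertex subsets that contain both endpoints of no edge."""
    return sum(
        1
        for w in subsets(vertices)
        if not any(u in w and v in w for u, v in edges)
    )


independent = count_independent_sets(U + V, E)
print(independent)  # 12

# TID of the reduction: R(u) and T(v) have probability 1/2, every S(u, v) has probability 1.
# Q1 is false exactly when the vertices carrying R- and T-facts form an independent set.
probability_q1 = 1 - independent / 2 ** len(U + V)
print(probability_q1)  # 0.625
```

Because the S-facts have probability 1 here, this argument does not carry over to uniform reliability, where every fact has probability 1/2.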


Idea: parameterizing the count

[Figure: bipartite graph on u₁, u₂, u₃ and v₁, v₂ with a vertex subset W marked]

For a bipartite graph (U,V,E) and a subset W ⊆ U∪V of vertices, write:

• c(W) the number of edges contained in W (both endpoints in W)
  • Here, c(W) = 1
• d(W) (resp., d′(W)) the number of edges having exactly their left (resp., right) endpoint in W
  • Here, d(W) = 2 and d′(W) = 1
• e(W) the number of edges excluded from W (neither endpoint in W)
  • Here, e(W) = 1
• Hard problem: counting independent sets X = |{W ⊆ U∪V | c(W) = 0}|
• Harder problem: computing all the values:
  \[ X_{c,d,d',e} = \left| \{ W \subseteq U \cup V \mid c(W) = c,\ d(W) = d,\ d'(W) = d',\ e(W) = e \} \right| \]
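The four counters and the values X_{c,d,d′,e} can be computed by enumeration on a small graph. In this sketch the edge set and the subset W = {u₁, u₂, v₁} are assumptions chosen so that the counters match the values quoted above; the names `profile` and `all_profiles` are hypothetical.

```python
from collections import Counter
from itertools import combinations

U = ["u1", "u2", "u3"]
V = ["v1", "v2"]
# Assumed edge set (illustrative): every left-right pair except (u2, v1).
E = [("u1", "v1"), ("u1", "v2"), ("u2", "v2"), ("u3", "v1"), ("u3", "v2")]


def profile(w, edges):
    """Return (c, d, d', e) for the vertex subset w."""
    c = d = d_prime = e = 0
    for u, v in edges:
        if u in w and v in w:
            c += 1          # edge contained in w
        elif u in w:
            d += 1          # only the left endpoint is in w
        elif v in w:
            d_prime += 1    # only the right endpoint is in w
        else:
            e += 1          # edge excluded from w
    return c, d, d_prime, e


def all_profiles(vertices, edges):
    """Map each tuple (c, d, d', e) to X_{c,d,d',e}, the number of subsets with that profile."""
    counts = Counter()
    for k in range(len(vertices) + 1):
        for combo in combinations(vertices, k):
            counts[profile(set(combo), edges)] += 1
    return counts


X = all_profiles(U + V, E)
print(profile({"u1", "u2", "v1"}, E))  # (1, 2, 1, 1)

# The independent-set count is recovered by summing the entries with c = 0.
print(sum(n for (c, _, _, _), n in X.items() if c == 0))  # 12
print(sum(X.values()))  # 32
```

This shows why the second problem is at least as hard as the first: the table of X values determines the number of independent sets.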


Idea: coding to several copies

• We want to design a reduction:
  • We reduce from (we want): given a bipartite graph G, compute the X_{c,d,d′,e}
  • We reduce to (we have): given a database instance D, compute UR(Q₁)
• Idea: code G to a family of instances D_p indexed by p > 0
• Fix a box B_p(a,b) for index p > 0: an instance with two distinguished elements (a,b)
• Code G for index p > 0 to an instance by:
  • putting an R-fact on each U-vertex and a T-fact on each V-vertex
  • coding every edge (u,v) by a copy of the box B_p(u,v)

[Figure: the coding of the example graph, with facts R(u₁), R(u₂), R(u₃), T(v₁), T(v₂) and each edge replaced by a copy of B_p]


Getting an equation system

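Take the coding of G for index p, and compute the number N_p of subinstances violating Q₁

[Figure: the coding of the example graph, with facts R(u₁), R(u₂), R(u₃), T(v₁), T(v₂) and one copy of B_p per edge]

• We have:
  \[ N_p = \sum_{W \subseteq U \cup V} N_p^W
         = \sum_{W \subseteq U \cup V} \gamma_p^{c(W)} \, \delta_p^{d(W)} \, (\delta'_p)^{d'(W)} \, \eta_p^{e(W)}
         = \sum_{c,d,d',e} X_{c,d,d',e} \times \gamma_p^{c} \, \delta_p^{d} \, (\delta'_p)^{d'} \, \eta_p^{e} \]
  where N_p^W is the number of subinstances violating Q₁ when fixing the R-facts and T-facts to be precisely on W
• Now N_p^W only depends on:
  • The numbers c(W), d(W), d′(W), e(W) of edges contained in, dangling from, or excluded from W
  • The numbers γ_p, δ_p, δ′_p, η_p of subinstances of the box B_p that violate Q₁ when fixing both the R-fact on a and the T-fact on b (γ_p), only the R-fact on a (δ_p), only the T-fact on b (δ′_p), or neither (η_p)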
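The identity for N_p can be sanity-checked by brute force on a tiny graph. The talk does not specify the box here, so the `box` function below is a hypothetical stand-in (an S-path from a to b through p fresh elements carrying R- and T-facts); only the product structure of the formula is being illustrated, under that assumption.

```python
from itertools import combinations


def subsets(items):
    """Enumerate all subsets of a collection as Python sets."""
    items = list(items)
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            yield set(combo)


def satisfies_q1(facts):
    """Q1: exists x, y such that R(x), S(x, y), T(y)."""
    return any(
        f[0] == "S" and ("R", f[1]) in facts and ("T", f[2]) in facts
        for f in facts
    )


def box(p, a, b):
    """Hypothetical box B_p(a, b): an S-path from a to b through p fresh R/T pairs."""
    xs = [a] + [f"x{i}[{a},{b}]" for i in range(1, p + 1)]
    ys = [f"y{i}[{a},{b}]" for i in range(1, p + 1)] + [b]
    facts = {("S", xs[i], ys[i]) for i in range(p + 1)}
    for i in range(1, p + 1):
        facts |= {("S", xs[i], ys[i - 1]), ("R", xs[i]), ("T", ys[i - 1])}
    return facts


def violating(box_facts, fixed):
    """Count subinstances of the box that violate Q1 together with the fixed facts."""
    return sum(1 for s in subsets(box_facts) if not satisfies_q1(s | fixed))


def check(U, V, E, p):
    """Return (brute-force N_p, value of the sum over W) for the coding of (U, V, E)."""
    template = box(p, "a", "b")
    gamma = violating(template, {("R", "a"), ("T", "b")})
    delta = violating(template, {("R", "a")})
    delta_prime = violating(template, {("T", "b")})
    eta = violating(template, set())

    # Right-hand side: sum over W of gamma^c * delta^d * delta'^d' * eta^e.
    formula = 0
    for w in subsets(U + V):
        term = 1
        for u, v in E:
            if u in w and v in w:
                term *= gamma
            elif u in w:
                term *= delta
            elif v in w:
                term *= delta_prime
            else:
                term *= eta
        formula += term

    # Left-hand side: count the violating subinstances of the coded instance directly.
    coded = {("R", u) for u in U} | {("T", v) for v in V}
    for u, v in E:
        coded |= box(p, u, v)
    brute = sum(1 for s in subsets(coded) if not satisfies_q1(s))
    return brute, formula


brute, formula = check(["u1", "u2"], ["v1"], [("u1", "v1"), ("u2", "v1")], p=1)
print(brute == formula)  # True
```

The check works because every S-fact lies in exactly one box, so once the R- and T-facts on the vertices are fixed to W, the boxes can be chosen independently and their violation counts multiply.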
