• Aucun résultat trouvé

Probabilistic Query Evaluation: Towards Tractable Combined Complexity

N/A
N/A
Protected

Academic year: 2022

Partager "Probabilistic Query Evaluation: Towards Tractable Combined Complexity"

Copied!
58
0
0

Texte intégral

(1)

Probabilistic Query Evaluation:

Towards Tractable Combined Complexity

Mikaël Monet1,2, supervised by Pierre Senellart2,3and Antoine Amarilli1 May 31th, 2017

1LTCI, Télécom ParisTech, Université Paris-Saclay; Paris, France

2Inria Paris; Paris, France

3École normale supérieure, PSL Research University; Paris, France

(2)

Introduction

Uncertaintyin data

→ Untrustworthy sources, automated information extraction, imperfect sensor precision in experimental sciences, etc.

Need framework to model this uncertainty and reason about it

→ Probabilistic Databases!

(3)

Introduction

Uncertaintyin data

→ Untrustworthy sources, automated information extraction, imperfect sensor precision in experimental sciences, etc.

Need framework to model this uncertainty and reason about it

→ Probabilistic Databases!

(4)

Plan

1) DefineTID modeland probabilistic query evaluation (PQE)

2) Existing approaches (efficient PQE in the data)

3) Efficient PQE in the query and the data

4) Efficient PQE in the data, reasonable complexity in the query

(5)

Plan

1) DefineTID modeland probabilistic query evaluation (PQE)

2) Existing approaches (efficient PQE in the data)

3) Efficient PQE in the query and the data

4) Efficient PQE in the data, reasonable complexity in the query

(6)

Plan

1) DefineTID modeland probabilistic query evaluation (PQE)

2) Existing approaches (efficient PQE in the data)

3) Efficient PQE in the query and the data

4) Efficient PQE in the data, reasonable complexity in the query

(7)

Plan

1) DefineTID modeland probabilistic query evaluation (PQE)

2) Existing approaches (efficient PQE in the data)

3) Efficient PQE in the query and the data

4) Efficient PQE in the data, reasonable complexity in the query

(8)

Tuple-independent databases (TID)

Probabilistic databases:modeluncertaintyabout data

Simplest model:tuple-independent databases(TID)

• Arelational databaseI

• Aprobability valuationπmapping each fact ofIto[0,1]

Semanticsof a TID(I, π): aprobability distributiononI0I:

• Each factFIis eitherpresentorabsentwith probabilityπ(F)

• Assumeindependenceacross facts

(9)

Example: TID

S a b .5 a c .2

This TID(I, π)represents the followingprobability distribution:

.5×.2 S a b a c

.5×(1−.2) S

a b

(1−.5)×.2 S

a c

(1−.5)×(1−.2) S

(10)

Example: TID

S a b .5 a c .2

This TID(I, π)represents the followingprobability distribution:

.5×.2 S a b a c

.5×(1−.2) S

a b

(1−.5)×.2 S

a c

(1−.5)×(1−.2) S

(11)

Example: TID

S a b .5 a c .2

This TID(I, π)represents the followingprobability distribution:

.5×.2 S a b a c

.5×(1−.2) S

a b

(1−.5)×.2 S

a c

(1−.5)×(1−.2) S

(12)

Example: TID

S a b .5 a c .2

This TID(I, π)represents the followingprobability distribution:

.5×.2 S a b a c

.5×(1−.2) S

a b

(1−.5)×.2 S

a c

(1−.5)×(1−.2) S

(13)

Example: TID

S a b .5 a c .2

This TID(I, π)represents the followingprobability distribution:

.5×.2 S a b a c

.5×(1−.2) S

a b

(1−.5)×.2 S

a c

(1−.5)×(1−.2) S

(14)

Example: TID

S a b .5 a c .2

This TID(I, π)represents the followingprobability distribution:

.5×.2 S a b a c

.5×(1−.2) S

a b

(1−.5)×.2 S

a c

(1−.5)×(1−.2) S

(15)

Probabilistic query evaluation (PQE)

Let us fix:

Relational signatureσ

ClassI ofrelational instancesonσ (e.g., acyclic, treelike)

ClassQofBoolean queries(e.g., paths, trees)

Probabilistic query evaluation(PQE) problem forQandI:

Given aqueryq∈ Q

Given aninstanceI∈ I and aprobability valuationπ

Compute theprobabilitythat(I, π)satisfiesq

→ Pr((I, π)|=q) =P

J⊆I,J|=qPr(J)

(16)

Probabilistic query evaluation (PQE)

Let us fix:

Relational signatureσ

ClassI ofrelational instancesonσ (e.g., acyclic, treelike)

ClassQofBoolean queries(e.g., paths, trees)

Probabilistic query evaluation(PQE) problem forQandI:

Given aqueryq∈ Q

Given aninstanceI∈ I and aprobability valuationπ

Compute theprobabilitythat(I, π)satisfiesq

→ Pr((I, π)|=q) =P

J⊆I,J|=qPr(J)

(17)

Probabilistic query evaluation (PQE)

Let us fix:

Relational signatureσ

ClassI ofrelational instancesonσ (e.g., acyclic, treelike)

ClassQofBoolean queries(e.g., paths, trees)

Probabilistic query evaluation(PQE) problem forQandI:

Given aqueryq∈ Q

Given aninstanceI∈ I and aprobability valuationπ

Compute theprobabilitythat(I, π)satisfiesq

→ Pr((I, π)|=q) =P

J⊆I,J|=qPr(J)

(18)

Complexity of probabilistic query evaluation (PQE)

Question:what is the (data, combined)complexityof PQE depending on the classQofqueriesand classI ofinstances?

(19)

Data complexity results: related work

Existingdata dichotomy resulton queries [Dalvi & Suciu, 2012]

I is all instances

• There is a classS ⊆UCQs ofsafe queries

PQE isPTIMEfor anyq∈ S

PQE is#P-hardfor anyqUCQs\S

Existingdata dichotomy resulton instances

PQE forMSOonbounded-treewidthinstances haslineardata complexity [Amarilli, Bourhis, & Senellart, 2015]

There is an FO query for which PQE is#P-hardonany

unbounded-treewidth graph familyI (under some assumptions) [Amarilli, Bourhis, & Senellart, 2016]

What aboutcombinedcomplexity?

(20)

Data complexity results: related work

Existingdata dichotomy resulton queries [Dalvi & Suciu, 2012]

I is all instances

• There is a classS ⊆UCQs ofsafe queries

PQE isPTIMEfor anyq∈ S

PQE is#P-hardfor anyqUCQs\S

Existingdata dichotomy resulton instances

PQE forMSOonbounded-treewidthinstances haslineardata complexity [Amarilli, Bourhis, & Senellart, 2015]

There is an FO query for which PQE is#P-hardonany

unbounded-treewidth graph familyI (under some assumptions) [Amarilli, Bourhis, & Senellart, 2016]

What aboutcombinedcomplexity?

(21)

Data complexity results: related work

Existingdata dichotomy resulton queries [Dalvi & Suciu, 2012]

I is all instances

• There is a classS ⊆UCQs ofsafe queries

PQE isPTIMEfor anyq∈ S

PQE is#P-hardfor anyqUCQs\S

Existingdata dichotomy resulton instances

PQE forMSOonbounded-treewidthinstances haslineardata complexity [Amarilli, Bourhis, & Senellart, 2015]

There is an FO query for which PQE is#P-hardonany

unbounded-treewidth graph familyI (under some assumptions) [Amarilli, Bourhis, & Senellart, 2016]

What aboutcombinedcomplexity?

(22)

Data complexity results: related work

Existingdata dichotomy resulton queries [Dalvi & Suciu, 2012]

I is all instances

• There is a classS ⊆UCQs ofsafe queries

PQE isPTIMEfor anyq∈ S

PQE is#P-hardfor anyqUCQs\S

Existingdata dichotomy resulton instances

PQE forMSOonbounded-treewidthinstances haslineardata complexity [Amarilli, Bourhis, & Senellart, 2015]

There is an FO query for which PQE is#P-hardonany

unbounded-treewidth graph familyI (under some assumptions) [Amarilli, Bourhis, & Senellart, 2016]

What aboutcombinedcomplexity?

(23)

Data complexity results: related work

Existingdata dichotomy resulton queries [Dalvi & Suciu, 2012]

I is all instances

• There is a classS ⊆UCQs ofsafe queries

PQE isPTIMEfor anyq∈ S

PQE is#P-hardfor anyqUCQs\S

Existingdata dichotomy resulton instances

PQE forMSOonbounded-treewidthinstances haslineardata complexity [Amarilli, Bourhis, & Senellart, 2015]

There is an FO query for which PQE is#P-hardonany

unbounded-treewidth graph familyI (under some assumptions) [Amarilli, Bourhis, & Senellart, 2016]

What aboutcombinedcomplexity?

(24)

Data complexity results: related work

Existingdata dichotomy resulton queries [Dalvi & Suciu, 2012]

I is all instances

• There is a classS ⊆UCQs ofsafe queries

PQE isPTIMEfor anyq∈ S

PQE is#P-hardfor anyqUCQs\S

Existingdata dichotomy resulton instances

PQE forMSOonbounded-treewidthinstances haslineardata complexity [Amarilli, Bourhis, & Senellart, 2015]

There is an FO query for which PQE is#P-hardonany

unbounded-treewidth graph familyI(under some assumptions) [Amarilli, Bourhis, & Senellart, 2016]

What aboutcombinedcomplexity?

(25)

Data complexity results: related work

Existingdata dichotomy resulton queries [Dalvi & Suciu, 2012]

I is all instances

• There is a classS ⊆UCQs ofsafe queries

PQE isPTIMEfor anyq∈ S

PQE is#P-hardfor anyqUCQs\S

Existingdata dichotomy resulton instances

PQE forMSOonbounded-treewidthinstances haslineardata complexity [Amarilli, Bourhis, & Senellart, 2015]

There is an FO query for which PQE is#P-hardonany

unbounded-treewidth graph familyI(under some assumptions) [Amarilli, Bourhis, & Senellart, 2016]

What aboutcombinedcomplexity?

(26)

Wish list

We want:

PQE tractable in combined complexity OR

PQE tractable in the data, reasonable in the query

(27)

Restrict to CQs on graph signatures

∃x y z t R(x,y)S(y,z)S(t,z)

x R y S z S t

R a b .1 b c .1 c d .05 d a 1.

d b .8

S b d .7

d

c b

a

R 1.

R .1 R

.1

R .05 S

a .7 R a .8

(28)

Restrict to CQs on graph signatures

∃x y z t R(x,y)S(y,z)S(t,z)x R y S z S t

R a b .1 b c .1 c d .05 d a 1.

d b .8

S b d .7

d

c b

a

R 1.

R .1 R

.1

R .05 S

a .7 R a .8

(29)

Restrict to CQs on graph signatures

∃x y z t R(x,y)S(y,z)S(t,z)x R y S z S t

R a b .1 b c .1 c d .05 d a 1.

d b .8

S b d .7

d

c b

a

R 1.

R .1 R

.1

R .05 S

a .7 R a .8

(30)

Restrict instances to trees

Q=one-way paths(1WP),I =polytrees(PT)

Q: T S S S T

I:

+ prob. for each edge

T T

T T

S S

S S

S

S

T S

T

Proposition

PQE of1WPonPTis#P-hard

(31)

Restrict instances to trees

Q=one-way paths(1WP),I =polytrees(PT)

Q: T S S S T

I:

+ prob. for each edge

T T

T T

S S

S S

S

S

T S

T

Proposition

PQE of1WPonPTis#P-hard

(32)

Restrict instances to trees

Q=one-way paths(1WP),I =polytrees(PT)

Q: T S S S T

I:

+ prob. for each edge

T T

T T

S S

S S

S

S

T S

T

Proposition

PQE of1WPonPTis#P-hard

(33)

Restrict instances to trees

Q=one-way paths(1WP),I =polytrees(PT)

Q: T S S S T

I:

+ prob. for each edge

T T

T T

S S

S S

S

S

T S

T

Proposition

PQE of1WPonPTis#P-hard

(34)

Our graph classes

1WP

2WP

R S S T

R S S T R

DWT PT

1WP

2WP DWT

PT Connected All

⊆ ⊆

⊆ ⊆

⊆ ⊆

(35)

Results

↓Q I→ 1WP 2WP DWT PT Connected 1WP

2WP

DWT PTIME

PT #P-hard

Connected

>2labels

↓Q I→ 1WP 2WP DWT PT Connected 1WP

2WP

DWT PTIME

PT #P-hard

Connected

No labels

(36)

Results

↓Q I→ 1WP 2WP DWT PT Connected 1WP

2WP

DWT PTIME

PT #P-hard

Connected

>2labels

Q I→ 1WP 2WP DWT PT Connected 1WP

2WP

DWT PTIME

PT #P-hard

Connected

No labels

(37)

Led to a publication in PODS’2017

Contributions:

Detailed study of thecombinedcomplexity of PQE

Focus on CQs on arity-two signatures

Showed the importance of various features on the problem: labels, global orientation, branching, connectedness

Established the complexity for all combinations of the graph classes we considered

Drawbacks and future work:

Our graph classes may seem “arbitrary”

Not yet a dichotomy, just starting to understand the problem

Practical applications?

(38)

Led to a publication in PODS’2017

Contributions:

Detailed study of thecombinedcomplexity of PQE

Focus on CQs on arity-two signatures

Showed the importance of various features on the problem: labels, global orientation, branching, connectedness

Established the complexity for all combinations of the graph classes we considered

Drawbacks and future work:

Our graph classes may seem “arbitrary”

Not yet a dichotomy, just starting to understand the problem

Practical applications?

(39)

Led to a publication in PODS’2017

Contributions:

Detailed study of thecombinedcomplexity of PQE

Focus on CQs on arity-two signatures

Showed the importance of various features on the problem:

labels, global orientation, branching, connectedness

Established the complexity for all combinations of the graph classes we considered

Drawbacks and future work:

Our graph classes may seem “arbitrary”

Not yet a dichotomy, just starting to understand the problem

Practical applications?

(40)

Led to a publication in PODS’2017

Contributions:

Detailed study of thecombinedcomplexity of PQE

Focus on CQs on arity-two signatures

Showed the importance of various features on the problem:

labels, global orientation, branching, connectedness

Established the complexity for all combinations of the graph classes we considered

Drawbacks and future work:

Our graph classes may seem “arbitrary”

Not yet a dichotomy, just starting to understand the problem

Practical applications?

(41)

Led to a publication in PODS’2017

Contributions:

Detailed study of thecombinedcomplexity of PQE

Focus on CQs on arity-two signatures

Showed the importance of various features on the problem:

labels, global orientation, branching, connectedness

Established the complexity for all combinations of the graph classes we considered

Drawbacks and future work:

Our graph classes may seem “arbitrary”

Not yet a dichotomy, just starting to understand the problem

Practical applications?

(42)

Led to a publication in PODS’2017

Contributions:

Detailed study of thecombinedcomplexity of PQE

Focus on CQs on arity-two signatures

Showed the importance of various features on the problem:

labels, global orientation, branching, connectedness

Established the complexity for all combinations of the graph classes we considered

Drawbacks and future work:

Our graph classes may seem “arbitrary”

Not yet a dichotomy, just starting to understand the problem

Practical applications?

(43)

Lowering our expectations

What if we want the complexity to be:

Tractable in the data

Nottoohorrible in the query

Can we then support a more expressive query language (e.g., disjunctions, negations, recursion)?

(44)

Starting point

Existingdata dichotomy resulton instances

PQE forMSOonbounded-treewidthinstances haslineardata complexity [Amarilli, Bourhis, & Senellart, 2015]

• Problem: nonelementary in the query22

.. .

|Q|

The instance class isparameterized

Idea: one parameter for the instancesandone parameter for the queries

(45)

Starting point

Existingdata dichotomy resulton instances

PQE forMSOonbounded-treewidthinstances haslineardata complexity [Amarilli, Bourhis, & Senellart, 2015]

• Problem: nonelementary in the query22

.. .

|Q|

The instance class isparameterized

Idea: one parameter for the instancesandone parameter for the queries

(46)

Starting point

Existingdata dichotomy resulton instances

PQE forMSOonbounded-treewidthinstances haslineardata complexity [Amarilli, Bourhis, & Senellart, 2015]

• Problem: nonelementary in the query22

.. .

|Q|

The instance class isparameterized

Idea: one parameter for the instancesandone parameter for the queries

(47)

Starting point

Existingdata dichotomy resulton instances

PQE forMSOonbounded-treewidthinstances haslineardata complexity [Amarilli, Bourhis, & Senellart, 2015]

• Problem: nonelementary in the query22

.. .

|Q|

The instance class isparameterized

Idea: one parameter for the instancesandone parameter for the queries

(48)

Starting point

Existingdata dichotomy resulton instances

PQE forMSOonbounded-treewidthinstances haslineardata complexity [Amarilli, Bourhis, & Senellart, 2015]

• Problem: nonelementary in the query22

.. .

|Q|

The instance class isparameterized

Idea: one parameter for the instancesandone parameter for the queries

(49)

Parameterized Complexity

Idea: oneparameterkI for theinstance(treewidth) AND one parameterkQ for thequery

InstanceclassesI1,I2,· · ·

QueryclassesQ1,Q2,· · ·

Definition

The problem isfixed-parameter tractable (FPT) linearif there exists a computable functionf such that it can be solved in time

f(kI,kQ)×|Q|×|I|

(50)

Parameterized Complexity

Idea: oneparameterkI for theinstance(treewidth) AND one parameterkQ for thequery

InstanceclassesI1,I2,· · ·

QueryclassesQ1,Q2,· · ·

Definition

The problem isfixed-parameter tractable (FPT) linearif there exists a computable functionf such that it can be solved in time

f(kI,kQ)×|Q|×|I|

(51)

Parameterized Complexity

Idea: oneparameterkI for theinstance(treewidth) AND one parameterkQ for thequery

InstanceclassesI1,I2,· · ·

QueryclassesQ1,Q2,· · ·

Definition

The problem isfixed-parameter tractable (FPT) linearif there exists a computable functionf such that it can be solved in time

f(kI,kQ)×|Q|×|I|

(52)

Parameterized Complexity

Idea: oneparameterkI for theinstance(treewidth) AND one parameterkQ for thequery

InstanceclassesI1,I2,· · ·

QueryclassesQ1,Q2,· · ·

Definition

The problem isfixed-parameter tractable (FPT) linearif there exists a computable functionf such that it can be solved in time

f(kI,kQ)×|Q|×|I|

(53)

Publication in ICDT’2017 1) A new language...

We introduce the language ofintentional-clique-guarded Datalog(ICG-Datalog), parameterized bybody-sizekP

2) ... withFPT-linear(combined) evaluation...

Given anICG-Datalog programPwithbody-sizekPand a relational instanceIoftreewidthkI, checking ifI|=Pcan be done in timef(kP,kI)× |P| × |I|

3) ... and alsoFPT-linear(combined) computation of provenance

We design a new concise provenance representation based on cyclic Boolean circuits:cycluits

(54)

Publication in ICDT’2017 1) A new language...

We introduce the language ofintentional-clique-guarded Datalog(ICG-Datalog), parameterized bybody-sizekP

2) ... withFPT-linear(combined) evaluation...

Given anICG-Datalog programPwithbody-sizekPand a relational instanceIoftreewidthkI, checking ifI|=Pcan be done in timef(kP,kI)× |P| × |I|

3) ... and alsoFPT-linear(combined) computation of provenance

We design a new concise provenance representation based on cyclic Boolean circuits:cycluits

(55)

Publication in ICDT’2017 1) A new language...

We introduce the language ofintentional-clique-guarded Datalog(ICG-Datalog), parameterized bybody-sizekP

2) ... withFPT-linear(combined) evaluation...

Given anICG-Datalog programPwithbody-sizekPand a relational instanceIoftreewidthkI, checking ifI|=Pcan be done in timef(kP,kI)× |P| × |I|

3) ... and alsoFPT-linear(combined) computation of provenance

We design a new concise provenance representation based on cyclic Boolean circuits:cycluits

(56)

Tree encoding E Two-way Alternating

Tree Automaton A

Database I of treewidth ≤ kI C(x) ← Subway("Corvisart",x)

“Under which conditions is it impossible to go from station Corvisart to station Châtelet with the subway?"

Provenance Cycluit

1 2

(Paris Metro map) C(x) ← C(y)   Subway(y,x) ICG-Datalog program P

of body-size ≤ kP

O( g(kP, kI)|P| ) 

O( g'(kI) |I| )

O(|A |E| )

Goal() ← ¬ C("Châtelet")

(57)

Application to PQE

Theorem

Having fixedkPandkI, we can solve PQE inO(22|P|α|I| |P|).

2EXP, but still better than previous nonelementary bounds

(58)

Conclusion

Up to now:

Study of the combined complexity of PQE

Tractable cases quite restricted

If we lower our expectations then we can capture more expressive query languages

Ongoing and future work:

Lots of open technical questions

Started a collaboration with Dan Olteanu (Univ. of Oxford) on mixed probabilistic models

Practical applications?

Thanks for your attention!

Références

Documents relatifs

An important tool we use for analyzing the complexity of query answering is the notion of materializability of a TBox T , which means that computing the certain answers to any query

We consider DL-Lite N horn , the most expressive dialect of DL-Lite that does not include role inclusions and for which query answering is in AC 0 for data complexity; for relations

• Our result: PQE(Q) is #P-hard for any query Q closed under homomorphisms unless it is equivalent to a safe UCQ. → Dichotomy for probabilistic query evaluation over

We use triangles in the table in Figure 1 to indicate the upper ( O / H ) and lower ( M / N ) bounds that have to be shown in order to obtain all results given in the table..

(Extensions of both query language and DL are considered in [7].) The probabilistic component follows the semantics of probabilistic ABoxes from [6], which itself is based on

While query answering with Description Logics (DLs) is now well-developed, this is much less the case for reasoning services that support ontology engineering and target query

Upon VIHCO, we present the algorithms of parallel and se- quential processing, and explore the answers through exper- iments both on synthetic and real dataset, and the results

II Contributions 77 7 A universal adiabatic quantum query algorithm 79 7.1 Adversary lower bound in the continuous-time