A Dichotomy for Homomorphism-Closed Queries on Probabilistic Graphs
Antoine Amarilli October 29, 2020
Télécom Paris
Table of contents
Introduction and problem statement
Existing results
Main result: Dichotomy on homomorphism-closed queries
More restricted instances: Words, trees and bounded treewidth (1 slide)
More restricted instances: Unweighted instances (1 slide)
Uncertain data management
Relational databases manage data, represented here as a labeled graph
WorksAt
Antoine  Télécom Paris
Antoine  Paris Sud
Benny    Paris Sud
Benny    Technion
İsmail   U. Oxford
MemberOf
Télécom Paris  ParisTech
Télécom Paris  IP Paris
Paris Sud      IP Paris
Paris Sud      Paris-Saclay
Technion       CESAER
[Figure: the same facts drawn as a labeled graph on the nodes A., B., İ., Télécom Paris, Paris Sud, Technion, U. Oxford, ParisTech, IP Paris, Paris-Saclay, CESAER]
→ Problem: we are not certain about the true state of the data
Uncertain data model
[Figure: the labeled graph with a probability on each edge. In the order of the WorksAt and MemberOf tables: 80%, 10%, 40%, 80%, 100%; 90%, 90%, 50%, 90%, 100%]
• Uncertain data model: TID, for tuple-independent database
• Each fact (edge) carries a probability
• Each fact exists with its given probability
• All facts are independent
• Possible world W: subset of facts
• What is the probability of this possible world? 0.03%
  $\Pr(W) = \Bigl(\prod_{F \in W} \Pr(F)\Bigr) \times \Bigl(\prod_{F \notin W} \bigl(1 - \Pr(F)\bigr)\Bigr)$
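The product formula for Pr(W) can be sketched in a few lines of Python. This is only an illustration: the fact triples and the helper names `world_probability` and `all_worlds` are introduced here, and the assignment of probabilities to edges is an assumption read off the running example (in the order of the WorksAt and MemberOf tables).

```python
from itertools import combinations
from math import prod

# Facts of the running example as (relation, source, target) -> probability.
# Assumption: probabilities are assigned to edges in the order of the tables.
TID = {
    ("WorksAt", "A.", "Télécom Paris"): 0.8,
    ("WorksAt", "A.", "Paris Sud"): 0.1,
    ("WorksAt", "B.", "Paris Sud"): 0.4,
    ("WorksAt", "B.", "Technion"): 0.8,
    ("WorksAt", "İ.", "U. Oxford"): 1.0,
    ("MemberOf", "Télécom Paris", "ParisTech"): 0.9,
    ("MemberOf", "Télécom Paris", "IP Paris"): 0.9,
    ("MemberOf", "Paris Sud", "IP Paris"): 0.5,
    ("MemberOf", "Paris Sud", "Paris-Saclay"): 0.9,
    ("MemberOf", "Technion", "CESAER"): 1.0,
}


def world_probability(tid, world):
    """Pr(W): multiply Pr(F) for kept facts and 1 - Pr(F) for dropped facts."""
    return prod(p if fact in world else 1.0 - p for fact, p in tid.items())


def all_worlds(tid):
    """Yield every subset of the facts (2^n subsets for n facts)."""
    facts = list(tid)
    for k in range(len(facts) + 1):
        for kept in combinations(facts, k):
            yield set(kept)


# Probability of the world that keeps every fact.
print(world_probability(TID, set(TID)))
# Sanity check: the probabilities of all possible worlds sum to 1.
print(sum(world_probability(TID, w) for w in all_worlds(TID)))
```

Any world that drops a fact of probability 100% gets probability 0 under this formula, since its factor 1 − Pr(F) vanishes.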
Queries
[Figure: the probabilistic labeled graph of the running example]
Central database task: evaluate queries
“Is there some person x employed in an institution that is part of a consortium z?”
Q(x, z): ∃y WorksAt(x, y) ∧ MemberOf(y, z)
Result on this graph:
x   z             probability
A.  ParisTech     72%
A.  IP Paris      73.4%
A.  Paris-Saclay  9%
B.  IP Paris      20%
B.  Paris-Saclay  36%
B.  CESAER        80%
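The answer probabilities of Q(x, z) can be recomputed with a short Python sketch. It assumes the edge probabilities of the running example (assigned in the order of the WorksAt and MemberOf tables), and the names `WORKS_AT`, `MEMBER_OF` and `answer_probabilities` are introduced here for illustration. For a fixed pair (x, z), the paths through different institutions y share no edge, so they are independent events.

```python
from collections import defaultdict
from math import prod

# Assumed edge probabilities of the running example.
WORKS_AT = {  # (person, institution) -> probability
    ("A.", "Télécom Paris"): 0.8,
    ("A.", "Paris Sud"): 0.1,
    ("B.", "Paris Sud"): 0.4,
    ("B.", "Technion"): 0.8,
    ("İ.", "U. Oxford"): 1.0,
}
MEMBER_OF = {  # (institution, consortium) -> probability
    ("Télécom Paris", "ParisTech"): 0.9,
    ("Télécom Paris", "IP Paris"): 0.9,
    ("Paris Sud", "IP Paris"): 0.5,
    ("Paris Sud", "Paris-Saclay"): 0.9,
    ("Technion", "CESAER"): 1.0,
}


def answer_probabilities(works_at, member_of):
    """Map each (x, z) to Pr(some y has both edges x -> y and y -> z).

    Paths through different y are edge-disjoint, hence independent:
    Pr = 1 - product over y of (1 - Pr(x -> y) * Pr(y -> z)).
    """
    path_probs = defaultdict(list)
    for (x, y), p in works_at.items():
        for (y2, z), q in member_of.items():
            if y == y2:
                path_probs[(x, z)].append(p * q)
    return {xz: 1.0 - prod(1.0 - pq for pq in pqs)
            for xz, pqs in path_probs.items()}


for (x, z), pr in sorted(answer_probabilities(WORKS_AT, MEMBER_OF).items()):
    print(x, z, f"{100 * pr:.1f}%")
```

Under these assumed probabilities, the pair (A., IP Paris) has two paths, with probabilities 0.8 × 0.9 = 0.72 and 0.1 × 0.5 = 0.05, which gives 1 − (1 − 0.72) × (1 − 0.05) = 73.4%.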
Restricting to YES/NO queries
To make the problem simpler to study, we will restrict to YES/NO queries:
• Query: maps a graph to YES/NO
Why can we get away with that?
• Consider a query: Q(x, z): ∃y WorksAt(x, y) ∧ MemberOf(y, z)
• Consider each possible choice of (x, z), e.g., (A., CESAER)
• The query Q(A., CESAER) is a YES/NO query: Q(A., CESAER): ∃y WorksAt(A., y) ∧ MemberOf(y, CESAER)
• The number of choices for (x, z) is polynomial in the input graph
→ From now on, all queries are YES/NO queries,
so we have just one YES/NO answer to compute, or just one probability
Query languages
Which kinds of queries do we want to express?
• Conjunctive query (CQ): can I find a match of a pattern?
  • e.g., ∃x y z  x → y → z
  → We want a homomorphism from the pattern to the graph (not necessarily injective)
  → Formally: an existentially quantified conjunction of atoms (edges)
• Union of conjunctive queries (UCQ): can I find a match of some pattern?
  • e.g., (∃x y z  x → y → z) ∨ (∃x y z w  x → y → z → w)
  → Formally: a finite disjunction of CQs
• Regular path queries (RPQ): can I find a match of a regular path?
  • e.g., ∃x y  x →* y
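To make the homomorphism semantics of CQs and UCQs concrete, here is a minimal brute-force sketch in Python. The edge encoding `(source, label, target)`, the label `"E"` and the function names are assumptions made for this illustration, and the search is exponential in the number of pattern variables, so it is not meant as an efficient evaluator.

```python
from itertools import product


def has_homomorphism(pattern_edges, graph_edges):
    """Return True if some map from pattern variables to graph nodes
    sends every pattern edge (u, label, v) to an edge of the graph.
    The map need not be injective."""
    variables = sorted({t for (u, _, v) in pattern_edges for t in (u, v)})
    nodes = sorted({t for (a, _, b) in graph_edges for t in (a, b)})
    graph = set(graph_edges)
    for image in product(nodes, repeat=len(variables)):
        h = dict(zip(variables, image))
        if all((h[u], label, h[v]) in graph for (u, label, v) in pattern_edges):
            return True
    return False


def satisfies_ucq(patterns, graph_edges):
    """A UCQ holds if at least one of its CQs has a match."""
    return any(has_homomorphism(p, graph_edges) for p in patterns)


# The pattern x -> y -> z, with a hypothetical edge label "E".
path2 = [("x", "E", "y"), ("y", "E", "z")]
path3 = path2 + [("z", "E", "w")]

# A single self-loop matches: x, y and z can all map to the same node.
print(has_homomorphism(path2, [("a", "E", "a")]))
# A single non-loop edge does not match a path of two edges.
print(has_homomorphism(path2, [("a", "E", "b")]))
print(satisfies_ucq([path2, path3], [("a", "E", "b"), ("b", "E", "c")]))
```

The self-loop case shows why the match is a homomorphism rather than an embedding: distinct variables may be sent to the same node.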
Problem statement: Probabilistic query evaluation (PQE)
• We fix a query Q, for instance the CQ: ∃x y z  x → y → z
• The input is a TID D:
  [Figure: the probabilistic labeled graph of the running example]
• The output is the total probability of the worlds which satisfy the query:
  • Formally: $\sum_{W \subseteq D,\ W \models Q} \Pr(W)$
  → Intuition: the probability that the query is true
• We can always compute the probability in exponential time (go over all possibilities)
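The exponential-time baseline, which sums Pr(W) over the worlds W satisfying Q, can be sketched as follows. This is an illustration under assumptions: the edge probabilities are those of the running example (assigned in table order), the CQ is read as a WorksAt edge followed by a MemberOf edge, and `pqe_brute_force` and `query_holds` are names introduced here.

```python
from itertools import combinations
from math import prod

# Assumed facts of the running example: (relation, source, target) -> probability.
TID = {
    ("WorksAt", "A.", "Télécom Paris"): 0.8,
    ("WorksAt", "A.", "Paris Sud"): 0.1,
    ("WorksAt", "B.", "Paris Sud"): 0.4,
    ("WorksAt", "B.", "Technion"): 0.8,
    ("WorksAt", "İ.", "U. Oxford"): 1.0,
    ("MemberOf", "Télécom Paris", "ParisTech"): 0.9,
    ("MemberOf", "Télécom Paris", "IP Paris"): 0.9,
    ("MemberOf", "Paris Sud", "IP Paris"): 0.5,
    ("MemberOf", "Paris Sud", "Paris-Saclay"): 0.9,
    ("MemberOf", "Technion", "CESAER"): 1.0,
}


def query_holds(world):
    """The CQ: some x, y, z with WorksAt(x, y) and MemberOf(y, z) in the world."""
    employers = {y for (rel, _, y) in world if rel == "WorksAt"}
    members = {y for (rel, y, _) in world if rel == "MemberOf"}
    return bool(employers & members)


def pqe_brute_force(tid, query):
    """Sum Pr(W) over all subsets W of the facts that satisfy the query."""
    facts = list(tid)
    total = 0.0
    for k in range(len(facts) + 1):
        for kept in combinations(facts, k):
            world = set(kept)
            if query(world):
                total += prod(tid[f] if f in world else 1.0 - tid[f]
                              for f in facts)
    return total


print(pqe_brute_force(TID, query_holds))
```

With 10 facts this enumerates 2^10 = 1024 worlds; the count doubles with every additional fact, which is why the interesting question is when one can do better.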
PQE: a simple example
[Figure: the probabilistic labeled graph of the running example]
Find the probability of: ∃x y  x → y (a red edge)
• It’s easier to compute the probability x that there is no match of the query
  → The probability we want is 1 − x
• There is no match of the query iff every red edge is not kept
• These choices are independent, so x is: (1 − 80%) × (1 − 10%) × (1 − 40%) × (1 − 80%) × (1 − 100%)
• This gives x = 0%, so the query has probability 100%
• This process is in polynomial time
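The complement computation for the single-edge query takes one line of arithmetic. In this sketch the list `RED_EDGE_PROBS` holds the five probabilities used in the product above, and the function name is introduced here for illustration.

```python
from math import prod

# Probabilities of the five red edges, as listed in the product above.
RED_EDGE_PROBS = [0.8, 0.1, 0.4, 0.8, 1.0]


def prob_some_edge_kept(probs):
    """Pr(at least one edge is kept) = 1 - Pr(every edge is dropped).

    The edges are independent, so Pr(every edge is dropped) is the
    product of the individual probabilities 1 - p.
    """
    return 1.0 - prod(1.0 - p for p in probs)


print(prob_some_edge_kept(RED_EDGE_PROBS))
```

The running time is linear in the number of edges; the edge of probability 100% alone forces the product to 0 and the answer to 1.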
PQE: a more complicated example
[Figure: the probabilistic labeled graph of the running example]
How to compute the probability of the CQ from the problem statement?
∃x y z  x → y → z
• Key insight: consider all possible choices for the middle variable y
• 1 − (1 − 80%) × (1 − (1 − (1 − 10%) × (1 − 40%)) × (1 − (1 − 50%) × (1 − 90%))) × (1 − 80% × (1 − (1 − 90%) × (1 − 90%))), i.e., 97.65792%
• This is scary but polynomial time
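The decomposition over the middle variable y can be sketched in Python. This assumes the edge probabilities of the running example (assigned in table order) and reads the pattern as a WorksAt edge into y followed by a MemberOf edge out of y; the names `WORKS_AT`, `MEMBER_OF` and `prob_two_step_path` are introduced here. Different choices of y involve disjoint sets of edges, so the corresponding events are independent.

```python
from collections import defaultdict
from math import prod

# Assumed edge probabilities of the running example.
WORKS_AT = {  # (x, y) -> probability
    ("A.", "Télécom Paris"): 0.8,
    ("A.", "Paris Sud"): 0.1,
    ("B.", "Paris Sud"): 0.4,
    ("B.", "Technion"): 0.8,
    ("İ.", "U. Oxford"): 1.0,
}
MEMBER_OF = {  # (y, z) -> probability
    ("Télécom Paris", "ParisTech"): 0.9,
    ("Télécom Paris", "IP Paris"): 0.9,
    ("Paris Sud", "IP Paris"): 0.5,
    ("Paris Sud", "Paris-Saclay"): 0.9,
    ("Technion", "CESAER"): 1.0,
}


def prob_two_step_path(works_at, member_of):
    """Pr(some y has a kept incoming and a kept outgoing edge)."""
    incoming = defaultdict(list)  # y -> probabilities of edges x -> y
    outgoing = defaultdict(list)  # y -> probabilities of edges y -> z
    for (_, y), p in works_at.items():
        incoming[y].append(p)
    for (y, _), q in member_of.items():
        outgoing[y].append(q)

    no_match = 1.0
    for y in set(incoming) & set(outgoing):
        p_in = 1.0 - prod(1.0 - p for p in incoming[y])
        p_out = 1.0 - prod(1.0 - q for q in outgoing[y])
        # y is a match iff it keeps at least one edge on each side.
        no_match *= 1.0 - p_in * p_out
    return 1.0 - no_match


print(f"{100 * prob_two_step_path(WORKS_AT, MEMBER_OF):.5f}%")
```

Each edge is visited once, so the computation is linear in the size of the graph, in contrast with the enumeration of all possible worlds.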
Research goal: Understanding the complexity of PQE
What is the complexity of PQE(Q) depending on the query Q?
→ Note that we study data complexity, i.e., Q is fixed and the input is the data
My work: several dichotomies on the PQE problem:
• Existing results on UCQs:
  • PQE(Q) is in #P for any UCQ Q and is #P-hard for some CQs
  • Dichotomy by Dalvi and Suciu: PQE(Q) for a UCQ Q is either #P-hard or PTIME
• This talk: dichotomy on homomorphism-closed queries
  • PQE(Q) is #P-hard for all homomorphism-closed queries not equivalent to a safe UCQ
• I’ll also mention some of my work on restricted graph classes
Basic complexity results
• Whenever we can evaluate Q in PTIME, then PQE(Q) is in #P
  • #P: counting class of problems expressible as the number of accepting paths of a nondeterministic polynomial-time Turing machine
  → Nondeterministically guess a possible world, then test the query
  → In particular, PQE(Q) is in #P for any UCQ Q
• For some queries Q, the task PQE(Q) is in PTIME
  → e.g., ∃x y  x → y or ∃x y z  x → y → z
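PQE is sometimes #P-hard
Let us show that PQE(Q) is #P-hard for the CQ Q: x → y → z → w
• Reduce from the problem of counting satisfying valuations of a Boolean formula
  • e.g., given (x ∨ y) ∧ z, compute that it has 3 satisfying valuations
• This problem is already #P-hard for so-called PP2DNF formulas:
  • Positive (no negation) and Partitioned variables: $X_1, \ldots, X_n$ and $Y_1, \ldots, Y_m$
  • 2-DNF: disjunction of clauses like $X_i \wedge Y_j$
• Example: $\varphi = (X_1 \wedge Y_1) \vee (X_1 \wedge Y_2) \vee (X_2 \wedge Y_2) \vee (X_3 \wedge Y_1) \vee (X_3 \wedge Y_2)$
[Figure: the TID built from φ on the nodes a′1, a′2, a′3, a1, a2, a3, b1, b2, b′1, b′2; the three edges between a′i and ai and the two edges between bj and b′j have probability 1/2, and the five edges between ai and bj, one per clause, have probability 1]
Idea: Satisfying valuations of φ correspond to possible worlds with a match of Q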
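The correspondence between satisfying valuations and possible worlds can be checked on the example formula with a small Python sketch. Several things here are assumptions made for illustration: the edge labels `R`, `S`, `T`, the edge directions, the node names, and the function names. The sketch counts the satisfying valuations of φ directly, then computes the probability of the three-edge path query on the constructed TID by enumerating worlds, using exact fractions.

```python
from fractions import Fraction
from itertools import product

# The example formula: (i, j) stands for the clause X_i AND Y_j.
CLAUSES = [(1, 1), (1, 2), (2, 2), (3, 1), (3, 2)]
N, M = 3, 2  # number of X variables and of Y variables


def count_satisfying(clauses, n, m):
    """Count valuations of X_1..X_n, Y_1..Y_m that satisfy some clause."""
    count = 0
    for xs in product([False, True], repeat=n):
        for ys in product([False, True], repeat=m):
            if any(xs[i - 1] and ys[j - 1] for (i, j) in clauses):
                count += 1
    return count


def build_tid(clauses, n, m):
    """Hypothetical encoding: X_i is the R-edge a'i -> ai (probability 1/2),
    Y_j is the T-edge bj -> b'j (probability 1/2), and each clause is a
    certain S-edge ai -> bj."""
    tid = {}
    for i in range(1, n + 1):
        tid[("R", f"a'{i}", f"a{i}")] = Fraction(1, 2)
    for j in range(1, m + 1):
        tid[("T", f"b{j}", f"b'{j}")] = Fraction(1, 2)
    for (i, j) in clauses:
        tid[("S", f"a{i}", f"b{j}")] = Fraction(1)
    return tid


def has_path_match(world):
    """The CQ: some x, y, z, w with R(x, y), S(y, z) and T(z, w)."""
    r_targets = {v for (rel, _, v) in world if rel == "R"}
    t_sources = {u for (rel, u, _) in world if rel == "T"}
    return any(rel == "S" and u in r_targets and v in t_sources
               for (rel, u, v) in world)


def pqe(tid):
    """Exact probability of the CQ, by enumerating all possible worlds."""
    facts = list(tid)
    total = Fraction(0)
    for keep in product([False, True], repeat=len(facts)):
        pr = Fraction(1)
        for fact, kept in zip(facts, keep):
            pr *= tid[fact] if kept else 1 - tid[fact]
        world = {fact for fact, kept in zip(facts, keep) if kept}
        if pr > 0 and has_path_match(world):
            total += pr
    return total


count = count_satisfying(CLAUSES, N, M)
probability = pqe(build_tid(CLAUSES, N, M))
print(count, probability, probability * 2 ** (N + M) == count)
```

Each valuation corresponds to one world of probability 1/2^(n+m), so multiplying the query probability by 2^(n+m) recovers the number of satisfying valuations; this is why an algorithm for PQE(Q) would also count satisfying valuations of PP2DNF formulas.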