Query Evaluation on Probabilistic Data A Story of Dichotomies
Antoine Amarilli May 19, 2020
Table of contents
Introduction and problem statement
Existing results
More general queries: Dichotomy on homomorphism-closed queries More restricted instances: Words, trees and bounded treewidth More restricted instances: Unweighted instances
Conclusion and open problems
2/42
Uncertain data management
Relational databases managedata, represented here as alabeled graph
WorksAt
Antoine Télécom Paris Antoine Paris Sud
Benny Paris Sud
Benny Technion
İsmail U. Oxford
MemberOf
Télécom Paris ParisTech Télécom Paris IP Paris
Paris Sud IP Paris Paris Sud Paris-Saclay Technion CESAER
A.
B.
İ.
Télécom Paris
Paris Sud
Technion
U. Oxford
ParisTech
IP Paris
Paris-Saclay
CESAER
→ Problem:we are notcertainabout the true state of the data
Uncertain data management
Relational databases managedata, represented here as alabeled graph WorksAt
Antoine Télécom Paris Antoine Paris Sud
Benny Paris Sud
Benny Technion
İsmail U. Oxford
MemberOf
Télécom Paris ParisTech Télécom Paris IP Paris
Paris Sud IP Paris Paris Sud Paris-Saclay Technion CESAER
A.
B.
İ.
Télécom Paris
Paris Sud
Technion
U. Oxford
ParisTech
IP Paris
Paris-Saclay
CESAER
→ Problem:we are notcertainabout the true state of the data
3/42
Uncertain data management
Relational databases managedata, represented here as alabeled graph WorksAt
Antoine Télécom Paris Antoine Paris Sud
Benny Paris Sud
Benny Technion
İsmail U. Oxford
MemberOf
Télécom Paris ParisTech Télécom Paris IP Paris
Paris Sud IP Paris Paris Sud Paris-Saclay
A.
B.
İ.
Télécom Paris
Paris Sud
Technion
U. Oxford
ParisTech
IP Paris
Paris-Saclay
CESAER
→ Problem:we are notcertainabout the true state of the data
Uncertain data management
Relational databases managedata, represented here as alabeled graph WorksAt
Antoine Télécom Paris Antoine Paris Sud
Benny Paris Sud
Benny Technion
İsmail U. Oxford
MemberOf
Télécom Paris ParisTech Télécom Paris IP Paris
Paris Sud IP Paris Paris Sud Paris-Saclay Technion CESAER
A.
B.
İ.
Télécom Paris
Paris Sud
Technion
U. Oxford
ParisTech
IP Paris
Paris-Saclay
CESAER
→ Problem:we are notcertainabout the true state of the data
3/42
Uncertain data management
Relational databases managedata, represented here as alabeled graph WorksAt
Antoine Télécom Paris Antoine Paris Sud
Benny Paris Sud
Benny Technion
İsmail U. Oxford
MemberOf
Télécom Paris ParisTech Télécom Paris IP Paris
Paris Sud IP Paris Paris Sud Paris-Saclay Technion CESAER
A.
B.
İ.
Télécom Paris
Paris Sud
Technion
U. Oxford
ParisTech
IP Paris
Paris-Saclay
CESAER
→ Problem:we are notcertainabout the true state of the data
Uncertain data management
Relational databases managedata, represented here as alabeled graph WorksAt
Antoine Télécom Paris Antoine Paris Sud
Benny Paris Sud
Benny Technion
İsmail U. Oxford
MemberOf
Télécom Paris ParisTech Télécom Paris IP Paris
Paris Sud IP Paris Paris Sud Paris-Saclay Technion CESAER
A.
B.
İ.
Télécom Paris
Paris Sud
Technion
U. Oxford
ParisTech
IP Paris
Paris-Saclay
CESAER
→ Problem:we are notcertainabout the true state of the data
3/42
Uncertain data management
Relational databases managedata, represented here as alabeled graph WorksAt
Antoine Télécom Paris Antoine Paris Sud
Benny Paris Sud
Benny Technion
İsmail U. Oxford
MemberOf
Télécom Paris ParisTech Télécom Paris IP Paris
Paris Sud IP Paris Paris Sud Paris-Saclay
A.
B.
Télécom Paris
Paris Sud
Technion
ParisTech
IP Paris
Paris-Saclay
Uncertain data model
A.
B.
İ.
Télécom Paris
Paris Sud
Technion
U. Oxford
ParisTech
IP Paris
Paris-Saclay
CESAER
80% 10%
40% 80%
100%
90% 90% 50%
90%
100%
•
Uncertain data model:TID, for tuple-independent database•
Each fact (edge) carries aprobability•
Each fact exists with its givenprobability•
All facts areindependent•
Possible worldW:subset of facts•
What is theprobabilityof this possible world?0.03%Pr(W) = Y
F∈W
Pr(F)
!
× Y
F∈W/
1−Pr(F)
!
4/42
Uncertain data model
A.
B.
Télécom Paris
Paris Sud
Technion
ParisTech
IP Paris
Paris-Saclay 80%
10%
40%
80%
100%
90%
90%
50%
90%
100%
•
Uncertain data model:TID, for tuple-independent database•
Each fact (edge) carries aprobability•
Each fact exists with its givenprobability•
All facts areindependent•
Possible worldW:subset of facts•
What is theprobabilityof this possible world?0.03%Pr(W) = Y
F∈W
Pr(F)
!
× Y
F∈W/
1−Pr(F)
!
Uncertain data model
A.
B.
İ.
Télécom Paris
Paris Sud
Technion
U. Oxford
ParisTech
IP Paris
Paris-Saclay
CESAER 80%
10%
40%
80%
100%
90%
90%
50%
90%
100%
•
Uncertain data model:TID, for tuple-independent database•
Each fact (edge) carries aprobability•
Each fact exists with its givenprobability•
All facts areindependent•
Possible worldW:subset of facts•
What is theprobabilityof this possible world?0.03%Pr(W) = Y
F∈W
Pr(F)
!
× Y
F∈W/
1−Pr(F)
!
4/42
Uncertain data model
A.
B.
Télécom Paris
Paris Sud
Technion
ParisTech
IP Paris
Paris-Saclay 80%
10%
40%
80%
100%
90%
90%
50%
90%
100%
•
Uncertain data model:TID, for tuple-independent database•
Each fact (edge) carries aprobability•
Each fact exists with its givenprobability•
All facts areindependent•
Possible worldW:subset of facts•
What is theprobabilityof this possible world?0.03%Pr(W) = Y
F∈W
Pr(F)
!
× Y
F∈W/
1−Pr(F)
!
Uncertain data model
A.
B.
İ.
Télécom Paris
Paris Sud
Technion
U. Oxford
ParisTech
IP Paris
Paris-Saclay
CESAER
80% 10%
40% 80%
100%
90% 90% 50%
90%
100%
•
Uncertain data model:TID, for tuple-independent database•
Each fact (edge) carries aprobability•
Each fact exists with its givenprobability•
All facts areindependent•
Possible worldW:subset of facts•
What is theprobabilityof this possible world?0.03%Pr(W) = Y
F∈W
Pr(F)
!
× Y
F∈W/
1−Pr(F)
!
4/42
Uncertain data model
A.
B.
Télécom Paris
Paris Sud
Technion
ParisTech
IP Paris
Paris-Saclay
80% 10%
40% 80%
100%
90% 90% 50%
90%
100%
•
Uncertain data model:TID, for tuple-independent database•
Each fact (edge) carries aprobability•
Each fact exists with its givenprobability•
All facts areindependent•
Possible worldW:subset of facts•
What is theprobabilityof this possible world?0.03%
Pr(W) = Y
F∈W
Pr(F)
!
× Y
F∈W/
1−Pr(F)
!
Uncertain data model
A.
B.
İ.
Télécom Paris
Paris Sud
Technion
U. Oxford
ParisTech
IP Paris
Paris-Saclay
CESAER 80%
10%
40%
80%
100%
90%
90%
50%
90%
100%
•
Uncertain data model:TID, for tuple-independent database•
Each fact (edge) carries aprobability•
Each fact exists with its givenprobability•
All facts areindependent•
Possible worldW:subset of facts•
What is theprobabilityof this possible world?0.03%
Pr(W) = Y
F∈W
Pr(F)
!
× Y
F∈W/
1−Pr(F)
!
4/42
Uncertain data model
A.
B.
Télécom Paris
Paris Sud
Technion
ParisTech
IP Paris
Paris-Saclay 80%
10%
40%
80%
100%
90%
90%
50%
90%
100%
•
Uncertain data model:TID, for tuple-independent database•
Each fact (edge) carries aprobability•
Each fact exists with its givenprobability•
All facts areindependent•
Possible worldW:subset of facts•
What is theprobabilityof this possible world?0.03%Pr(W) = Y
F∈W
Pr(F)
!
× Y
F∈W/
1−Pr(F)
!
Uncertain data model
A.
B.
İ.
Télécom Paris
Paris Sud
Technion
U. Oxford
ParisTech
IP Paris
Paris-Saclay
CESAER 80%
10%
40%
80%
100%
90%
90%
50%
90%
100%
•
Uncertain data model:TID, for tuple-independent database•
Each fact (edge) carries aprobability•
Each fact exists with its givenprobability•
All facts areindependent•
Possible worldW:subset of facts•
What is theprobabilityof this possible world?0.03%Pr(W) = Y
F∈W
Pr(F)
!
× Y
F∈W/
1−Pr(F)
!
4/42
Queries
A central task in databases is toevaluate queries
•
Query:maps a graph (without probabilities) to YES/NO•
Conjunctive query(CQ): can I find a match of apattern?• e.g.,∃x y z x y z
→ We want ahomomorphismfrom the pattern to the graph (not necessarilyinjective)
→ Formally: anexistentially quantified conjunction of atoms (edges)
•
Union of conjunctive queries(UCQ): can I find a match ofsome pattern?→ e.g., ∃x y z x y z
∨ ∃x y z w x y z w
• Formally: afinite disjunctionof CQs
Queries
A central task in databases is toevaluate queries
•
Query:maps a graph (without probabilities) to YES/NO•
Conjunctive query(CQ): can I find a match of apattern?• e.g.,∃x y z x y z
→ We want ahomomorphismfrom the pattern to the graph (not necessarilyinjective)
→ Formally: anexistentially quantified conjunction of atoms (edges)
•
Union of conjunctive queries(UCQ): can I find a match ofsome pattern?→ e.g., ∃x y z x y z
∨ ∃x y z w x y z w
• Formally: afinite disjunctionof CQs
5/42
Queries
A central task in databases is toevaluate queries
•
Query:maps a graph (without probabilities) to YES/NO•
Conjunctive query(CQ): can I find a match of apattern?• e.g.,∃x y z x y z
→ We want ahomomorphismfrom the pattern to the graph (not necessarilyinjective)
→ Formally: anexistentially quantified conjunction of atoms (edges)
•
Union of conjunctive queries(UCQ): can I find a match ofsome pattern?→ e.g., ∃x y z x y z
∨ ∃x y z w x y z w
• Formally: afinite disjunctionof CQs
Queries
A central task in databases is toevaluate queries
•
Query:maps a graph (without probabilities) to YES/NO•
Conjunctive query(CQ): can I find a match of apattern?• e.g.,∃x y z x y z
→ We want ahomomorphismfrom the pattern to the graph (not necessarilyinjective)
→ Formally: anexistentially quantified conjunction of atoms (edges)
•
Union of conjunctive queries(UCQ): can I find a match ofsome pattern?→ e.g., ∃x y z x y z
∨ ∃x y z w x y z w
• Formally: afinite disjunctionof CQs
5/42
Queries
A central task in databases is toevaluate queries
•
Query:maps a graph (without probabilities) to YES/NO•
Conjunctive query(CQ): can I find a match of apattern?• e.g.,∃x y z x y z
→ We want ahomomorphismfrom the pattern to the graph (not necessarilyinjective)
→ Formally: anexistentially quantified conjunction of atoms (edges)
•
Union of conjunctive queries(UCQ): can I find a match ofsome pattern?→ e.g., ∃x y z x y z
∨ ∃x y z w x y z w
• Formally: afinite disjunctionof CQs
Queries
A central task in databases is toevaluate queries
•
Query:maps a graph (without probabilities) to YES/NO•
Conjunctive query(CQ): can I find a match of apattern?• e.g.,∃x y z x y z
→ We want ahomomorphismfrom the pattern to the graph (not necessarilyinjective)
→ Formally: anexistentially quantified conjunction of atoms (edges)
•
Union of conjunctive queries(UCQ): can I find a match ofsome pattern?→ e.g., ∃x y z x y z
∨ ∃x y z w x y z w
• Formally: afinite disjunctionof CQs
5/42
Problem statement: Probabilistic query evaluation (PQE)
•
Wefixa queryQ, for instance the CQ: x y z•
Theinputis a TIDD: A.B.
İ.
Télécom Paris
Paris Sud
Technion
U. Oxford
ParisTech
IP Paris
Paris-Saclay
CESAER 80%
10%
40% 80%
100%
90% 90% 50% 90%
100%
•
Theoutputis thetotal probabilityof the worlds which satisfy the query:• Formally:P
W⊆D,W|=QPr(W)
→ Intuition:theprobabilitythat the query is true
•
We can always compute the probability in exponential time (go over all possibilities)•
Here we can do better (in PTIME):1−(1−80%)×(1−(1−
(1−10%)×(1−40%)
)× (
1−(1−50%)×(1−90%))
)×(1−80%×(1−(1−90%)×(1−90%))), i.e.,97.65792%
Problem statement: Probabilistic query evaluation (PQE)
•
Wefixa queryQ, for instance the CQ: x y z•
Theinputis a TIDD: A.B.
İ.
Télécom Paris
Paris Sud
Technion
U. Oxford
ParisTech
IP Paris
Paris-Saclay
CESAER 80%
10%
40%
80%
100%
90%
90%
50%
90%
100%
•
Theoutputis thetotal probabilityof the worlds which satisfy the query:• Formally:P
W⊆D,W|=QPr(W)
→ Intuition:theprobabilitythat the query is true
•
We can always compute the probability in exponential time (go over all possibilities)•
Here we can do better (in PTIME):1−(1−80%)×(1−(1−
(1−10%)×(1−40%)
)× (
1−(1−50%)×(1−90%))
)×(1−80%×(1−(1−90%)×(1−90%))), i.e.,97.65792%
6/42
Problem statement: Probabilistic query evaluation (PQE)
•
Wefixa queryQ, for instance the CQ: x y z•
Theinputis a TIDD: A.B.
İ.
Télécom Paris
Paris Sud
Technion
U. Oxford
ParisTech
IP Paris
Paris-Saclay
CESAER 80%
10%
40%
80%
100%
90%
90%
50%
90%
100%
•
Theoutputis thetotal probabilityof the worlds which satisfy the query:• Formally:P
W⊆D,W|=QPr(W)
→ Intuition:theprobabilitythat the query is true
•
We can always compute the probability in exponential time (go over all possibilities)•
Here we can do better (in PTIME):1−(1−80%)×(1−(1−
(1−10%)×(1−40%)
)× (
1−(1−50%)×(1−90%))
)×(1−80%×(1−(1−90%)×(1−90%))), i.e.,97.65792%
Problem statement: Probabilistic query evaluation (PQE)
•
Wefixa queryQ, for instance the CQ: x y z•
Theinputis a TIDD: A.B.
İ.
Télécom Paris
Paris Sud
Technion
U. Oxford
ParisTech
IP Paris
Paris-Saclay
CESAER 80%
10%
40%
80%
100%
90%
90%
50%
90%
100%
•
Theoutputis thetotal probabilityof the worlds which satisfy the query:• Formally:P
W⊆D,W|=QPr(W)
→ Intuition:theprobabilitythat the query is true
•
We can always compute the probability in exponential time (go over all possibilities)•
Here we can do better (in PTIME):1−(1−80%)×(1−(1−
(1−10%)×(1−40%)
)× (
1−(1−50%)×(1−90%))
)×(1−80%×(1−(1−90%)×(1−90%))), i.e.,97.65792%
6/42
Problem statement: Probabilistic query evaluation (PQE)
•
Wefixa queryQ, for instance the CQ: x y z•
Theinputis a TIDD: A.B.
İ.
Télécom Paris
Paris Sud
Technion
U. Oxford
ParisTech
IP Paris
Paris-Saclay
CESAER 80%
10%
40%
80%
100%
90%
90%
50%
90%
100%
•
Theoutputis thetotal probabilityof the worlds which satisfy the query:• Formally:P
W⊆D,W|=QPr(W)
→ Intuition:theprobabilitythat the query is true
•
We can always compute the probability in exponential time (go over all possibilities)•
Here we can do better (in PTIME):1−(1−80%)×(1−(1−
(1−10%)×(1−40%)
)× (
1−(1−50%)×(1−90%))
)×(1−80%×(1−(1−90%)×(1−90%))), i.e.,97.65792%
Problem statement: Probabilistic query evaluation (PQE)
•
Wefixa queryQ, for instance the CQ: x y z•
Theinputis a TIDD: A.B.
İ.
Télécom Paris
Paris Sud
Technion
U. Oxford
ParisTech
IP Paris
Paris-Saclay
CESAER 80%
10%
40%
80%
100%
90%
90%
50%
90%
100%
•
Theoutputis thetotal probabilityof the worlds which satisfy the query:• Formally:P
W⊆D,W|=QPr(W)
→ Intuition:theprobabilitythat the query is true
•
We can always compute the probability in exponential time (go over all possibilities)•
Here we can do better (in PTIME):1−(1−80%)×(1−(
1−
(1−10%)×(1−40%)
)× (
1−(1−50%)×(1−90%))
)×(1−80%×(1−(1−90%)×(1−90%))), i.e.,97.65792%
6/42
Problem statement: Probabilistic query evaluation (PQE)
•
Wefixa queryQ, for instance the CQ: x y z•
Theinputis a TIDD: A.B.
İ.
Télécom Paris
Paris Sud
Technion
U. Oxford
ParisTech
IP Paris
Paris-Saclay
CESAER 80%
10%
40%
80%
100%
90%
90%
50%
90%
100%
•
Theoutputis thetotal probabilityof the worlds which satisfy the query:• Formally:P
W⊆D,W|=QPr(W)
→ Intuition:theprobabilitythat the query is true
•
We can always compute the probability in exponential time (go over all possibilities)(1−80%)×(1−(
1−
(1−10%)×(1−40%)
)× (
1−(1−50%)×(1−90%))
)×(1−80%×(1−(1−90%)×(1−90%))), i.e.,97.65792%
Problem statement: Probabilistic query evaluation (PQE)
•
Wefixa queryQ, for instance the CQ: x y z•
Theinputis a TIDD: A.B.
İ.
Télécom Paris
Paris Sud
Technion
U. Oxford
ParisTech
IP Paris
Paris-Saclay
CESAER 80%
10%
40%
80%
100%
90%
90%
50%
90%
100%
•
Theoutputis thetotal probabilityof the worlds which satisfy the query:• Formally:P
W⊆D,W|=QPr(W)
→ Intuition:theprobabilitythat the query is true
•
We can always compute the probability in exponential time (go over all possibilities)•
Here we can do better (in PTIME):1−(1−80%)×(1−(
1−
(1−10%)×(1−40%)
)× (
1−(1−50%)×(1−90%))
)×(1−80%×(1−(1−90%)×(1−90%))), i.e.,97.65792%
6/42
Problem statement: Probabilistic query evaluation (PQE)
•
Wefixa queryQ, for instance the CQ: x y z•
Theinputis a TIDD: A.B.
İ.
Télécom Paris
Paris Sud
Technion
U. Oxford
ParisTech
IP Paris
Paris-Saclay
CESAER 80%
10%
40%
80%
100%
90%
90%
50%
90%
100%
•
Theoutputis thetotal probabilityof the worlds which satisfy the query:• Formally:P
W⊆D,W|=QPr(W)
→ Intuition:theprobabilitythat the query is true
•
We can always compute the probability in exponential time (go over all possibilities)(
1−
(1−10%)×(1−40%)
)× (
1−(1−50%)×(1−90%))
)×(1−80%×(1−(1−90%)×(1−90%))), i.e.,97.65792%
Problem statement: Probabilistic query evaluation (PQE)
•
Wefixa queryQ, for instance the CQ: x y z•
Theinputis a TIDD: A.B.
İ.
Télécom Paris
Paris Sud
Technion
U. Oxford
ParisTech
IP Paris
Paris-Saclay
CESAER 80%
10%
40%
80%
100%
90%
90%
50%
90%
100%
•
Theoutputis thetotal probabilityof the worlds which satisfy the query:• Formally:P
W⊆D,W|=QPr(W)
→ Intuition:theprobabilitythat the query is true
•
We can always compute the probability in exponential time (go over all possibilities)•
Here we can do better (in PTIME):1−(1−80%)×(1−(1−
(1−10%)×(1−40%)
)× (
1−(1−50%)×(1−90%))
)
×(1−80%×(1−(1−90%)×(1−90%))), i.e.,97.65792%
6/42
Problem statement: Probabilistic query evaluation (PQE)
•
Wefixa queryQ, for instance the CQ: x y z•
Theinputis a TIDD: A.B.
İ.
Télécom Paris
Paris Sud
Technion
U. Oxford
ParisTech
IP Paris
Paris-Saclay
CESAER 80%
10%
40%
80%
100%
90%
90%
50%
90%
100%
•
Theoutputis thetotal probabilityof the worlds which satisfy the query:• Formally:P
W⊆D,W|=QPr(W)
→ Intuition:theprobabilitythat the query is true
•
We can always compute the probability in exponential time (go over all possibilities)(1−10%)×(1−40%) 1−(1−50%)×(1−90%)) ×(1−80%×(1−(1−90%)×(1−90%))), i.e.,97.65792%
Problem statement: Probabilistic query evaluation (PQE)
•
Wefixa queryQ, for instance the CQ: x y z•
Theinputis a TIDD: A.B.
İ.
Télécom Paris
Paris Sud
Technion
U. Oxford
ParisTech
IP Paris
Paris-Saclay
CESAER 80%
10%
40%
80%
100%
90%
90%
50%
90%
100%
•
Theoutputis thetotal probabilityof the worlds which satisfy the query:• Formally:P
W⊆D,W|=QPr(W)
→ Intuition:theprobabilitythat the query is true
•
We can always compute the probability in exponential time (go over all possibilities)•
Here we can do better (in PTIME):1−(1−80%)×(1−(1−(1−10%)×(1−40%))× (1−(1−50%)×(1−90%))
)
×(1−80%×(1−(1−90%)×(1−90%))), i.e.,97.65792%
6/42
Problem statement: Probabilistic query evaluation (PQE)
•
Wefixa queryQ, for instance the CQ: x y z•
Theinputis a TIDD: A.B.
İ.
Télécom Paris
Paris Sud
Technion
U. Oxford
ParisTech
IP Paris
Paris-Saclay
CESAER 80%
10%
40%
80%
100%
90%
90%
50%
90%
100%
•
Theoutputis thetotal probabilityof the worlds which satisfy the query:• Formally:P
W⊆D,W|=QPr(W)
→ Intuition:theprobabilitythat the query is true
•
We can always compute the probability in exponential time (go over all possibilities)×(1−80%×(1−(1−90%)×(1−90%))), i.e.,97.65792%
Problem statement: Probabilistic query evaluation (PQE)
•
Wefixa queryQ, for instance the CQ: x y z•
Theinputis a TIDD: A.B.
İ.
Télécom Paris
Paris Sud
Technion
U. Oxford
ParisTech
IP Paris
Paris-Saclay
CESAER 80%
10%
40%
80%
100%
90%
90%
50%
90%
100%
•
Theoutputis thetotal probabilityof the worlds which satisfy the query:• Formally:P
W⊆D,W|=QPr(W)
→ Intuition:theprobabilitythat the query is true
•
We can always compute the probability in exponential time (go over all possibilities)•
Here we can do better (in PTIME):1−(1−80%)×(1−(1−(1−10%)×(1−40%))× (1−(1−50%)×(1−90%)))×(1−80%×(1−(1−90%)×(1−90%))),i.e.,97.65792%
6/42
Problem statement: Probabilistic query evaluation (PQE)
•
Wefixa queryQ, for instance the CQ: x y z•
Theinputis a TIDD: A.B.
İ.
Télécom Paris
Paris Sud
Technion
U. Oxford
ParisTech
IP Paris
Paris-Saclay
CESAER 80%
10%
40%
80%
100%
90%
90%
50%
90%
100%
•
Theoutputis thetotal probabilityof the worlds which satisfy the query:• Formally:P
W⊆D,W|=QPr(W)
→ Intuition:theprobabilitythat the query is true
•
We can always compute the probability in exponential time (go over all possibilities)Research goal: Understanding the complexity of PQE What is the complexity ofPQE(Q)depending on the queryQ?
→ Note that we studydata complexity, i.e.,Qisfixedand the input is thedata In this talk: several dichotomies on the PQE problem:
•
Existing results:• PQE(Q)is in #P for any UCQQand is#P-hardfor some CQs
• Dichotomyby Dalvi and Suciu:PQE(Q)for a UCQQis either#P-hardorPTIME
•
More general queries: dichotomy onhomomorphism-closed queries• PQE(Q)is#P-hardfor all homomorphism-closed queries not equivalent to a safe UCQ
•
Restricted instances:PQE(Q)for MSO queries...• Isin PTIMEif the input data is restricted to havebounded treewidth
• Isintractable otherwiseunder some assumptions
•
Restricted instances:if all probabilities are50%then the complexity is the same7/42
Research goal: Understanding the complexity of PQE
What is the complexity ofPQE(Q)depending on the queryQ?
→ Note that we studydata complexity, i.e.,Qisfixedand the input is thedata
In this talk: several dichotomies on the PQE problem:
•
Existing results:• PQE(Q)is in #P for any UCQQand is#P-hardfor some CQs
• Dichotomyby Dalvi and Suciu:PQE(Q)for a UCQQis either#P-hardorPTIME
•
More general queries: dichotomy onhomomorphism-closed queries• PQE(Q)is#P-hardfor all homomorphism-closed queries not equivalent to a safe UCQ
•
Restricted instances:PQE(Q)for MSO queries...• Isin PTIMEif the input data is restricted to havebounded treewidth
• Isintractable otherwiseunder some assumptions
•
Restricted instances:if all probabilities are50%then the complexity is the sameResearch goal: Understanding the complexity of PQE
What is the complexity ofPQE(Q)depending on the queryQ?
→ Note that we studydata complexity, i.e.,Qisfixedand the input is thedata In this talk: several dichotomies on the PQE problem:
•
Existing results:• PQE(Q)is in #P for any UCQQand is#P-hardfor some CQs
• Dichotomyby Dalvi and Suciu:PQE(Q)for a UCQQis either#P-hardorPTIME
•
More general queries: dichotomy onhomomorphism-closed queries• PQE(Q)is#P-hardfor all homomorphism-closed queries not equivalent to a safe UCQ
•
Restricted instances:PQE(Q)for MSO queries...• Isin PTIMEif the input data is restricted to havebounded treewidth
• Isintractable otherwiseunder some assumptions
•
Restricted instances:if all probabilities are50%then the complexity is the same7/42
Research goal: Understanding the complexity of PQE
What is the complexity ofPQE(Q)depending on the queryQ?
→ Note that we studydata complexity, i.e.,Qisfixedand the input is thedata In this talk: several dichotomies on the PQE problem:
•
Existing results:• PQE(Q)is in #P for any UCQQand is#P-hardfor some CQs
• Dichotomyby Dalvi and Suciu:PQE(Q)for a UCQQis either#P-hardorPTIME
•
More general queries: dichotomy onhomomorphism-closed queries• PQE(Q)is#P-hardfor all homomorphism-closed queries not equivalent to a safe UCQ
•
Restricted instances:PQE(Q)for MSO queries...• Isin PTIMEif the input data is restricted to havebounded treewidth
• Isintractable otherwiseunder some assumptions
•
Restricted instances:if all probabilities are50%then the complexity is the sameResearch goal: Understanding the complexity of PQE
What is the complexity ofPQE(Q)depending on the queryQ?
→ Note that we studydata complexity, i.e.,Qisfixedand the input is thedata In this talk: several dichotomies on the PQE problem:
•
Existing results:• PQE(Q)is in #P for any UCQQand is#P-hardfor some CQs
• Dichotomyby Dalvi and Suciu:PQE(Q)for a UCQQis either#P-hardorPTIME
•
More general queries: dichotomy onhomomorphism-closed queries• PQE(Q)is#P-hardfor all homomorphism-closed queries not equivalent to a safe UCQ
•
Restricted instances:PQE(Q)for MSO queries...• Isin PTIMEif the input data is restricted to havebounded treewidth
• Isintractable otherwiseunder some assumptions
•
Restricted instances:if all probabilities are50%then the complexity is the same7/42
Research goal: Understanding the complexity of PQE
What is the complexity ofPQE(Q)depending on the queryQ?
→ Note that we studydata complexity, i.e.,Qisfixedand the input is thedata In this talk: several dichotomies on the PQE problem:
•
Existing results:• PQE(Q)is in #P for any UCQQand is#P-hardfor some CQs
• Dichotomyby Dalvi and Suciu:PQE(Q)for a UCQQis either#P-hardorPTIME
•
More general queries: dichotomy onhomomorphism-closed queries• PQE(Q)is#P-hardfor all homomorphism-closed queries not equivalent to a safe UCQ
•
Restricted instances:PQE(Q)for MSO queries...• Isin PTIMEif the input data is restricted to havebounded treewidth
• Isintractable otherwiseunder some assumptions
•
Restricted instances:if all probabilities are50%then the complexity is the sameResearch goal: Understanding the complexity of PQE
What is the complexity ofPQE(Q)depending on the queryQ?
→ Note that we studydata complexity, i.e.,Qisfixedand the input is thedata In this talk: several dichotomies on the PQE problem:
•
Existing results:• PQE(Q)is in #P for any UCQQand is#P-hardfor some CQs
• Dichotomyby Dalvi and Suciu:PQE(Q)for a UCQQis either#P-hardorPTIME
•
More general queries: dichotomy onhomomorphism-closed queries• PQE(Q)is#P-hardfor all homomorphism-closed queries not equivalent to a safe UCQ
•
Restricted instances:PQE(Q)for MSO queries...• Isin PTIMEif the input data is restricted to havebounded treewidth
• Isintractable otherwiseunder some assumptions
•
Restricted instances:if all probabilities are50%then the complexity is the same7/42
Table of contents
Introduction and problem statement Existing results
More general queries: Dichotomy on homomorphism-closed queries More restricted instances: Words, trees and bounded treewidth More restricted instances: Unweighted instances
Conclusion and open problems
Basic complexity results
•
Whenever we canevaluateQin PTIME, thenPQE(Q)is in#P• #P:counting class of problems expressible as thenumber of accepting paths of a nondeterministic polynomial-time Turing Machine
→ Nondeterministically guess a possible world, then test the query
→ In particular,PQE(Q)is in#Pfor anyUCQQ
•
For some queriesQ, the taskPQE(Q)is inPTIME→ e.g., single-atom CQs
→ e.g., x y z
9/42
Basic complexity results
•
Whenever we canevaluateQin PTIME, thenPQE(Q)is in#P• #P:counting class of problems expressible as thenumber of accepting paths of a nondeterministic polynomial-time Turing Machine
→ Nondeterministically guess a possible world, then test the query
→ In particular,PQE(Q)is in#Pfor anyUCQQ
•
For some queriesQ, the taskPQE(Q)is inPTIME→ e.g., single-atom CQs
→ e.g., x y z
Basic complexity results
•
Whenever we canevaluateQin PTIME, thenPQE(Q)is in#P• #P:counting class of problems expressible as thenumber of accepting paths of a nondeterministic polynomial-time Turing Machine
→ Nondeterministically guess a possible world, then test the query
→ In particular,PQE(Q)is in#Pfor anyUCQQ
•
For some queriesQ, the taskPQE(Q)is inPTIME→ e.g., single-atom CQs
→ e.g., x y z
9/42
Basic complexity results
•
Whenever we canevaluateQin PTIME, thenPQE(Q)is in#P• #P:counting class of problems expressible as thenumber of accepting paths of a nondeterministic polynomial-time Turing Machine
→ Nondeterministically guess a possible world, then test the query
→ In particular,PQE(Q)is in#Pfor anyUCQQ
•
For some queriesQ, the taskPQE(Q)is inPTIME→ e.g., single-atom CQs
→ e.g., x y z
PQE is sometimes #P-hard
Let us show thatPQE(Q)is#P-hardfor the CQQ:
x y z w
•
Reduce from the problem ofcounting satisfying valuationsof a Boolean formula• e.g., given(x∨y)∧z, compute that it has3satisfying valuations
•
This problem is already#P-hardfor so-calledPP2DNF formulas:• Positive(no negation) andPartitioned variables:X1, . . . ,XnandY1, . . . ,Ym
• 2-DNF: disjunction of clauses likeXi∧Yj
•
Example:φ: (X1∧Y1)∨(X1∧Y2)∨(X2∧Y2)∨(X3∧Y1)∨(X3∧Y2)a01 a02 a03
a1 a2 a3 1/2 1/2 1/2
b1
b2
b01
b02 1/2
1/2 1
1 1
1 1
Idea: Satisfying valuationsofφcorrespond topossible worldswith amatchofQ
10/42
PQE is sometimes #P-hard
Let us show thatPQE(Q)is#P-hardfor the CQQ: x y z w
•
Reduce from the problem ofcounting satisfying valuationsof a Boolean formula• e.g., given(x∨y)∧z, compute that it has3satisfying valuations
•
This problem is already#P-hardfor so-calledPP2DNF formulas:• Positive(no negation) andPartitioned variables:X1, . . . ,XnandY1, . . . ,Ym
• 2-DNF: disjunction of clauses likeXi∧Yj
•
Example:φ: (X1∧Y1)∨(X1∧Y2)∨(X2∧Y2)∨(X3∧Y1)∨(X3∧Y2)a01 a02 a03
a1 a2 a3 1/2 1/2 1/2
b1
b2
b01
b02 1/2
1/2 1
1 1
1 1
Idea: Satisfying valuationsofφcorrespond topossible worldswith amatchofQ
PQE is sometimes #P-hard
Let us show thatPQE(Q)is#P-hardfor the CQQ: x y z w
•
Reduce from the problem ofcounting satisfying valuationsof a Boolean formula• e.g., given(x∨y)∧z, compute that it has3satisfying valuations
•
This problem is already#P-hardfor so-calledPP2DNF formulas:• Positive(no negation) andPartitioned variables:X1, . . . ,XnandY1, . . . ,Ym
• 2-DNF: disjunction of clauses likeXi∧Yj
•
Example:φ: (X1∧Y1)∨(X1∧Y2)∨(X2∧Y2)∨(X3∧Y1)∨(X3∧Y2)a01 a02 a03
a1 a2 a3 1/2 1/2 1/2
b1
b2
b01
b02 1/2
1/2 1
1 1
1 1
Idea: Satisfying valuationsofφcorrespond topossible worldswith amatchofQ
10/42
PQE is sometimes #P-hard
Let us show thatPQE(Q)is#P-hardfor the CQQ: x y z w
•
Reduce from the problem ofcounting satisfying valuationsof a Boolean formula• e.g., given(x∨y)∧z, compute that it has3satisfying valuations
•
This problem is already#P-hardfor so-calledPP2DNF formulas:• Positive(no negation) andPartitioned variables:X1, . . . ,XnandY1, . . . ,Ym
• 2-DNF: disjunction of clauses likeXi∧Yj
•
Example:φ: (X1∧Y1)∨(X1∧Y2)∨(X2∧Y2)∨(X3∧Y1)∨(X3∧Y2)a01 a02 a03
a1 a2 a3 1/2 1/2 1/2
b1
b2
b01
b02 1/2
1/2 1
1 1
1 1
Idea: Satisfying valuationsofφcorrespond topossible worldswith amatchofQ
PQE is sometimes #P-hard
Let us show thatPQE(Q)is#P-hardfor the CQQ: x y z w
•
Reduce from the problem ofcounting satisfying valuationsof a Boolean formula• e.g., given(x∨y)∧z, compute that it has3satisfying valuations
•
This problem is already#P-hardfor so-calledPP2DNF formulas:• Positive(no negation) andPartitioned variables:X1, . . . ,XnandY1, . . . ,Ym
• 2-DNF: disjunction of clauses likeXi∧Yj
•
Example:φ: (X1∧Y1)∨(X1∧Y2)∨(X2∧Y2)∨(X3∧Y1)∨(X3∧Y2)a01 a02 a03
a1 a2 a3 1/2 1/2 1/2
b1
b2
b01
b02 1/2
1/2 1
1 1
1 1
Idea: Satisfying valuationsofφcorrespond topossible worldswith amatchofQ
10/42