Introduction Upper bounds Semiring provenance Correlations Lower bounds Conclusion
Probabilities and Provenance on Trees and Treelike Instances
Antoine Amarilli¹, Pierre Bourhis², Pierre Senellart¹,³
¹ Télécom ParisTech
² CNRS CRIStAL
³ National University of Singapore
February 8th, 2016
Query evaluation
Relational signature σ
  → Example: R (arity 1), S (arity 2), T (arity 1)
Class ℐ of instances
  → Example: all instances; acyclic instances; treelike instances; ...
Fragment 𝒬 of Boolean constant-free queries
  → Example: Boolean conjunctive queries (= existentially quantified conjunctions of atoms)
  → Example of CQ: q: ∃x y R(x) ∧ S(x,y) ∧ T(y)
→ Query evaluation problem for 𝒬 and ℐ:
  Fix a query q ∈ 𝒬
  Given an input instance I ∈ ℐ
  Determine whether I satisfies q (written I |= q)
  Complexity as a function of I, not q (= data complexity)
Query evaluation example
Signature σ, class 𝒬 of conjunctive queries, class ℐ of all instances.
q: ∃x y R(x) ∧ S(x,y) ∧ T(y)
R
a
b
c
S
a a
b v
b w
T
v
w
b
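A minimal sketch of evaluating q on this instance, assuming each relation is stored as a Python set of tuples. The names satisfies_q and witnesses are illustrative helpers and do not come from the talk.

```python
# Instance from the slide: each relation is a set of tuples.
R = {("a",), ("b",), ("c",)}
S = {("a", "a"), ("b", "v"), ("b", "w")}
T = {("v",), ("w",), ("b",)}


def witnesses(R, S, T):
    """Return all pairs (x, y) such that R(x), S(x, y) and T(y) all hold."""
    return sorted((x, y) for (x, y) in S if (x,) in R and (y,) in T)


def satisfies_q(R, S, T):
    """Return True iff the instance satisfies q: ∃x y R(x) ∧ S(x,y) ∧ T(y)."""
    return len(witnesses(R, S, T)) > 0


if __name__ == "__main__":
    print(satisfies_q(R, S, T))
    print(witnesses(R, S, T))
```

The pair (a, a) is not a witness because T(a) is not in the instance; the query holds thanks to the S-facts starting at b.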
Probabilistic query evaluation
Signature σ, class 𝒬 of queries, class ℐ of instances.
→ Probabilistic query evaluation problem for 𝒬 and ℐ:
  Fix a query q ∈ 𝒬
  Given an input instance I ∈ ℐ
  And given a probability valuation π mapping facts of I to probabilities in [0,1]
  Compute the probability that I |= q
  Data complexity: measured as a function of I and π
Semantics: (I, π) gives a probability distribution on subinstances I′ ⊆ I:
  Each fact F ∈ I is present with probability π(F) and absent with probability 1 − π(F)
  Facts are independent
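Written out, the independence semantics above corresponds to the following formulas. This is a sketch in standard notation; the symbol Pr is an assumption of this note and is not used on the slides.

```latex
% Probability of a subinstance I' of I under independent facts
\Pr(I') = \prod_{F \in I'} \pi(F) \cdot \prod_{F \in I \setminus I'} \bigl(1 - \pi(F)\bigr)

% Probability that the query holds: total mass of the subinstances satisfying q
\Pr(I \models q) = \sum_{\substack{I' \subseteq I \\ I' \models q}} \Pr(I')
```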
Example of a probabilistic instance
S
a a  1
b v  .5
b w  .2
This (I, π) represents the following probability distribution:
  .5 × .2:            S = {(a, a), (b, v), (b, w)}
  .5 × (1−.2):        S = {(a, a), (b, v)}
  (1−.5) × .2:        S = {(a, a), (b, w)}
  (1−.5) × (1−.2):    S = {(a, a)}
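The distribution above can be reproduced by enumerating subinstances. The following is a small sketch under the independence assumption; possible_worlds is a hypothetical helper name, and worlds of probability 0 (those missing the certain fact S(a, a)) are skipped.

```python
from itertools import product

# Probability valuation π for the facts of S, as on the slide.
pi = {("a", "a"): 1.0, ("b", "v"): 0.5, ("b", "w"): 0.2}


def possible_worlds(pi):
    """Yield (world, probability) for every subinstance of nonzero probability."""
    facts = list(pi)
    for keep in product([True, False], repeat=len(facts)):
        prob = 1.0
        world = set()
        for fact, kept in zip(facts, keep):
            # A kept fact contributes π(F), a dropped fact contributes 1 − π(F).
            prob *= pi[fact] if kept else 1.0 - pi[fact]
            if kept:
                world.add(fact)
        if prob > 0:
            yield frozenset(world), prob


worlds = dict(possible_worlds(pi))

if __name__ == "__main__":
    for world, prob in worlds.items():
        print(sorted(world), round(prob, 4))
```

Enumeration is exponential in the number of facts, so it only serves to illustrate the semantics on tiny instances.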
Example of probabilistic query evaluation
Signature σ, class 𝒬 of conjunctive queries, class ℐ of all instances.
q: ∃x y R(x) ∧ S(x,y) ∧ T(y)
R
a    1
b    .4
c    .6
S
a a  1
b v  .5
b w  .2
T
v    .3
w    .7
b    1
The query is true iff R(b) is present and at least one of the following holds:
  S(b,v) and T(v) are present
  S(b,w) and T(w) are present
→ Probability: .4 × (1 − (1 − .5 × .3) × (1 − .2 × .7)) = .1076
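As a sanity check of this value, one can sum the probabilities of all subinstances that satisfy q and compare with the closed-form expression. This is a brute-force sketch for illustration only; the fact encoding and the names satisfies_q and query_probability are assumptions of this example, not part of the talk.

```python
from itertools import product

# Facts are tagged with their relation name; values are the probabilities π(F).
pi = {
    ("R", "a"): 1.0, ("R", "b"): 0.4, ("R", "c"): 0.6,
    ("S", "a", "a"): 1.0, ("S", "b", "v"): 0.5, ("S", "b", "w"): 0.2,
    ("T", "v"): 0.3, ("T", "w"): 0.7, ("T", "b"): 1.0,
}


def satisfies_q(world):
    """True iff the world contains R(x), S(x, y), T(y) for some x, y."""
    return any(
        f[0] == "S" and ("R", f[1]) in world and ("T", f[2]) in world
        for f in world
    )


def query_probability(pi):
    """Sum the probability of every subinstance that satisfies q."""
    facts = list(pi)
    total = 0.0
    for keep in product([True, False], repeat=len(facts)):
        world = {f for f, kept in zip(facts, keep) if kept}
        prob = 1.0
        for f, kept in zip(facts, keep):
            prob *= pi[f] if kept else 1.0 - pi[f]
        if satisfies_q(world):
            total += prob
    return total


# Closed form from the slide.
closed_form = 0.4 * (1 - (1 - 0.5 * 0.3) * (1 - 0.2 * 0.7))

if __name__ == "__main__":
    print(round(query_probability(pi), 4), round(closed_form, 4))
```

The enumeration visits 2^9 = 512 subinstances here; the closed form works because the two S/T branches share no facts and are therefore independent.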
Complexity of probabilistic query evaluation (PQE)
Question: what is the complexity of probabilistic query evaluation depending on the class 𝒬 of queries and class ℐ of instances?
Existing dichotomy result: [Dalvi and Suciu, 2012]
  𝒬 are (unions of) conjunctive queries, ℐ is all instances
  There is a class 𝒮 ⊆ 𝒬 of safe queries
  PQE is PTIME for any q ∈ 𝒮 on all instances
  PQE is #P-hard for any q ∈ 𝒬 \ 𝒮 on all instances
  q: ∃x y R(x) ∧ S(x,y) ∧ T(y) is unsafe!
Is there a smaller class ℐ such that PQE is tractable for a larger 𝒬?
Probabilistic XML: [Cohen et al., 2009]
  𝒬 are tree automata, ℐ are trees
  PQE is PTIME
Trees and treelike instances
Goal: find an instance class ℐ where PQE is tractable
Idea: take ℐ to be treelike instances
Treelike: the treewidth is bounded by a constant
  Trees have treewidth 1
  Cycles have treewidth 2
  k-cliques and (k−1)-grids have treewidth k−1
→ For non-probabilistic query evaluation [Courcelle, 1990]:
  ℐ: treelike instances; 𝒬: monadic second-order (MSO) queries
  → non-probabilistic QE is in linear time
→ Does this extend to probabilistic QE?
Our results
An instance-based dichotomy result:
Upper bound. For ℐ the treelike instances and 𝒬 the MSO queries
  → PQE is in linear time modulo arithmetic costs
  Also for expressive provenance representations
  Also with bounded-treewidth correlations
Lower bound. For any unbounded-tw family ℐ and 𝒬 FO queries
  → PQE is #P-hard under RP reductions, assuming:
  Signature arity is 2 (graphs)
  High-tw instances in ℐ are easily constructible
Table of contents
1 Introduction
2 Upper bounds
3 Semiring provenance
4 Correlations
5 Lower bounds
Technical tool: provenance
The provenance of a query q on an instance I:
  Boolean function ϕ whose variables are the facts of I
  A subinstance of I satisfies q iff ϕ is true for that valuation
  → For all ν: I → {0,1} we have ν(ϕ) = 1 iff {F ∈ I | ν(F) = 1} |= q
Example query: ∃x y z R(x,y) ∧ R(y,z)
R
a b  f1
b c  f2
d e  f3
e d  f4
f f  f5
→ Provenance: (f1 ∧ f2) ∨ (f3 ∧ f4) ∨ f5
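The defining property of provenance can be checked exhaustively on this small example: for each of the 2^5 valuations ν, the formula should be true exactly when the selected subinstance satisfies the query. The sketch below assumes the fact names f1 to f5 from the table; phi, satisfies_q and provenance_is_correct are illustrative names, not from the talk.

```python
from itertools import product

# Facts of R, named as on the slide.
facts = {
    "f1": ("a", "b"),
    "f2": ("b", "c"),
    "f3": ("d", "e"),
    "f4": ("e", "d"),
    "f5": ("f", "f"),
}


def phi(nu):
    """Provenance formula (f1 ∧ f2) ∨ (f3 ∧ f4) ∨ f5 under valuation nu."""
    return (nu["f1"] and nu["f2"]) or (nu["f3"] and nu["f4"]) or nu["f5"]


def satisfies_q(R):
    """True iff R satisfies ∃x y z R(x,y) ∧ R(y,z), i.e. contains a path of length 2."""
    return any(y == y2 for (_x, y) in R for (y2, _z) in R)


def provenance_is_correct():
    """Check ν(ϕ) = 1 iff {F | ν(F) = 1} |= q for every valuation ν."""
    names = list(facts)
    for bits in product([False, True], repeat=len(names)):
        nu = dict(zip(names, bits))
        subinstance = {facts[name] for name in names if nu[name]}
        if bool(phi(nu)) != satisfies_q(subinstance):
            return False
    return True


if __name__ == "__main__":
    print(provenance_is_correct())
```

Note that f5 alone suffices because the atoms R(x,y) and R(y,z) may both match the same fact R(f,f).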
General roadmap
Use provenance for probabilistic query evaluation:
  Compute a provenance representation efficiently
  → Probability of the provenance = probability of the query
  Compute the provenance probability efficiently (show it is not #P-hard as in the general case)
To solve the PQE problem on treelike instances for MSO:
  First solve the problem on trees with tree automata
  Then use the results of [Courcelle, 1990]