Tractable Query Answering Under Probabilistic Constraints
Antoine Amarilli (1), Pierre Bourhis (2), Pierre Senellart (1,3)
(1) Télécom ParisTech
(2) CNRS-LIFL
(3) National University of Singapore
September 4th, 2014
Tractable Query Evaluation On Probabilistic Instances
Background Ideas Results Consequences
Instances and queries
Given a relational instance (= set of facts, hypergraph): I = {R(a,b), R(b,c), S(c)}
Given a conjunctive query (CQ) (existentially quantified): q: ∃x y R(x,y) ∧ S(y)
→ Query evaluation (model checking) of q on I
→ Data complexity: q is fixed
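As a minimal sketch of query evaluation on the example above, the following brute-force check tries every assignment of the query variables to constants of I. The representation of facts and atoms as (relation, arguments) pairs and the function name `holds` are assumptions made for illustration. For a fixed query the number of variables is constant, so the loop is polynomial in the size of the instance.

```python
from itertools import product

# Instance I as a set of facts: (relation name, tuple of constants).
I = {("R", ("a", "b")), ("R", ("b", "c")), ("S", ("c",))}

# Query q: exists x, y. R(x, y) and S(y), as a list of atoms over variables.
q = [("R", ("x", "y")), ("S", ("y",))]

def holds(query, instance):
    """Return True if some assignment of variables to constants satisfies every atom."""
    domain = sorted({c for _, args in instance for c in args})
    variables = sorted({v for _, args in query for v in args})
    for values in product(domain, repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if all((rel, tuple(assignment[v] for v in args)) in instance
               for rel, args in query):
            return True
    return False

print(holds(q, I))  # True, witnessed by x = b, y = c
```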
Uncertain and probabilistic instances
Set of uncertain events:
e_flight: CDG → VIE flight AF1756 takes place
e_bus: Vienna → Bratislava buses are running
Annotate instance facts with formulae on the events:
IsIn(AA, Paris): ¬e_flight
IsIn(AA, Vienna): e_flight ∧ ¬e_bus
IsIn(AA, Bratislava): e_flight ∧ e_bus
→ Semantics: a set of instances (possible worlds).
Add a probability distribution on each event:
each event has probability 0 < p < 1 of being true
all events are assumed to be independent
→ Semantics: a probability distribution on instances.
→ Query evaluation: determine the probability of q on Î.
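The possible-worlds semantics can be sketched by enumerating every valuation of the events. The probabilities of 1/2 below are an assumption (the slide gives no values), and the names `prob`, `annotated`, `worlds`, and `probability` are hypothetical. The enumeration is exponential in the number of events, so this only illustrates the definition and is not a tractable method.

```python
from fractions import Fraction
from itertools import product

# Assumed probabilities (not given on the slide): each event is true with probability 1/2.
prob = {"e_flight": Fraction(1, 2), "e_bus": Fraction(1, 2)}

# Each fact is annotated with a formula over the events, written as a Python predicate.
annotated = {
    ("IsIn", ("AA", "Paris")): lambda v: not v["e_flight"],
    ("IsIn", ("AA", "Vienna")): lambda v: v["e_flight"] and not v["e_bus"],
    ("IsIn", ("AA", "Bratislava")): lambda v: v["e_flight"] and v["e_bus"],
}

def worlds():
    """Yield (instance, probability) for every valuation of the independent events."""
    events = sorted(prob)
    for bits in product([False, True], repeat=len(events)):
        v = dict(zip(events, bits))
        weight = Fraction(1)
        for e in events:
            weight *= prob[e] if v[e] else 1 - prob[e]
        yield frozenset(f for f, phi in annotated.items() if phi(v)), weight

def probability(query_holds):
    """Total weight of the possible worlds in which the query holds."""
    return sum((w for inst, w in worlds() if query_holds(inst)), Fraction(0))

in_vienna = lambda inst: ("IsIn", ("AA", "Vienna")) in inst
print(probability(in_vienna))  # 1/4
```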
Hardness and tractability
With arbitrary annotations:
→ Query evaluation is #P-hard even with a single fact (immediate reduction from #SAT)
With simple annotations (one unique event per tuple):
→ Query evaluation is #P-hard on arbitrary instances (use the instance to do the reduction)
Existing work:
→ Fix a simple annotation scheme
→ Show a dichotomy between #P-hard and PTIME queries
Our approach:
→ Find a restriction on the instance and annotations
→ Show that many queries are tractable in this case
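To see why a single annotated fact is already enough for the #SAT reduction, here is a small sketch. If every event is independently true with probability 1/2, the probability that the fact is present equals the number of models of its annotation divided by 2^n, so computing the probability recovers the model count. The formula `phi` and the function names are hypothetical, and the brute-force counter is only there to make the identity checkable.

```python
from fractions import Fraction
from itertools import product

def count_models(formula, variables):
    """Brute-force #SAT: number of valuations satisfying the formula."""
    return sum(1 for bits in product([False, True], repeat=len(variables))
               if formula(dict(zip(variables, bits))))

def fact_probability(formula, variables):
    """Probability that a fact annotated by `formula` is present,
    when every event is independently true with probability 1/2."""
    return Fraction(count_models(formula, variables), 2 ** len(variables))

# Hypothetical annotation: (e1 or e2) and not e3.
variables = ["e1", "e2", "e3"]
phi = lambda v: (v["e1"] or v["e2"]) and not v["e3"]

p = fact_probability(phi, variables)
print(p)                        # 3/8
print(p * 2 ** len(variables))  # 3, the number of models of phi
```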
Bounded treewidth
An idea from instances without probabilities...
If an instance has low treewidth, then it is almost a tree.
Assume that the instance treewidth is constant...
[Diagram]
instance I: R(a, b), R(b, c), S(c)
→ tree decomposition, O(|I|) for fixed width → tree encoding T_I
query q: ∃x y R(x, y) ∧ S(y)
→ rewriting, O(1) data complexity → deterministic tree automaton A_q
T_I and A_q → evaluation, linear time → query answer
→ Linear time data complexity
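The evaluation step of the pipeline can be sketched as a bottom-up run of a deterministic tree automaton. The tree encoding below is a toy assumption, much simpler than a real encoding of a tree decomposition: one node per element, labelled with the unary facts on it plus a hypothetical marker "R_in" when an R-fact links its parent to it. The automaton state only records whether a match of the query has been seen.

```python
# Toy tree encoding (an assumption for illustration): one node per element,
# labelled with the unary facts on it plus "R_in" if there is an R-fact
# from its parent's element to it.
# I = {R(a, b), R(b, c), S(c)} becomes the path a -> b -> c.
tree = (frozenset(), [                      # a
    (frozenset({"R_in"}), [                 # b, reached by R(a, b)
        (frozenset({"R_in", "S"}), []),     # c, reached by R(b, c), with S(c)
    ]),
])

def delta(label, child_states):
    """Deterministic transition: the state records whether a match of
    R(x, y) and S(y) has been seen in the subtree."""
    if {"R_in", "S"} <= label or "found" in child_states:
        return "found"
    return "none"

def run(node):
    """Bottom-up run; each node is visited exactly once."""
    label, children = node
    return delta(label, [run(child) for child in children])

print(run(tree) == "found")  # True
```

Since each node is visited once and the transition takes constant time for a fixed automaton and bounded branching, the run is linear in the size of the tree.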
Tractable inference
An idea from probabilities without instances...
Represent a propositional formula F as a Boolean circuit.
Assume the circuit has constant treewidth.
→ Probability of F can be computed in linear time (using the junction tree algorithm for Bayesian networks, and assuming constant-time arithmetic operations)
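As a hedged illustration of the easiest case, the sketch below computes the probability of a circuit that is a tree in which no input is shared between the children of a gate. This is not the junction tree algorithm; it only covers the special case where independence lets each gate be handled locally in one pass. The gate encoding as nested tuples and the name `probability` are assumptions.

```python
from fractions import Fraction
from math import prod

def probability(gate, p):
    """Probability that `gate` is true, assuming independent inputs and
    that no input is shared between the children of a gate."""
    kind, *args = gate
    if kind == "var":
        return p[args[0]]
    if kind == "not":
        return 1 - probability(args[0], p)
    child_probs = [probability(g, p) for g in args]
    if kind == "and":
        return prod(child_probs)
    if kind == "or":
        return 1 - prod(1 - c for c in child_probs)
    raise ValueError(f"unknown gate kind: {kind}")

# Hypothetical formula F = (e1 and e2) or (not e3), all inputs at probability 1/2.
F = ("or", ("and", ("var", "e1"), ("var", "e2")), ("not", ("var", "e3")))
p = {e: Fraction(1, 2) for e in ("e1", "e2", "e3")}
print(probability(F, p))  # 5/8
```

When gates share inputs, these local product rules are no longer valid, which is where a decomposition-based inference method is needed.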
cc-tables
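Boolean circuit for the annotations
[Figure: a Boolean circuit with three input events, each of probability 1/2, and two ∧ gates, annotating the facts R(a, b), R(b, c), R(c, d)]
The circuit must have low treewidth.
The instance must have low treewidth.
→ Need a simultaneous decomposition
[Figure: the input events (1/2), the ∧ gates, and the facts R(a, b), R(b, c), R(c, d) arranged together in a single decomposition]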
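A cc-table can be sketched as a set of facts, each pointing to a gate of a shared Boolean circuit. The wiring below is an assumption: the slide only shows three inputs of probability 1/2 and two ∧ gates, not which gate annotates which fact. The names `circuit`, `cc_table`, `evaluate`, and `world` are hypothetical.

```python
# Circuit gates: inputs carry a probability, internal gates list their inputs.
# The wiring below is an assumption; the slide only shows three inputs of
# probability 1/2 and two AND gates.
circuit = {
    "e1": ("input", 0.5),
    "e2": ("input", 0.5),
    "e3": ("input", 0.5),
    "g1": ("and", ["e1", "e2"]),
    "g2": ("and", ["g1", "e3"]),
}

# cc-table: each fact points to the gate that decides its presence.
cc_table = {
    ("R", ("a", "b")): "e1",
    ("R", ("b", "c")): "g1",
    ("R", ("c", "d")): "g2",
}

def evaluate(gate, valuation, cache):
    """Evaluate a gate under a valuation of the inputs, memoising shared gates."""
    if gate not in cache:
        kind, arg = circuit[gate]
        if kind == "input":
            cache[gate] = valuation[gate]
        else:  # "and"
            cache[gate] = all(evaluate(g, valuation, cache) for g in arg)
    return cache[gate]

def world(valuation):
    """The possible world (set of facts) selected by a valuation of the inputs."""
    cache = {}
    return {fact for fact, gate in cc_table.items() if evaluate(gate, valuation, cache)}

print(sorted(world({"e1": True, "e2": True, "e3": False})))
# [('R', ('a', 'b')), ('R', ('b', 'c'))]
```

Sharing gates such as g1 between annotations is what correlates the facts, and it is why the circuit and the instance have to be decomposed together.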
Main result
[Pipeline diagram]
instance I: facts R(a, b), R(b, c), R(c, d), annotated by a circuit with three input events of probability 1/2 and two ∧ gates
→ tree decomposition, O(|I|) for fixed width → tree encoding T_I
query q: ∃x y R(x, y) ∧ S(y)
→ rewriting, O(1) data complexity → deterministic tree automaton A_q
T_I and A_q → instrumentation, linear time → bounded-treewidth circuit C
C → probabilistic inference, O(|C|) for fixed width → probability p (0.42 in the diagram)
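The following sketch illustrates the flavour of the pipeline on a simplified special case. It is not the circuit instrumentation of the talk. It assumes a toy uncertain tree in which every fact at a node is kept independently with its own probability, so the distribution over automaton states can be propagated bottom-up directly. The tree encoding, the "R_in" marker, and all function names are assumptions.

```python
from fractions import Fraction
from itertools import product

HALF = Fraction(1, 2)

# Toy uncertain tree (an assumption for illustration): each node lists its
# possible label facts with independent presence probabilities.
# Path a -> b -> c for R(a, b), R(b, c), S(c), each fact kept with probability 1/2.
tree = ({}, [
    ({"R_in": HALF}, [
        ({"R_in": HALF, "S": HALF}, []),
    ]),
])

def delta(label, child_states):
    """Deterministic automaton for q: exists x, y. R(x, y) and S(y)."""
    if {"R_in", "S"} <= label or "found" in child_states:
        return "found"
    return "none"

def state_distribution(node):
    """Map each automaton state to the probability of reaching it at `node`."""
    facts, children = node
    child_dists = [state_distribution(c) for c in children]
    names = sorted(facts)
    dist = {}
    # Enumerate which local facts are present and which state each child reaches.
    for present in product([False, True], repeat=len(names)):
        label = {n for n, keep in zip(names, present) if keep}
        w_label = Fraction(1)
        for n, keep in zip(names, present):
            w_label *= facts[n] if keep else 1 - facts[n]
        for combo in product(*(d.items() for d in child_dists)):
            weight = w_label
            for _, w in combo:
                weight *= w
            state = delta(label, [s for s, _ in combo])
            dist[state] = dist.get(state, Fraction(0)) + weight
    return dist

print(state_distribution(tree).get("found", Fraction(0)))  # 1/4
```

The work per node depends only on the number of local facts, the number of children, and the number of states, so for a fixed automaton and bounded node size the whole computation is linear in the tree. With correlated annotations this direct propagation is no longer sound, which is the case the circuit-based approach addresses.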
Consequences
For queries representable as deterministic automata...
→ CQs
→ Monadic second-order
→ Guarded second-order
... on various probabilistic models...
→ Tuple-independent tables
→ Block-independent disjoint tables
→ pc-tables (presented before)
→ Probabilistic XML
... assuming bounded treewidth (for reasonable definitions)...
→ ... the probability of fixed q can be computed in O(|Î|)!
Conclusion
We can combine the following techniques:
Computing tree decompositions
Encoding problems to automata on tree encodings of instances
Evaluating probabilities on bounded-treewidth circuits
Applications:
Tractable probabilistic query evaluation in practice?
Reasoning under uncertain rules (hence the bait-and-switch on the title...)
Questions:
Other semirings than Boolean AND/OR?
Other tasks than probabilistic inference?
What are bounded-treewidth circuits good for? http://cstheory.stackexchange.com/q/25624
Thanks for your attention!