Leveraging the structure of uncertain data
Antoine Amarilli May 26, 2017
Example application: Subway routing
Example application: Subway routing
Example application: Subway routing
Example application: Subway routing
Example application: Subway routing
Database theory and query evaluation
13 6
23 14
Database
• (Hyper)graph
• Collection of ground facts
G(aa1, ab2), G(ab2, ac3), S(aa1, m4), S(ab2, rB), ...
Database theory and query evaluation
13 6
23 14
Query
?
+
Database
• (Hyper)graph
• Collection of ground facts
G(aa ab G(ab ac
• Regular path
∀X(rm ∈ X ∧ ∀xy (Metro|RER)*
• Logic formula
|(Bus|Tram)*
Database theory and query evaluation
13 6
23 14
Query
?
+
Database
• (Hyper)graph
• Collection of ground facts
G(aa1, ab2), G(ab2, ac3), S(aa1, m4), S(ab2, rB), ...
• Regular path
∀X(rm ∈ X ∧ ∀xy
i Result
• TRUE/FALSE
↔ Model checking (Metro|RER)*
• Logic formula
(x ∈ X ∧ G(x, y) → y ∈ X)) → gn ∈ X
|(Bus|Tram)*
Probabilistic query evaluation
13 6
23 14
Probabilistic query evaluation
Probabilistic query evaluation
Probabilistic query evaluation
Probabilistic query evaluation
Probabilistic query evaluation
Probabilistic query evaluation
Probabilistic query evaluation
13 6
23 14
Probabilistic query evaluation
13 6
23 95% 14
5% 60
98%
2% 60
Probabilistic query evaluation
• (Hyper)graph
• Collection of ground facts
Probabilistic database
+ independent probabilities
13 6
23 95%14
5% 60
98%
2% 60
Probabilistic query evaluation
Query
?
+
• (Hyper)graph
• Collection of ground facts
• Regular path
∀X(rm ∈ X ∧ ∀xy (Metro|RER)*
• Logic formula
|(Bus|Tram)*
Probabilistic database
+ independent
13 6
23 95%14
5% 60
98%
2% 60
Probabilistic query evaluation
Query
?
+
• (Hyper)graph
• Collection of ground facts
• Regular path
∀X(rm ∈ X ∧ ∀xy (Metro|RER)*
• Logic formula
(x ∈ X ∧ G(x, y) → y ∈ X)) → gn ∈ X
|(Bus|Tram)*
Probabilistic database
+ independent probabilities
Probabilistic i
Result
• Probability according to the input distribution
13 6
23 95%14
5% 60
98%
2% 60
proba to be on time: 98%
Computational complexity
•
Computing paths on a large graph:→ #P-hardcomputational complexity in thedatabase
Computational complexity
50%
90%
42%
37%
90%
83%
78%
72%
•
Computing paths on a largeprobabilisticgraph:→ ???
→ #P-hardcomputational complexity in thedatabase
Computational complexity
50%
90%
42%
37%
90%
83%
78%
72%
•
Computing paths on a largeprobabilisticgraph:→ #P-hardcomputational complexity in thedatabase
Computational complexity
50%
90%
42%
37%
90%
83%
78%
72%
•
Computing paths on a largeprobabilisticgraph:→ Exponentialnumber of possibilities
Idea: use the structure of data
→ Shortest path:very easyon alarge tree
Idea: use the structure of data
→ Shortest path:very easyon alarge tree
Idea: use the structure of data
→ Shortest path:very easyon alarge tree
Idea: use the structure of data
→ Shortest path:very easyon alarge tree
Idea: use the structure of data
→ Shortest path:very easyon alarge tree
Idea: use the structure of data
→ Shortest path:very easyon alarge tree
Idea: use the structure of data
Leveraging the structure of uncertain data
Does query evaluation on probabilistic data havelower complexity when thestructureof the data isclose to a tree?
In this talk:
•
Existing results onnon-probabilistic data:• Tree automata, to evaluate queries on trees
• Treewidth, formalizes the notion of being “close to a tree”
• Courcelle’s theorem
•
Introducenew toolsandresults:• Provenance circuitsof tree automata on uncertain trees
• Applications toprobabilistic query evaluation
•
Other applications:Counting, enumeration, provenance...Leveraging the structure of uncertain data
Does query evaluation on probabilistic data havelower complexity when thestructureof the data isclose to a tree?
In this talk:
•
Existing results onnon-probabilistic data:• Tree automata, to evaluate queries on trees
• Treewidth, formalizes the notion of being “close to a tree”
• Courcelle’s theorem
•
Introducenew toolsandresults:• Provenance circuitsof tree automata on uncertain trees
• Applications toprobabilistic query evaluation
•
Other applications:Counting, enumeration, provenance...Leveraging the structure of uncertain data
Does query evaluation on probabilistic data havelower complexity when thestructureof the data isclose to a tree?
In this talk:
•
Existing results onnon-probabilistic data:• Tree automata, to evaluate queries on trees
• Treewidth, formalizes the notion of being “close to a tree”
• Courcelle’s theorem
•
Introducenew toolsandresults:• Provenance circuitsof tree automata on uncertain trees
• Applications toprobabilistic query evaluation
•
Other applications:Counting, enumeration, provenance...Leveraging the structure of uncertain data
Does query evaluation on probabilistic data havelower complexity when thestructureof the data isclose to a tree?
In this talk:
•
Existing results onnon-probabilistic data:• Tree automata, to evaluate queries on trees
• Treewidth, formalizes the notion of being “close to a tree”
• Courcelle’s theorem
•
Introducenew toolsandresults:• Provenance circuitsof tree automata on uncertain trees
• Applications toprobabilistic query evaluation
•
Other applications:Counting, enumeration, provenance...Leveraging the structure of uncertain data
Does query evaluation on probabilistic data havelower complexity when thestructureof the data isclose to a tree?
In this talk:
•
Existing results onnon-probabilistic data:• Tree automata, to evaluate queries on trees
• Treewidth, formalizes the notion of being “close to a tree”
• Courcelle’s theorem
•
Introducenew toolsandresults:• Provenance circuitsof tree automata on uncertain trees
• Applications toprobabilistic query evaluation
•
Other applications:Counting, enumeration, provenance...Leveraging the structure of uncertain data
Does query evaluation on probabilistic data havelower complexity when thestructureof the data isclose to a tree?
In this talk:
•
Existing results onnon-probabilistic data:• Tree automata, to evaluate queries on trees
• Treewidth, formalizes the notion of being “close to a tree”
• Courcelle’s theorem
•
Introducenew toolsandresults:• Provenance circuitsof tree automata on uncertain trees
• Applications toprobabilistic query evaluation
•
Other applications:Counting, enumeration, provenance...Leveraging the structure of uncertain data
Does query evaluation on probabilistic data havelower complexity when thestructureof the data isclose to a tree?
In this talk:
•
Existing results onnon-probabilistic data:• Tree automata, to evaluate queries on trees
• Treewidth, formalizes the notion of being “close to a tree”
• Courcelle’s theorem
•
Introducenew toolsandresults:• Provenance circuitsof tree automata on uncertain trees
• Applications toprobabilistic query evaluation
•
Other applications:Counting, enumeration, provenance...Leveraging the structure of uncertain data
Does query evaluation on probabilistic data havelower complexity when thestructureof the data isclose to a tree?
In this talk:
•
Existing results onnon-probabilistic data:• Tree automata, to evaluate queries on trees
• Treewidth, formalizes the notion of being “close to a tree”
• Courcelle’s theorem
•
Introducenew toolsandresults:• Provenance circuitsof tree automata on uncertain trees
• Applications toprobabilistic query evaluation
•
Other applications:Counting, enumeration, provenance...Leveraging the structure of uncertain data
Does query evaluation on probabilistic data havelower complexity when thestructureof the data isclose to a tree?
In this talk:
•
Existing results onnon-probabilistic data:• Tree automata, to evaluate queries on trees
• Treewidth, formalizes the notion of being “close to a tree”
• Courcelle’s theorem
•
Introducenew toolsandresults:• Provenance circuitsof tree automata on uncertain trees
• Applications toprobabilistic query evaluation
•
Other applications:Counting, enumeration, provenance...Table of contents
Introduction Existing tools
Provenance circuits and probabilistic query evaluation Other applications
Non-probabilistic query evaluation on trees
Database: atreeT where nodes have a color from an alphabet
?
QueryQ: asentencein monadic second-order logic (MSO)•P (x)means “xis blue”
•x→ymeans “xis the parent ofy”
“Is there both a pink and a blue node?”
∃x y P (x)∧P (y)
i
Result: TRUE/FALSE indicating if the treeT satisfies the queryQComputational complexityas a function ofT (the queryQisfixed)
Non-probabilistic query evaluation on trees
Database: atreeT where nodes have a color from an alphabet
?
QueryQ: asentencein monadicsecond-order logic (MSO)
•P (x)means “xis blue”
•x→ymeans “xis the parent ofy”
“Is there both a pink and a blue node?”
∃x y P (x)∧P (y)
i
Result: TRUE/FALSE indicating if the treeT satisfies the queryQComputational complexityas a function ofT (the queryQisfixed)
Non-probabilistic query evaluation on trees
Database: atreeT where nodes have a color from an alphabet
?
QueryQ: asentencein monadicsecond-order logic (MSO)
•P (x)means “xis blue”
•x→ymeans “xis the parent ofy”
“Is there both a pink and a blue node?”
∃x y P (x)∧P (y)
i
Result: TRUE/FALSE indicating if the treeT satisfies the queryQComputational complexityas a function ofT (the queryQisfixed)
Non-probabilistic query evaluation on trees
Database: atreeT where nodes have a color from an alphabet
?
QueryQ: asentencein monadicsecond-order logic (MSO)
•P (x)means “xis blue”
•x→ymeans “xis the parent ofy”
“Is there both a pink and a blue node?”
∃x y P (x)∧P (y)
i
Result: TRUE/FALSE indicating if the treeT satisfies the queryQComputational complexityas a function ofT (the queryQisfixed)
Tree automata Tree alphabet:
•
Bottom-up deterministictree automaton•
“Is there both a pink and a blue node?”•
States:{⊥,B,P,>}•
Final states:{>}•
Initial function: ⊥ P B•
Transitions(examples): PP ⊥
> B P
⊥
⊥
⊥ Theorem [Thatcher and Wright, 1968]
MSOandtree automatahave the sameexpressive poweron trees Corollary
Non-probabilistic query evaluation of MSO on trees is inlinear time.
Tree automata
Tree alphabet:
•
Bottom-up deterministictree automaton•
“Is there both a pink and a blue node?”•
States:{⊥,B,P,>}•
Final states:{>}•
Initial function: ⊥ P B•
Transitions(examples): PP ⊥
> B P
⊥
⊥
⊥ Theorem [Thatcher and Wright, 1968]
MSOandtree automatahave the sameexpressive poweron trees Corollary
Non-probabilistic query evaluation of MSO on trees is inlinear time.
Tree automata
Tree alphabet:
•
Bottom-up deterministictree automaton•
“Is there both a pink and a blue node?”•
States:{⊥,B,P,>}•
Final states:{>}•
Initial function: ⊥ P B•
Transitions(examples): PP ⊥
> B P
⊥
⊥
⊥ Theorem [Thatcher and Wright, 1968]
MSOandtree automatahave the sameexpressive poweron trees Corollary
Non-probabilistic query evaluation of MSO on trees is inlinear time.
Tree automata
Tree alphabet:
•
Bottom-up deterministictree automaton•
“Is there both a pink and a blue node?”•
States:{⊥,B,P,>}•
Final states:{>}•
Initial function: ⊥ P B•
Transitions(examples): PP ⊥
> B P
⊥
⊥
⊥ Theorem [Thatcher and Wright, 1968]
MSOandtree automatahave the sameexpressive poweron trees Corollary
Non-probabilistic query evaluation of MSO on trees is inlinear time.
Tree automata
Tree alphabet:
•
Bottom-up deterministictree automaton•
“Is there both a pink and a blue node?”•
States:{⊥,B,P,>}•
Final states:{>}•
Initial function: ⊥ P B•
Transitions(examples): PP ⊥
> B P
⊥
⊥
⊥ Theorem [Thatcher and Wright, 1968]
MSOandtree automatahave the sameexpressive poweron trees Corollary
Non-probabilistic query evaluation of MSO on trees is inlinear time.
Tree automata Tree alphabet:
B
⊥ P ⊥
•
Bottom-up deterministictree automaton•
“Is there both a pink and a blue node?”•
States:{⊥,B,P,>}•
Final states:{>}•
Initial function: ⊥ P B•
Transitions(examples): PP ⊥
> B P
⊥
⊥
⊥ Theorem [Thatcher and Wright, 1968]
MSOandtree automatahave the sameexpressive poweron trees Corollary
Non-probabilistic query evaluation of MSO on trees is inlinear time.
Tree automata Tree alphabet:
B
⊥ P ⊥
•
Bottom-up deterministictree automaton•
“Is there both a pink and a blue node?”•
States:{⊥,B,P,>}•
Final states:{>}•
Initial function: ⊥ P B•
Transitions(examples):P P ⊥
>
B P
⊥
⊥
⊥
Theorem [Thatcher and Wright, 1968]
MSOandtree automatahave the sameexpressive poweron trees Corollary
Non-probabilistic query evaluation of MSO on trees is inlinear time.
Tree automata Tree alphabet:
B
B
⊥ P
P ⊥
•
Bottom-up deterministictree automaton•
“Is there both a pink and a blue node?”•
States:{⊥,B,P,>}•
Final states:{>}•
Initial function: ⊥ P B•
Transitions(examples):P P ⊥
>
B P
⊥
⊥
⊥
Theorem [Thatcher and Wright, 1968]
MSOandtree automatahave the sameexpressive poweron trees Corollary
Non-probabilistic query evaluation of MSO on trees is inlinear time.
Tree automata Tree alphabet:
>
B
B
⊥ P
P ⊥
•
Bottom-up deterministictree automaton•
“Is there both a pink and a blue node?”•
States:{⊥,B,P,>}•
Final states:{>}•
Initial function: ⊥ P B•
Transitions(examples):P P ⊥
>
B P
⊥
⊥
⊥
Theorem [Thatcher and Wright, 1968]
MSOandtree automatahave the sameexpressive poweron trees Corollary
Non-probabilistic query evaluation of MSO on trees is inlinear time.
Tree automata Tree alphabet:
>
B
B
⊥ P
P ⊥
•
Bottom-up deterministictree automaton•
“Is there both a pink and a blue node?”•
States:{⊥,B,P,>}•
Final states:{>}•
Initial function: ⊥ P B•
Transitions(examples):P P ⊥
>
B P
⊥
⊥
⊥ Theorem [Thatcher and Wright, 1968]
MSOandtree automatahave the sameexpressive poweron trees
Corollary
Non-probabilistic query evaluation of MSO on trees is inlinear time.
Tree automata Tree alphabet:
>
B
B
⊥ P
P ⊥
•
Bottom-up deterministictree automaton•
“Is there both a pink and a blue node?”•
States:{⊥,B,P,>}•
Final states:{>}•
Initial function: ⊥ P B•
Transitions(examples):P P ⊥
>
B P
⊥
⊥
⊥ Theorem [Thatcher and Wright, 1968]
MSOandtree automatahave the sameexpressive poweron trees Corollary
Non-probabilistic query evaluation on trees
Database: atreeT where nodes have a color from an alphabet
?
QueryQ: asentencein monadicsecond-order logic (MSO)
•P (x)means “xis blue”
•x→ymeans “xis the parent ofy”
“Is there both a pink and a blue node?”
∃x y P (x)∧P (y)
i
Result: TRUE/FALSE indicating ifTsatisfies the queryQComputational complexityas a function of thetreeT (the queryQisfixed)
Non-probabilistic query evaluation on treelike data
Database: atreelike databaseT
???
?
QueryQ: asentencein monadicsecond-order logic (MSO)
•P (x)means “xis blue”
•x→ymeans “xis the parent ofy”
(Metro|RER)∗
|(Bus|Tram)∗
i
Result: TRUE/FALSE indicating ifTsatisfies the queryQComputational complexityas a function of thetreeT (the queryQisfixed)
Treewidth
Treewidth by example:
•
Treeshave treewidth1•
Cycleshave treewidth2•
k-cliquesand(k−1)-gridshave treewidthk−1→ Treelike: thetreewidthis bounded by aconstant
Treewidth
Treewidth by example:
•
Treeshave treewidth1•
Cycleshave treewidth2•
k-cliquesand(k−1)-gridshave treewidthk−1→ Treelike: thetreewidthis bounded by aconstant
Treewidth
Treewidth by example:
•
Treeshave treewidth1•
Cycleshave treewidth2•
k-cliquesand(k−1)-gridshave treewidthk−1→ Treelike: thetreewidthis bounded by aconstant
Treewidth
Treewidth by example:
•
Treeshave treewidth1•
Cycleshave treewidth2•
k-cliquesand(k−1)-gridshave treewidthk−1→ Treelike: thetreewidthis bounded by aconstant
Treewidth
Treewidth by example:
•
Treeshave treewidth1•
Cycleshave treewidth2•
k-cliquesand(k−1)-gridshave treewidthk−1→ Treelike: thetreewidthis bounded by aconstant
Treewidth
Treewidth by example:
•
Treeshave treewidth1•
Cycleshave treewidth2•
k-cliquesand(k−1)-gridshave treewidthk−1→ Treelike: thetreewidthis bounded by aconstant
Treewidth
Treewidth by example:
•
Treeshave treewidth1•
Cycleshave treewidth2•
k-cliquesand(k−1)-gridshave treewidthk−1→ Treelike: thetreewidthis bounded by aconstant
Treewidth
Treewidth by example:
•
Treeshave treewidth1•
Cycleshave treewidth2•
k-cliquesand(k−1)-gridshave treewidthk−1→ Treelike: thetreewidthis bounded by aconstant
Treewidth
Treewidth by example:
•
Treeshave treewidth1•
Cycleshave treewidth2•
k-cliquesand(k−1)-gridshave treewidthk−1→ Treelike: thetreewidthis bounded by aconstant
Treewidth
Treewidth by example:
•
Treeshave treewidth1•
Cycleshave treewidth2•
k-cliquesand(k−1)-gridshave treewidthk−1→ Treelike: thetreewidthis bounded by aconstant
Treewidth
Treewidth by example:
•
Treeshave treewidth1•
Cycleshave treewidth2•
k-cliquesand(k−1)-gridshave treewidthk−1→ Treelike: thetreewidthis bounded by aconstant
Treewidth
Treewidth by example:
•
Treeshave treewidth1•
Cycleshave treewidth2•
k-cliquesand(k−1)-gridshave treewidthk−1→ Treelike: thetreewidthis bounded by aconstant
Treewidth
Treewidth by example:
•
Treeshave treewidth1•
Cycleshave treewidth2•
k-cliquesand(k−1)-gridshave treewidthk−1→ Treelike: thetreewidthis bounded by aconstant
Treewidth
Treewidth by example:
•
Treeshave treewidth1•
Cycleshave treewidth2•
k-cliquesand(k−1)-gridshave treewidthk−1→ Treelike: thetreewidthis bounded by aconstant
Treewidth
Treewidth by example:
•
Treeshave treewidth1•
Cycleshave treewidth2•
k-cliques and(k−1)-gridshave treewidthk−1→ Treelike: thetreewidthis bounded by aconstant
Treewidth
Treewidth by example:
•
Treeshave treewidth1•
Cycleshave treewidth2•
k-cliques and(k−1)-gridshave treewidthk−1→ Treelike: thetreewidthis bounded by aconstant
Courcelle’s theorem
(RER|metro)*
|(bus|tram)*
MSO query Treelike data
Theorem [Courcelle, 1990]
For any fixed Boolean MSO queryQandk∈N, given a databaseDof treewidth≤k,
we can compute inlinear timeinDwhetherDsatisfiesQ
Courcelle’s theorem
Tree automaton (RER|metro)*
|(bus|tram)*
MSO query Treelike data
Theorem [Courcelle, 1990]
For any fixed Boolean MSO queryQandk∈N, given a databaseDof treewidth≤k,
we can compute inlinear timeinDwhetherDsatisfiesQ
Courcelle’s theorem
Tree automaton Tree encoding
(RER|metro)*
|(bus|tram)*
MSO query Treelike data
linear
Theorem [Courcelle, 1990]
For any fixed Boolean MSO queryQandk∈N, given a databaseDof treewidth≤k,
we can compute inlinear timeinDwhetherDsatisfiesQ
Courcelle’s theorem
Tree automaton Tree encoding
(RER|metro)*
|(bus|tram)*
MSO query Treelike data
Query answer
TRUE
linear linear
Theorem [Courcelle, 1990]
For any fixed Boolean MSO queryQandk∈N, given a databaseDof treewidth≤k,
we can compute inlinear timeinDwhetherDsatisfiesQ
Courcelle’s theorem
Tree automaton Tree encoding
(RER|metro)*
|(bus|tram)*
MSO query Treelike data
Query answer
TRUE
linear linear
Theorem [Courcelle, 1990]
For any fixed Boolean MSO queryQandk∈N, given a databaseDof treewidth≤k,
Table of contents
Introduction Existing tools
Provenance circuits and probabilistic query evaluation Other applications
Probabilistic query evaluation on treelike data
•
DatabaseDwithtreewidth≤kfor some constantk
•
Probabilityof each fact ofD to be actually present in the data (independently from other facts)?
QueryQ: asentencein monadicsecond-order logic (MSO)
(Metro|RER)∗
|(Bus|Tram)∗
i
Result: Probabilitythat the databaseDsatisfies queryQComputational complexityas a function of thedatabaseD (the queryQisfixed)
Probabilistic query evaluation on treelike data
•
DatabaseDwithtreewidth≤kfor some constantk
•
Probabilityof each fact ofD to be actually present in the data (independently from other facts)?
QueryQ: asentencein monadicsecond-order logic (MSO)
(Metro|RER)∗
|(Bus|Tram)∗
i
Result: Probabilitythat the databaseDsatisfies queryQComputational complexityas a function of thedatabaseD (the queryQisfixed)
Probabilistic query evaluation on treelike data
•
DatabaseDwithtreewidth≤kfor some constantk
•
Probabilityof each fact ofD to be actually present in the data (independently from other facts)?
QueryQ: asentencein monadicsecond-order logic (MSO)
(Metro|RER)∗
|(Bus|Tram)∗
i
Result: Probabilitythat the databaseDsatisfies queryQComputational complexityas a function of thedatabaseD (the queryQisfixed)
Probabilistic query evaluation on treelike data
•
DatabaseDwithtreewidth≤kfor some constantk
•
Probabilityof each fact ofD to be actually present in the data (independently from other facts)?
QueryQ: asentencein monadicsecond-order logic (MSO)
(Metro|RER)∗
|(Bus|Tram)∗
i
Result: Probabilitythat the databaseDsatisfies queryQComputational complexityas a function of thedatabaseD (the queryQisfixed)
Roadmap
Tree automaton Tree encoding
(RER|metro)*
|(bus|tram)*
MSO query Treelike data
linear
Roadmap
Tree automaton Tree encoding
(RER|metro)*
|(bus|tram)*
MSO query Treelike data
linear linear
Provenance circuit
Roadmap
Tree automaton Uncertain tree encoding
(RER|metro)*
|(bus|tram)*
MSO query Probabilistic treelike data
linear linear
Provenance circuit
Probability
95%
linear
+probabilities
Uncertain trees
1
5
6 7 2
3 4
Avaluationof a tree decides whether to keep(1) ordiscard(0) node labels Valuation:{2,3,77→1, ∗ 7→0}
A:“Is there both a pink and a blue node?”
Uncertain trees
1
5
6 7 2
3 4
Avaluationof a tree decides whether to keep(1) ordiscard(0) node labels
Valuation:{2,3,77→1, ∗ 7→0}
A:“Is there both a pink and a blue node?”
Uncertain trees
1
5
6 7 2
3 4
Avaluationof a tree decides whether to keep(1) ordiscard(0) node labels Valuation:{2,3,77→1, ∗ 7→0}
A:“Is there both a pink and a blue node?”
Uncertain trees
1
5
6 7 2
3 4
Avaluationof a tree decides whether to keep(1) ordiscard(0) node labels Valuation:{27→1, ∗ 7→0}
A:“Is there both a pink and a blue node?”
Uncertain trees
1
5
6 7 2
3 4
Avaluationof a tree decides whether to keep(1) ordiscard(0) node labels Valuation:{2,77→1, ∗ 7→0}
A:“Is there both a pink and a blue node?”
Uncertain trees
1
5
6 7 2
3 4
Avaluationof a tree decides whether to keep(1) ordiscard(0) node labels Valuation:{2,77→1, ∗ 7→0}
A:“Is there both a pink and a blue node?”
Uncertain trees
1
5
6 7 2
3 4
Avaluationof a tree decides whether to keep(1) ordiscard(0) node labels Valuation:{2,3,77→1, ∗ 7→0}
A:“Is there both a pink and a blue node?”
The tree automatonAaccepts
Uncertain trees
1
5
6 7 2
3 4
Avaluationof a tree decides whether to keep(1) ordiscard(0) node labels Valuation:{27→1, ∗ 7→0}
A:“Is there both a pink and a blue node?”
The tree automatonArejects
Uncertain trees
1
5
6 7 2
3 4
Avaluationof a tree decides whether to keep(1) ordiscard(0) node labels Valuation:{2,77→1, ∗ 7→0}
A:“Is there both a pink and a blue node?”
The tree automatonAaccepts
Boolean circuit
∨
¬
x
∧
y
•
Directed acyclic graph ofgates•
Outputgate:•
Variablegates: x•
Internalgates: ∨ ∧ ¬•
Valuation: function from variables to{0,1} Example:ν ={x7→0, y7→1}... mapped to1Boolean circuit
∨
¬
x
∧
y
•
Directed acyclic graph ofgates•
Outputgate:•
Variablegates: x•
Internalgates: ∨ ∧ ¬•
Valuation: function from variables to{0,1} Example:ν ={x7→0, y7→1}... mapped to1Boolean circuit
∨
¬
x
∧
y
•
Directed acyclic graph ofgates•
Outputgate:•
Variablegates: x•
Internalgates: ∨ ∧ ¬•
Valuation: function from variables to{0,1} Example:ν ={x7→0, y7→1}... mapped to1Boolean circuit
∨
¬
x
∧
y
•
Directed acyclic graph ofgates•
Outputgate:•
Variablegates: x•
Internalgates: ∨ ∧ ¬•
Valuation: function from variables to{0,1} Example:ν ={x7→0, y7→1}... mapped to1Boolean circuit
∨
¬
x
∧
y
•
Directed acyclic graph ofgates•
Outputgate:•
Variablegates: x•
Internalgates: ∨ ∧ ¬•
Valuation: function from variables to{0,1}Example:ν ={x7→0, y7→1}...
mapped to1
Boolean circuit
∨
¬
x
∧
y
•
Directed acyclic graph ofgates•
Outputgate:•
Variablegates: x•
Internalgates: ∨ ∧ ¬•
Valuation: function from variables to{0,1}Example:ν ={x7→0, y7→1}...
mapped to1
Boolean circuit
∨
¬
x
∧
y
•
Directed acyclic graph ofgates•
Outputgate:•
Variablegates: x•
Internalgates: ∨ ∧ ¬•
Valuation: function from variables to{0,1}Example:ν ={x7→0, y7→1}...
mapped to1
Boolean circuit
∨
¬
x
∧
y
•
Directed acyclic graph ofgates•
Outputgate:•
Variablegates: x•
Internalgates: ∨ ∧ ¬•
Valuation: function from variables to{0,1}Example:ν ={x7→0, y7→1}... mapped to1
Provenance circuit 1
5 6 7 2
4 3
Query:Is there both a pink and a blue node?
Provenance circuit:
∧
∨ 7
2 3
Formally:
•
Tree automatonA, uncertain treeT, circuitC•
Variable gatesofC: nodes ofT•
Condition:Letνbe a valuation ofT, thenν(C)iffAacceptsν(T)Provenance circuit 1
5 6 7 2
4 3
Query:Is there both a pink and a blue node?
Provenance circuit:
∧
∨ 7
2 3
Formally:
•
Tree automatonA, uncertain treeT, circuitC•
Variable gatesofC: nodes ofT•
Condition:Letνbe a valuation ofT, thenν(C)iffAacceptsν(T)Provenance circuit 1
5 6 7 2
4 3
Query:Is there both a pink and a blue node?
Provenance circuit:
∧
∨ 7
2 3
Formally:
•
Tree automatonA, uncertain treeT, circuitC•
Variable gatesofC: nodes ofT•
Condition:Letνbe a valuation ofT, thenν(C)iffAacceptsν(T)Building provenance circuits on trees
Theorem
For any bottom-uptree automatonAand inputtreeT, we can build aprovenance circuitofAonT inO(|A| × |T|)
•
Alphabet:•
Automaton:“Is there both a pink and a blue node?”•
States:{⊥,B,P,>}
•
Final: {>}•
Transitions:> P ⊥
P P ⊥
n
n
∨ ∨ ∨ ∨
⊥ B P >
∨ ∨ ∨ ∨
⊥ B P >
∨ ∨ ∨ ∨
⊥ B P >
∧
∧
¬ ∧
Building provenance circuits on trees
Theorem
For any bottom-uptree automatonAand inputtreeT, we can build aprovenance circuitofAonT inO(|A| × |T|)
•
Alphabet:•
Automaton:“Is there both a pink and a blue node?”•
States:{⊥,B,P,>}
•
Final: {>}•
Transitions:>
P ⊥
P P ⊥
n
n
∨ ∨ ∨ ∨
⊥ B P >
∨ ∨ ∨ ∨
⊥ B P >
∨ ∨ ∨ ∨
⊥ B P >
∧
∧
¬ ∧
Building provenance circuits on trees
Theorem
For any bottom-uptree automatonAand inputtreeT, we can build aprovenance circuitofAonT inO(|A| × |T|)
•
Alphabet:•
Automaton:“Is there both a pink and a blue node?”•
States:{⊥,B,P,>}
•
Final: {>}•
Transitions:>
P ⊥
P P ⊥
n
n
∨ ∨ ∨ ∨
⊥ B P >
∨ ∨ ∨ ∨
⊥ B P >
∨ ∨ ∨ ∨
⊥ B P >
∧
∧
¬ ∧
Building provenance circuits on trees
Theorem
For any bottom-uptree automatonAand inputtreeT, we can build aprovenance circuitofAonT inO(|A| × |T|)
•
Alphabet:•
Automaton:“Is there both a pink and a blue node?”•
States:{⊥,B,P,>}
•
Final: {>}•
Transitions:>
P ⊥
P P ⊥
n
n
∨ ∨ ∨ ∨
⊥ B P >
∨ ∨ ∨ ∨
⊥ B P >
∨ ∨ ∨ ∨
⊥ B P >
∧
∧
¬ ∧
Building provenance circuits on trees
Theorem
For any bottom-uptree automatonAand inputtreeT, we can build aprovenance circuitofAonT inO(|A| × |T|)
•
Alphabet:•
Automaton:“Is there both a pink and a blue node?”•
States:{⊥,B,P,>}
•
Final: {>}•
Transitions:>
P ⊥
P P ⊥
n
n
∨ ∨ ∨ ∨
⊥ B P >
∨ ∨ ∨ ∨
⊥ B P >
∨ ∨ ∨ ∨
⊥ B P >
∧
∧
¬ ∧
Building provenance circuits on trees
Theorem
For any bottom-uptree automatonAand inputtreeT, we can build aprovenance circuitofAonT inO(|A| × |T|)
•
Alphabet:•
Automaton:“Is there both a pink and a blue node?”•
States:{⊥,B,P,>}
•
Final: {>}•
Transitions:>
P ⊥
P P ⊥
n
n
∨ ∨ ∨ ∨
⊥ B P >
∨ ∨ ∨ ∨
⊥ B P >
∨ ∨ ∨ ∨
⊥ B P >
∧
∧
¬ ∧
Building provenance circuits on trees
Theorem
For any bottom-uptree automatonAand inputtreeT, we can build aprovenance circuitofAonT inO(|A| × |T|)
•
Alphabet:•
Automaton:“Is there both a pink and a blue node?”•
States:{⊥,B,P,>}
•
Final: {>}•
Transitions:>
P ⊥
P P ⊥
n
n
∨ ∨ ∨ ∨
⊥ B P >
∨ ∨ ∨ ∨
⊥ B P >
∨ ∨ ∨ ∨
⊥ B P >
∧
∧
¬ ∧
Building provenance circuits on trees
Theorem
For any bottom-uptree automatonAand inputtreeT, we can build aprovenance circuitofAonT inO(|A| × |T|)
•
Alphabet:•
Automaton:“Is there both a pink and a blue node?”•
States:{⊥,B,P,>}
•
Final: {>}•
Transitions:>
P ⊥
P P ⊥
n
n
∨ ∨ ∨ ∨
⊥ B P >
∧
∧
¬ ∧
Probabilistic query evaluation
Tree automaton Uncertain tree encoding
(RER|metro)*
|(bus|tram)*
MSO query Probabilistic treelike data
linear linear
Provenance circuit
Probability
95%
linear
+probabilities
Details of the approach
Probabilistic treelike data
Each fact can disappear with some probability
→ How to computeefficientlythe probability of the circuit?
Details of the approach
Uncertain tree encoding Probabilistic
treelike data
Each fact can disappear with some probability
Each node label can disappear with the probability of the coded fact
→ How to computeefficientlythe probability of the circuit?
Details of the approach
Uncertain tree encoding Probabilistic
treelike data
Each fact can disappear with some probability
Each node label can disappear with the probability of the coded fact
Each variable can be true with the probability of the coded fact
Provenance circuit +probabilities
→ How to computeefficientlythe probability of the circuit?
Details of the approach
Uncertain tree encoding Probabilistic
treelike data
Each fact can disappear with some probability
Each node label can disappear with the probability of the coded fact
Each variable can be true with the probability of the coded fact
Probability
95%
Probability that the circuit evaluates to true
Provenance circuit +probabilities
→ How to computeefficientlythe probability of the circuit?
Details of the approach
Uncertain tree encoding Probabilistic
treelike data
Each fact can disappear with some probability
Each node label can disappear with the probability of the coded fact
Each variable can be true with the probability of the coded fact
Probability
95%
Probability that the circuit evaluates to true
Provenance circuit +probabilities
→ How to computeefficientlythe probability of the circuit?
Computing the circuit probability The circuit is ad-DNNF...
... so probability computation iseasy!
•
¬ gates only have variablesas inputs¬ g
g0
P(g) :=1−P(g0)
•
∨ gates always have mutually exclusiveinputs∨ g
g01 g02
P(g) :=P(g01) +P(g02)
•
∧ gates are all on independentinputs∧ g
g01 g02
P(g) :=P(g01)×P(g02)
Theorem [Amarilli et al., 2015]
For anyfixedBoolean MSO queryQandk∈N,
given a databaseDoftreewidth≤kwithindependent probabilities, we can compute inlinear timethe probability thatDsatisfiesQ
Computing the circuit probability The circuit is ad-DNNF...
... so probability computation iseasy!
•
¬ gates only have variablesas inputs¬ g
g0
P(g) :=1−P(g0)
•
∨ gates always have mutually exclusiveinputs∨ g
g01 g02
P(g) :=P(g01) +P(g02)
•
∧ gates are all on independentinputs∧ g
g01 g02
P(g) :=P(g01)×P(g02)
Theorem [Amarilli et al., 2015]
For anyfixedBoolean MSO queryQandk∈N,
given a databaseDoftreewidth≤kwithindependent probabilities, we can compute inlinear timethe probability thatDsatisfiesQ
Computing the circuit probability The circuit is ad-DNNF...
... so probability computation iseasy!
•
¬ gates only have variablesas inputs¬ g
g0
P(g) :=1−P(g0)
•
∨ gates always have mutually exclusiveinputs∨ g
g01 g02
P(g) :=P(g01) +P(g02)
•
∧ gates are all on independentinputs∧ g
g01 g02
P(g) :=P(g01)×P(g02)
Theorem [Amarilli et al., 2015]
For anyfixedBoolean MSO queryQandk∈N,
given a databaseDoftreewidth≤kwithindependent probabilities, we can compute inlinear timethe probability thatDsatisfiesQ
Computing the circuit probability The circuit is ad-DNNF...
... so probability computation iseasy!
•
¬ gates only have variablesas inputs¬ g
g0
P(g) :=1−P(g0)
•
∨ gates always have mutually exclusiveinputs∨ g
g01 g02
P(g) :=P(g01) +P(g02)
•
∧ gates are all on independentinputs∧ g
g01 g02
P(g) :=P(g01)×P(g02)
Theorem [Amarilli et al., 2015]
For anyfixedBoolean MSO queryQandk∈N,
given a databaseDoftreewidth≤kwithindependent probabilities, we can compute inlinear timethe probability thatDsatisfiesQ
Computing the circuit probability
The circuit is ad-DNNF... ... so probability computation iseasy!
•
¬ gates only have variablesas inputs¬ g
g0
P(g) :=1−P(g0)
•
∨ gates always have mutually exclusiveinputs∨ g
g01 g02
P(g) :=P(g01) +P(g02)
•
∧ gates are all on independentinputs∧ g
g01 g02
P(g) :=P(g01)×P(g02)
Theorem [Amarilli et al., 2015]
For anyfixedBoolean MSO queryQandk∈N,
given a databaseDoftreewidth≤kwithindependent probabilities, we can compute inlinear timethe probability thatDsatisfiesQ
Computing the circuit probability
The circuit is ad-DNNF... ... so probability computation iseasy!
•
¬ gates only have variablesas inputs¬ g
g0
P(g) :=1−P(g0)
•
∨ gates always have mutually exclusiveinputs∨ g
g01 g02
P(g) :=P(g01) +P(g02)
•
∧ gates are all on independentinputs∧ g
g01 g02
P(g) :=P(g01)×P(g02)
Theorem [Amarilli et al., 2015]
For anyfixedBoolean MSO queryQandk∈N,
given a databaseDoftreewidth≤kwithindependent probabilities, we can compute inlinear timethe probability thatDsatisfiesQ