Introduction Trees Treelike instances Future work
Structurally Tractable Uncertain Data
Antoine Amarilli
Supervisor: Pierre Senellart
Expected graduation: August 2016
Télécom ParisTech, France
May 31st, 2015
Uncertain data management
Is data always complete and certain?
Unreliable sources
→ Crowdsourcing
→ Massive collaborations: Wikidata, etc.
Error-prone processing
→ Unsupervised information extraction
→ OCR, speech recognition, etc.
Outdated or stale data
→ We need uncertain data management
Example model: TID
Consider a relational instance
Date      Animal            Probability
Wed 3rd   Kangaroo          5%
Wed 3rd   Tasmanian devil   0%
Thu 4th   Kangaroo          6%
Thu 4th   Tasmanian devil   2%
Fri 5th   Kangaroo          20%
Fri 5th   Tasmanian devil   15%
Add probabilities to facts
Assume independence between facts
→ Semantics: a probability distribution on regular instances
What about queries? (Boolean CQs)
→ Semantics: compute the probability that the query holds
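A minimal sketch of the TID semantics, enumerating every possible world of the instance above. The query (some date on which both animals were seen) and all function names are illustrative assumptions. The enumeration is exponential in the number of facts, so it only makes the semantics concrete.

```python
from itertools import product

# TID instance from the slide: fact -> probability of being present.
tid = {
    ("Wed 3rd", "Kangaroo"): 0.05,
    ("Wed 3rd", "Tasmanian devil"): 0.00,
    ("Thu 4th", "Kangaroo"): 0.06,
    ("Thu 4th", "Tasmanian devil"): 0.02,
    ("Fri 5th", "Kangaroo"): 0.20,
    ("Fri 5th", "Tasmanian devil"): 0.15,
}

def both_animals_same_day(world):
    """Hypothetical Boolean CQ: some date has both a kangaroo and a devil."""
    dates = {date for date, _ in world}
    return any(
        (d, "Kangaroo") in world and (d, "Tasmanian devil") in world
        for d in dates
    )

def query_probability(tid, query):
    """Sum the weights of the possible worlds in which the query holds."""
    facts = list(tid)
    total = 0.0
    for keep in product([False, True], repeat=len(facts)):
        world = {f for f, k in zip(facts, keep) if k}
        # Facts are independent, so the world's weight is a product.
        weight = 1.0
        for f, k in zip(facts, keep):
            weight *= tid[f] if k else 1.0 - tid[f]
        if query(world):
            total += weight
    return total

print(query_probability(tid, both_animals_same_day))
```

By independence, the result should agree with 1 − (1 − 0.06·0.02)(1 − 0.20·0.15) ≈ 0.0312; Wednesday contributes nothing because its Tasmanian devil fact has probability 0.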
Big problem: Tractability
Evaluate the fixed Boolean CQ: ∃x y R(x) ∧ S(x, y) ∧ T(y)
Measure data complexity, i.e., as a function of the instance
→ #P-hard [Dalvi and Suciu, 2007] (instead of AC0)
Existing approaches:
Avoid hard queries [Dalvi and Suciu, 2012]
Use sampling to get approximate answers
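A rough sketch of the sampling approach for the query ∃x y R(x) ∧ S(x, y) ∧ T(y): draw possible worlds at random and count how often the query holds. This is plain Monte Carlo estimation under assumed names and input layout (one dictionary of fact probabilities per relation); it is not a specific algorithm from the cited papers.

```python
import random

def holds(R, S, T):
    """Does the world satisfy ∃x y R(x) ∧ S(x, y) ∧ T(y)?"""
    return any(x in R and y in T for (x, y) in S)

def estimate(pR, pS, pT, samples, seed=0):
    """Monte Carlo estimate of the query probability on a TID instance.

    pR, pS, pT map each fact of R, S, T to its probability (assumed layout).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        # Sample one possible world: keep each fact independently.
        R = {x for x, p in pR.items() if rng.random() < p}
        S = {e for e, p in pS.items() if rng.random() < p}
        T = {y for y, p in pT.items() if rng.random() < p}
        hits += holds(R, S, T)
    return hits / samples

# One fact per relation, each with probability 0.5: exact answer is 0.125.
print(estimate({"a": 0.5}, {("a", "b"): 0.5}, {"b": 0.5}, samples=20000))
```

The estimate only comes with probabilistic error bounds, which is why exact tractable cases remain of interest.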
The general idea
Input instances are not arbitrary!
→ Impose structural restrictions on instances
→ Prove fixed-parameter tractability results
This talk
Parameter: instance treewidth
Bound it by a constant
→ MSO queries have linear data complexity [Courcelle, 1990]
→ Also holds on TID instances (with unit-cost arithmetic)
(joint work with Pierre Bourhis and Pierre Senellart)
Table of contents
1 Introduction
2 Trees
3 Treelike instances
4 Future work
Uncertain tree example
A possible PrXML tree, from Wikidata facts:
[Figure: PrXML tree rooted at Q298423. ind nodes attach the facts occupation: musician, place of birth: Crescent, and surname: Manning, with edge probabilities 0.4, 0.7 and 0.7; a mux node under given name chooses between Chelsea (0.4) and Bradley (0.6)]
→ Probabilities reflect contributor trustworthiness
Formalizing uncertain trees
[Figure: tree with seven nodes, numbered 1 to 7, some of them colored red or green]
A valuation of a tree decides whether to keep or discard node labels.
Example query:
“Is there both a red and a green node?”
Valuation: {1,2,3,4,5,6,7}. The query is true
Valuation: {1,2,5,6}. The query is false
Valuation: {2,7}. The query is true
Provenance formulae and circuits
[Figure: tree with seven nodes, numbered 1 to 7]
Which valuations satisfy the query?
→ Provenance formula of a query q on an uncertain tree T:
Boolean formula ϕ on variables x1 ... x7
→ ν(T) satisfies q iff ν(ϕ) is true
Provenance circuit of q on T [Deutch et al., 2014]
Boolean circuit C with input gates g1 ... g7
→ ν(T) satisfies q iff ν(C) is true
Example
[Figure: tree with seven nodes, numbered 1 to 7]
Is there both a red and a green node?
Provenance formula: (x2 ∨ x3) ∧ x7
Provenance circuit:
      ∧
     / \
    ∨   g7
   / \
  g2  g3
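As a small illustration, the circuit above can be written as a nested structure and evaluated under a valuation, given as the set of nodes whose label is kept. The tuple encoding and the function name are assumptions made for this sketch.

```python
# Gates are nested tuples ("and", ...), ("or", ...), ("not", g);
# an integer i stands for the input gate g_i of tree node i.
CIRCUIT = ("and", ("or", 2, 3), 7)   # (x2 ∨ x3) ∧ x7

def evaluate(gate, kept):
    """Evaluate the circuit under the valuation keeping the nodes in `kept`."""
    if isinstance(gate, int):
        return gate in kept
    op, *args = gate
    values = [evaluate(a, kept) for a in args]
    if op == "and":
        return all(values)
    if op == "or":
        return any(values)
    if op == "not":
        return not values[0]
    raise ValueError(f"unknown gate type: {op}")

# The three valuations used earlier in the talk.
print(evaluate(CIRCUIT, {1, 2, 3, 4, 5, 6, 7}))  # True
print(evaluate(CIRCUIT, {1, 2, 5, 6}))           # False
print(evaluate(CIRCUIT, {2, 7}))                 # True
```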
Our main result on trees
Theorem
For any fixed MSO query q (first order + quantification over sets), we can compute a provenance circuit C for any input tree T in linear time in the input T.
→ Key ideas:
Compile q to a tree automaton [Thatcher and Wright, 1968]
Write the possible transitions of the automaton on T in C
Corollary
If tree nodes have a probability of being independently kept, we can compute the query probability in linear time.
→ Relates to message passing [Lauritzen and Spiegelhalter, 1988]
→ Already known [Cohen et al., 2009]
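To give some intuition for the corollary, here is a sketch for the red/green query only, not the general construction. It assumes a deterministic automaton whose state is the set of colors seen among kept nodes, and it propagates a probability distribution over states bottom-up. The tree shape, the node colors (2 and 3 red, 7 green) and all names are assumptions chosen to be consistent with the provenance formula (x2 ∨ x3) ∧ x7.

```python
from collections import defaultdict

def state_distribution(node, children, color, prob):
    """Distribution over states (sets of colors seen) for the subtree of `node`."""
    own = color.get(node)
    dist = defaultdict(float)
    if own is None:
        dist[frozenset()] = 1.0
    else:
        # The node's label is kept with probability prob[node].
        dist[frozenset([own])] += prob[node]
        dist[frozenset()] += 1.0 - prob[node]
    # Subtrees are independent, so probabilities multiply.
    for child in children.get(node, []):
        child_dist = state_distribution(child, children, color, prob)
        merged = defaultdict(float)
        for s, p in dist.items():
            for t, q in child_dist.items():
                merged[s | t] += p * q
        dist = merged
    return dist

def query_probability(root, children, color, prob):
    """Probability that both a red and a green node are kept."""
    return state_distribution(root, children, color, prob)[frozenset({"red", "green"})]

# Hypothetical tree and colors.
children = {1: [5, 2], 5: [7, 6], 2: [4, 3]}
color = {2: "red", 3: "red", 7: "green"}
prob = {n: 0.5 for n in range(1, 8)}
print(query_probability(1, children, color, prob))  # (1 - 0.5 * 0.5) * 0.5 = 0.375
```

Each node is visited once and there are at most four states, so the running time is linear in the tree for this fixed query.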
Treewidth intuition
Generalize from trees to treelike instances:
Treewidth: measure on instances
Trees have treewidth 1
Cycles have treewidth 2
k-cliques have treewidth k−1, and k×k grids have treewidth k
Treelike: the treewidth is bounded by a constant
→ Treelike instances can be encoded to trees
Treewidth formal definition
Instance:
N      S
a b    a c
b c    b e
c d
d e
e f
Gaifman graph: edges a-b, b-c, c-d, d-e, e-f, a-c, b-e
Tree decomp.: bags {a, b, c}, {b, c, e}, {c, d, e}, {e, f}
Tree encoding: N(a1,a2) N(a2,a3) S(a1,a3) S(a2,a4) N(a3,a1) N(a1,a4) N(a4,a1)
→ Treelike: constant bound on the maximal bag size
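The two conditions of a tree decomposition can be checked mechanically on this example. The sketch below assumes that the four bags are arranged as a path (the shape of the decomposition tree is not recoverable from the slide text) and uses illustrative function names. The width is the maximal bag size minus one.

```python
facts = [
    ("N", "a", "b"), ("N", "b", "c"), ("N", "c", "d"),
    ("N", "d", "e"), ("N", "e", "f"),
    ("S", "a", "c"), ("S", "b", "e"),
]
bags = [{"a", "b", "c"}, {"b", "c", "e"}, {"c", "d", "e"}, {"e", "f"}]
tree_edges = [(0, 1), (1, 2), (2, 3)]  # assumed: bags form a path

def is_tree_decomposition(facts, bags, tree_edges):
    """Check coverage and connectedness (tree_edges is assumed to be a tree)."""
    # 1. The elements of every fact occur together in some bag.
    for _, *elems in facts:
        if not any(set(elems) <= bag for bag in bags):
            return False
    # 2. For every element, the bags containing it are connected in the tree.
    for el in set().union(*bags):
        holding = {i for i, bag in enumerate(bags) if el in bag}
        start = next(iter(holding))
        seen, stack = {start}, [start]
        while stack:
            i = stack.pop()
            for u, v in tree_edges:
                for x, y in ((u, v), (v, u)):
                    if x == i and y in holding and y not in seen:
                        seen.add(y)
                        stack.append(y)
        if seen != holding:
            return False
    return True

def width(bags):
    return max(len(bag) for bag in bags) - 1

print(is_tree_decomposition(facts, bags, tree_edges), width(bags))  # True 2
```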
Our main result on treelike instances
Theorem
For any fixed MSO query q and bound k ∈ N, for any input instance I of treewidth ≤ k, we can compute in linear time a provenance circuit of q on I.
→ Key ideas:
Compute tree decomposition and tree encoding in linear time
Compile q to an automaton on encodings [Flum et al., 2002]
Use the previous construction
→ Possible subinstances are possible valuations of the encoding
Corollary
MSO queries have linear data complexity on treelike TID instances.
Further results
Support other models with dependencies between facts:
Block-independent disjoint (BID): mutually exclusive facts
pc-tables: events and Boolean annotations
Support other tasks:
Counting query results encodes to probabilistic evaluation
General connection to semiring provenance [Green et al., 2007]
Extending the provenance connection
Negation:
Semiring provenance usually defined for positive queries
Yet our provenance circuits work fine with negation
→ Relate this to provenance for queries with negation?
Multiplicities:
Our work connects to the universal semiring N[X]...
... but only for UCQs, not arbitrary MSO
Missing: notion of multiplicity for MSO (multisets?)
Structural restrictions:
Are real-world instances tree-like?
Are there other possible restrictions?
Experiments?
Connect to other frameworks
Compiling to automata has high combined complexity
→ Investigate Monadic Datalog approaches [Gottlob et al., 2010]
Uncertainty on facts, not values
→ Connect to work on nulls [Libkin, 2014]
What about reasoning on uncertain data and its implications?
→ Connect to tractable languages (e.g., guarded Datalog)
What about incorporating new evidence?
→ Connect to work on conditioning [Tang et al., 2012]
Other projects and directions
Open-world query answering (with Michael Benedikt)
Certainty of a Boolean CQ when completing under constraints
Which constraint languages are decidable?
What is the impact of assuming finiteness?
Uncertain ordered data
Bag semantics for the relational algebra (with M. Lamine Ba, Daniel Deutch, Pierre Senellart)
Interpolation schemes for partially ordered numerical values (with Yael Amsterdamer, Tova Milo, Pierre Senellart)
Problem of instance possibility
On uncertain orders (labeled posets)
On probabilistic XML
Thanks for your attention!
References
Cohen, S., Kimelfeld, B., and Sagiv, Y. (2009).
Running tree automata on probabilistic XML.
In PODS.
Courcelle, B. (1990).
The monadic second-order logic of graphs. I. Recognizable sets of finite graphs.
Inf. Comput., 85(1).
Dalvi, N. and Suciu, D. (2007).
Efficient query evaluation on probabilistic databases.
VLDBJ, 16(4):523–544.
Dalvi, N. and Suciu, D. (2012).
The dichotomy of probabilistic inference for unions of conjunctive queries.
JACM, 59(6):30.
Deutch, D., Milo, T., Roy, S., and Tannen, V. (2014).
Circuits for datalog provenance.
In ICDT.
Flum, J., Frick, M., and Grohe, M. (2002).
Query evaluation via tree-decompositions.
JACM, 49(6):716–752.
Green, T. J., Karvounarakis, G., and Tannen, V. (2007).
Provenance semirings.
In PODS.
Lauritzen, S. L. and Spiegelhalter, D. J. (1988).
Local computations with probabilities on graphical structures and their application to expert systems.
J. Royal Statistical Society. Series B.
Libkin, L. (2014).
Incomplete data: what went wrong, and how to fix it.
In Proc. SIGMOD.
Tang, R., Cheng, R., Wu, H., and Bressan, S. (2012).
A framework for conditioning uncertain relational data.
In Database and Expert Systems Applications. Springer.
Thatcher, J. W. and Wright, J. B. (1968).
Generalized finite automata theory with an application to a decision problem of second-order logic.
Mathematical systems theory, 2(1):57–81.
Semiring provenance [Green et al., 2007]
Semiring (K, ⊕, ⊗, 0, 1)
(K, ⊕) commutative monoid with identity 0
(K, ⊗) commutative monoid with identity 1
⊗ distributes over ⊕
0 absorptive for ⊗
Idea: Maintain annotations on tuples while evaluating:
Union: annotation is the sum of union tuples
Select: select as usual
Project: annotation is the sum of projected tuples
Product: annotation is the product
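These rules can be sketched for one concrete semiring, the natural numbers (ℕ, +, ×, 0, 1), where annotations count derivations. Relations are dictionaries from tuples to annotations; this representation and the function names are assumptions made for illustration, not the formalism of [Green et al., 2007] in full generality.

```python
from collections import defaultdict

def union(r, s):
    """Union: annotations of identical tuples are summed."""
    out = defaultdict(int)
    for rel in (r, s):
        for t, k in rel.items():
            out[t] += k
    return dict(out)

def select(r, pred):
    """Select: keep matching tuples with their annotation unchanged."""
    return {t: k for t, k in r.items() if pred(t)}

def project(r, cols):
    """Project: tuples that collapse together have their annotations summed."""
    out = defaultdict(int)
    for t, k in r.items():
        out[tuple(t[c] for c in cols)] += k
    return dict(out)

def product(r, s):
    """Product: the annotation of a pair is the product of annotations."""
    return {t + u: k * m for t, k in r.items() for u, m in s.items()}

r = {("a", "b"): 2, ("a", "c"): 3}
print(project(r, [0]))            # {('a',): 5}
print(product(r, {("x",): 2}))    # {('a', 'b', 'x'): 4, ('a', 'c', 'x'): 6}
```

Swapping + and × for ∨ and ∧ on Boolean formulae would give Boolean provenance instead of counts.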
Tree automata
Tree alphabet: [Figure: node labels, namely uncolored, green, and red nodes]
bNTA: bottom-up nondeterministic tree automaton
“Is there both a red and a green node?”
States: {⊥, G, R, ⊤}
Final states: {⊤}
Initial function: [Figure: leaf labels mapped to the states ⊥, R, G]
Transitions (examples): [Figure: three example transitions over the states ⊥, R, G, ⊤]
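A possible reading of this automaton, written as a deterministic bottom-up run (a special case of a bNTA). The transition function `join`, the tree shape and the node colors are assumptions: the slide figures are not recoverable, so this is one automaton that recognizes the query, not necessarily the one drawn. Unlike the binary trees of the slides, the sketch accepts any number of children.

```python
BOT, R, G, TOP = "⊥", "R", "G", "⊤"
INITIAL = {None: BOT, "red": R, "green": G}  # state read from a node's own label

def join(s, t):
    """Combine what two parts of the tree have seen."""
    if s == BOT:
        return t
    if t == BOT:
        return s
    if s == t:
        return s
    return TOP  # red met green, or one side is already ⊤

def run(node, children, label):
    """State reached at `node` after reading its subtree bottom-up."""
    state = INITIAL[label.get(node)]
    for child in children.get(node, []):
        state = join(state, run(child, children, label))
    return state

def accepts(root, children, color, kept):
    """Run on the tree where only the nodes in `kept` retain their label."""
    label = {n: c for n, c in color.items() if n in kept}
    return run(root, children, label) == TOP

# Hypothetical tree and colors (2 and 3 red, 7 green).
children = {1: [5, 2], 5: [7, 6], 2: [4, 3]}
color = {2: "red", 3: "red", 7: "green"}
print(accepts(1, children, color, {1, 2, 5, 6}))  # False
print(accepts(1, children, color, {2, 7}))        # True
```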
Constructing the provenance circuit
→ Construct a Boolean provenance circuit bottom-up
[Figure: circuit gadget for one tree node n. For each state q, an ∨ gate collects one ∧ gate per transition; each ∧ gate combines the gates for the states q1 and q2 of the two children with the input gate i_n of the node (or its negation ¬ i_n), whenever the automaton has a transition from q1, q2 to q]
Example: block-independent disjoint (BID) instances
name    city        iso   p
pods    melbourne   au    0.8
pods    sydney      au    0.2
icalp   tokyo       jp    0.1
icalp   kyoto       jp    0.9
Evaluating a fixed CQ is #P-hard in general
→ For a treelike instance, linear time!
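A minimal sketch of the BID semantics on the table above, assuming that blocks are keyed by name: facts in the same block are mutually exclusive, and blocks are independent. The example query and the function names are illustrative, and the enumeration is exponential in the number of blocks.

```python
from itertools import product

# Blocks keyed by `name` (assumption); each lists ((city, iso), probability).
bid = {
    "pods": [(("melbourne", "au"), 0.8), (("sydney", "au"), 0.2)],
    "icalp": [(("tokyo", "jp"), 0.1), (("kyoto", "jp"), 0.9)],
}

def worlds(bid):
    """Yield (world, weight): each block contributes at most one fact."""
    blocks = []
    for key, options in bid.items():
        choices = [((key,) + t, p) for t, p in options]
        rest = 1.0 - sum(p for _, p in options)
        if rest > 1e-12:
            choices.append((None, rest))  # the block may contribute no fact
        blocks.append(choices)
    for combo in product(*blocks):
        world = {t for t, _ in combo if t is not None}
        weight = 1.0
        for _, p in combo:
            weight *= p
        yield world, weight

def probability(bid, query):
    return sum(w for world, w in worlds(bid) if query(world))

def q(world):
    """Hypothetical query: pods is in Melbourne and icalp is in Kyoto."""
    return ("pods", "melbourne", "au") in world and ("icalp", "kyoto", "jp") in world

print(probability(bid, q))  # 0.8 * 0.9, i.e. about 0.72
```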
Supporting coefficients
In the world of trees
The same valuation can be accepted multiple times
→ Number of accepting runs of the bNTA
In the world of treelike instances
The same match can be the image of multiple homomorphisms
→ Add assignment facts to represent possible assignments
→ Encode to a bNTA that guesses them
Supporting exponents
In the world of trees
The same fact can be used multiple times
Annotate nodes with a multiplicity
The bNTA is monotone for that multiplicity
Use each input gate as many times as we read its fact
In the world of treelike instances
The same fact can be the image of multiple atoms
Maximal multiplicity is query-dependent but instance-independent
→ Encode CQs to bNTAs that read multiplicities
Consider all possible CQ self-homomorphisms
Count the multiplicities of identical atoms
Rewrite relations to add multiplicities
Usual compilation on the modified signature