A Circuit-Based Approach to Efficient Enumeration:
Enumerating MSO Query Results on Trees and Words
Antoine Amarilli¹, Pierre Bourhis², Louis Jachiet², Stefan Mengel³, Matthias Niewerth⁴
May 20, 2019
¹ Télécom ParisTech
² CNRS CRIStAL
Roadmap of this talk
• Stefan has presented enumeration for d-DNNF set circuits
• This talk: introduce MSO query evaluation on trees and build a set circuit that represents the answers to enumerate
• Using the previous talk we can reprove the result:
  Theorem [Bagan, 2006, Kazana and Segoufin, 2013]
  We can enumerate the answers of MSO queries on trees with linear-time preprocessing and constant delay.
• Also: new results on combined complexity:
  Theorem [Amarilli et al., 2019a, Amarilli et al., 2019b]
  We can enumerate the results of an automaton on a tree with:
  • Preprocessing linear in the tree and polynomial in the automaton
  • Delay constant in the tree and polynomial in the automaton
• Next talk: how to support updates (and prove new results)
Boolean MSO on trees
Boolean query evaluation on trees
Data: a tree T where nodes have a color from an alphabet
Query Q: a sentence in monadic second-order logic (MSO)
• P_blue(x) means “x is blue”
• x → y means “x is the parent of y”
“Is there both a pink and a blue node?”
∃x y P_pink(x) ∧ P_blue(y)
Result: TRUE/FALSE indicating if the tree T satisfies the query Q
Computational complexity as a function of T (the query Q is fixed)
Monadic second-order logic (MSO)
• P_blue(x) means “x is blue”; likewise P_pink(x), and so on for each color
• x → y means “x is the parent of y”
• Propositional logic: formulas with AND ∧, OR ∨, NOT ¬
  • P_pink(x) ∧ P_blue(y) means “Node x is pink and node y is blue”
• First-order logic: adds existential quantifier ∃ and universal quantifier ∀
  • ∃x y P_pink(x) ∧ P_blue(y) means “There is both a pink and a blue node”
• Monadic second-order logic (MSO): adds quantifiers over sets
  • ∃S ∀x S(x) means “there is a set S containing every element x”
  • Can express transitive closure x →* y, i.e., “x is an ancestor of y”
  • ∀x P_pink(x) ⇒ ∃y P_blue(y) ∧ x →* y means “There is a blue node below every pink node”
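To make the two example sentences concrete, here is a minimal Python sketch that evaluates them naively on a small tree. The `Node` class, the color names "pink", "blue" and "none", and the function names are assumptions made for illustration; they do not come from the talk. The evaluation simply loops over nodes, so it is quadratic for the second sentence and is not the linear-time automaton approach.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    color: str                                    # "pink", "blue" or "none" (hypothetical names)
    children: list = field(default_factory=list)

def nodes(t):
    """Yield t and all its descendants, i.e. every y with t ->* y."""
    yield t
    for c in t.children:
        yield from nodes(c)

def has_pink_and_blue(t):
    # ∃x y P_pink(x) ∧ P_blue(y)
    return (any(x.color == "pink" for x in nodes(t))
            and any(y.color == "blue" for y in nodes(t)))

def blue_below_every_pink(t):
    # ∀x P_pink(x) ⇒ ∃y P_blue(y) ∧ x →* y
    return all(
        any(y.color == "blue" for y in nodes(x))
        for x in nodes(t) if x.color == "pink"
    )
```

For instance, on a blue root with a single pink child, `has_pink_and_blue` holds but `blue_below_every_pink` does not.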
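Tree automata
Tree alphabet: the node colors (here pink, blue, and one further color)
• Bottom-up deterministic tree automaton
• “Is there both a pink and a blue node?”
• States: {⊥, B, P, ⊤}
• Final states: {⊤}
• Initial function: pink ↦ P, blue ↦ B, the remaining color ↦ ⊥
• Transitions (examples), from the states of the two children to the state of their parent: (B, P) ↦ ⊤, (P, ⊥) ↦ P, (⊥, ⊥) ↦ ⊥
[Figure: an example tree, annotated bottom-up with the states of the run]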
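The automaton above can be sketched in a few lines of Python. This is only an illustration under assumptions: trees are encoded as `(color, [subtrees])` pairs, the color names are hypothetical, and the transition function is written as a `join` of states that is folded over the children, which agrees with the example transitions for a node of the remaining color.

```python
BOT, B, P, TOP = "⊥", "B", "P", "⊤"

def initial(color):
    """State assigned to a node from its own color alone."""
    return {"pink": P, "blue": B}.get(color, BOT)

def join(q1, q2):
    """Combine two states: remember which of pink and blue have been seen."""
    if q1 == q2:
        return q1
    if BOT in (q1, q2):
        return q2 if q1 == BOT else q1
    return TOP  # B joined with P, or anything joined with ⊤

def run(tree):
    """Bottom-up run on tree = (color, [subtrees]); returns the root state."""
    color, children = tree
    state = initial(color)
    for child in children:
        state = join(state, run(child))
    return state

def accepts(tree):
    """The tree is accepted iff the root reaches the final state ⊤."""
    return run(tree) == TOP
```

Each node is visited once and each step costs constant time, which is the linear-time behavior in the tree that the talk relies on. The recursion is fine for small examples; a very deep tree would need an explicit stack.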
Boolean MSO query evaluation via automata
Theorem [Thatcher and Wright, 1968]
MSO and tree automata have the same expressive power on trees
→ Given a Boolean MSO query, we can compute a tree automaton that accepts precisely the trees on which the query holds
→ Complexity (in the query) is generally nonelementary
Corollary
Evaluating a Boolean MSO query on a tree is in linear time in the tree
Set Circuits for Non-Boolean MSO Queries
Overall idea
• Query: Q(x_1, . . . , x_n) with free variables x_1, . . . , x_n
• Goal: find all tuples a_1, . . . , a_n such that Q(a_1, . . . , a_n) holds
→ Add special nodes: for each node n and variable x_i, add a node n_i which is colored red iff x_i is the node n
[Figure: a node n with children n′ and n″ is rewritten (⇒) into a subtree that also contains the special nodes n_1 and n_2]
• Rewrite the query to a Boolean query which uses the new nodes n_i to read the valuation of x_i
• This can be done in linear time in the input tree
• Remark: same construction for free second-order variables
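The reduction can be illustrated with a small Python sketch. Everything here is an assumption made for the example: the query Q(x_1, x_2) = P_pink(x_1) ∧ P_blue(x_2), the encoding of the special nodes as pairs `(n, i)`, and the function names. Since this query does not use the parent relation, the tree is reduced to a map from node identifiers to colors.

```python
from itertools import product

def boolean_query(colors, red):
    """Boolean rewriting of Q(x1, x2) = P_pink(x1) ∧ P_blue(x2).

    `red` is the set of special nodes (n, i) that are colored red.
    """
    x1 = [n for (n, i) in red if i == 1]
    x2 = [n for (n, i) in red if i == 2]
    # Each variable must be read from exactly one red special node.
    if len(x1) != 1 or len(x2) != 1:
        return False
    return colors[x1[0]] == "pink" and colors[x2[0]] == "blue"

def answers(colors):
    """Answers of Q: the ways of coloring special nodes that satisfy the Boolean query."""
    result = []
    for a1, a2 in product(colors, repeat=2):
        if boolean_query(colors, {(a1, 1), (a2, 2)}):
            result.append((a1, a2))
    return result
```

This brute-force loop only shows that answers of Q correspond to colorings of the special nodes on which the Boolean query holds; it says nothing about efficient enumeration.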
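Uncertain trees
Now: Boolean query on a tree where the color of nodes is uncertain
[Figure: a tree whose nodes are numbered 1 to 7]
A valuation of a tree decides whether to keep (1) or discard (0) node labels
A: “Is there both a pink and a blue node?”
• Valuation {2, 3, 7 ↦ 1, ∗ ↦ 0}: the tree automaton A accepts
• Valuation {2 ↦ 1, ∗ ↦ 0}: the tree automaton A rejects
• Valuation {2, 7 ↦ 1, ∗ ↦ 0}: the tree automaton A accepts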
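As a rough illustration of valuations, the following Python sketch tries every valuation of a seven-node tree by brute force. The coloring in `COLOR` is hypothetical: the colors on the slide are not recoverable, so it is merely chosen to agree with the three verdicts above (nodes 2 and 3 blue, node 7 pink). The function `accepts` is a stand-in for the automaton A, and the tree shape is ignored because this query does not depend on it.

```python
from itertools import combinations

# Hypothetical coloring of the seven nodes, consistent with the slide's verdicts.
COLOR = {1: "none", 2: "blue", 3: "blue", 4: "none", 5: "none", 6: "none", 7: "pink"}

def apply_valuation(kept):
    """ν(T): nodes outside `kept` lose their label."""
    return {n: (c if n in kept else "none") for n, c in COLOR.items()}

def accepts(colors):
    """Stand-in for the automaton A: both a pink and a blue node exist."""
    seen = set(colors.values())
    return "pink" in seen and "blue" in seen

def accepting_valuations():
    """Try all 2^7 valuations; exponential, for illustration only."""
    nodes = sorted(COLOR)
    for k in range(len(nodes) + 1):
        for kept in combinations(nodes, k):
            if accepts(apply_valuation(set(kept))):
                yield set(kept)
```

The point of the set circuits introduced next is to represent this set of accepting valuations without listing it explicitly.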
Set circuit
• Tree automaton A, uncertain tree T, circuit C
• Variable gates of C: nodes of T
• Condition: let ν be a valuation of T; then A accepts ν(T) iff the set S(g_0) of the output gate g_0 contains {g ∈ C | ν(g) = 1}.
Query: Is there both a pink and a blue node?
[Figure: the uncertain tree with nodes 1 to 7, next to a set circuit built from ×-gates and ∪-gates over the variable gates 1 to 7 and ⊤-gates]
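Building provenance circuits on trees
Theorem
For any bottom-up tree automaton A and input tree T, we can build a d-DNNF set circuit of A on T in O(|A| × |T|)
• Alphabet: the node colors
• Automaton: “Is there both a pink and a blue node?”
• States: {⊥, B, P, ⊤}
• Final: {⊤}
• Transitions (examples): with children in states (P, ⊥), the node n reaches ⊤ or P, depending on its own color
[Figure: one ∪-gate per state ⊥, B, P, ⊤ for the node n and for each of its two children; ×-gates pair up the children's gates and feed the ∪-gates of n]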
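The construction can be sketched in Python as follows. This is a simplified illustration, not the construction from the cited papers: the tree encoding `(node_id, color, subtrees)`, the tuple encoding of gates, and the `join` transition function are assumptions, and children are folded in one at a time. For every node it keeps one gate per reachable state; `count` then counts accepted valuations by summing at ∪-gates and multiplying at ×-gates, which is only correct because the ∪-gates are deterministic and the ×-gates decomposable.

```python
BOT, B, P, TOP = "⊥", "B", "P", "⊤"

def initial(color):
    """State of a node whose label is kept, from its color alone."""
    return {"pink": P, "blue": B}.get(color, BOT)

def join(q1, q2):
    """Transition function: merge what two parts of the tree have seen."""
    if q1 == q2:
        return q1
    if BOT in (q1, q2):
        return q2 if q1 == BOT else q1
    return TOP

def union(gates):
    """A ∪-gate over `gates` (or the gate itself if there is only one)."""
    return gates[0] if len(gates) == 1 else ("union", *gates)

def build(tree):
    """Return {state: gate} for the root of tree = (node_id, color, subtrees)."""
    node, color, children = tree
    # The node's own label is either kept (variable gate) or discarded (⊤-gate).
    options = {}
    options.setdefault(initial(color), []).append(("var", node))
    options.setdefault(BOT, []).append(("top",))
    gates = {q: union(gs) for q, gs in options.items()}
    # Fold in the children one by one, pairing states with ×-gates.
    for child in children:
        child_gates = build(child)
        combined = {}
        for q1, g1 in gates.items():
            for q2, g2 in child_gates.items():
                combined.setdefault(join(q1, q2), []).append(("times", g1, g2))
        gates = {q: union(gs) for q, gs in combined.items()}
    return gates

def count(gate):
    """Number of captured sets; relies on disjoint ∪-gates and decomposable ×-gates."""
    kind = gate[0]
    if kind in ("var", "top"):
        return 1
    if kind == "union":
        return sum(count(g) for g in gate[1:])
    result = 1
    for g in gate[1:]:  # kind == "times"
        result *= count(g)
    return result
```

On a root of the remaining color with a blue child 2 and a pink child 7, `count(build(tree)[TOP])` is 2: the sets {2, 7} and {1, 2, 7}. Each node creates a number of gates bounded by the square of the number of states, in line with the O(|A| × |T|) bound.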
Circuits as factorized representations of query results
→ The set circuit of Q is now a factorized representation which describes all the tuples that make Q true
Example query: Q(X1, X2): P_pink(X1) ∧ P_blue(X2)
Data: a tree with root 1 and children 2 and 3
Results:
X1  X2
1   2
1   3
Provenance circuit: × (X1(1), ∪ (X2(2), X2(3)))
• The ∪-gate captures {X2(2), X2(3)}
• The ×-gate captures {(X1(1), X2(2)), (X1(1), X2(3))}
Theorem [Bagan, 2006, Kazana and Segoufin, 2013]
We can enumerate the answers of MSO queries on trees with linear-time preprocessing and constant delay.
Semi-open question: what about memory usage?
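The example circuit can be unfolded lazily with a short Python generator. The tuple encoding of gates and the name `enumerate_gate` are assumptions for this sketch. It is a naive recursion and does not by itself guarantee constant delay; that guarantee comes from the d-DNNF enumeration algorithm of the previous talk.

```python
def enumerate_gate(gate):
    """Lazily yield the assignments captured by a gate, as tuples of facts."""
    kind = gate[0]
    if kind == "var":
        yield (gate[1],)
    elif kind == "union":
        # Union: the results of each input, one input after the other.
        for g in gate[1:]:
            yield from enumerate_gate(g)
    elif kind == "times":
        # Product: combine every result of the left input with every result of the right.
        left, right = gate[1], gate[2]
        for a in enumerate_gate(left):
            for b in enumerate_gate(right):
                yield a + b
    else:
        raise ValueError(kind)

# The circuit × (X1(1), ∪ (X2(2), X2(3))) from the example.
circuit = ("times", ("var", "X1(1)"), ("union", ("var", "X2(2)"), ("var", "X2(3)")))
```

Iterating over `enumerate_gate(circuit)` yields the two rows of the results table, (X1(1), X2(2)) and then (X1(1), X2(3)), without ever storing the full result set.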
Application to Pattern Matching in Texts
Problem statement: Pattern matching in texts
Data: a text T
Antoine Amarilli Description Name Antoine Amarilli. Handle: a3nm. Identity Born 1990-02-07.
French national. Appearance as of 2017. Auth OpenPGP. OpenId. Bitcoin. Contact Email and XMPP a3nm@a3nm.net Affiliation Associate professor of computer science (office C201-4) in the DIG team of Télécom ParisTech, 46 rue Barrault, F-75634 Paris Cedex 13, France. Studies PhD in computer science awarded by Télécom ParisTech on March 14, 2016. Former student of the École normale supérieure.
test@example.com More Résumé Location Other sites Blogging: a3nm.net/blog Git: a3nm.net/git ...
Query: a pattern P given as a regular expression
P := ␣ [a-z0-9.]∗ @ [a-z0-9.]∗ ␣
Output: the list of substrings of T that match P: [186, 200⟩, [483, 500⟩, . . .
Goal:
• be very efficient in T (constant-delay)
• be reasonably efficient in P (polynomial-time)
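As a baseline for the problem statement, the slide's pattern can be tried with Python's standard `re` module. This is not the enumeration algorithm of the talk: `finditer` returns only leftmost, non-overlapping matches, so two addresses separated by a single space would share that space and the second would be missed. The function name `match_spans` is made up for the example.

```python
import re

# The slide's pattern: a space, [a-z0-9.]*, "@", [a-z0-9.]*, a space.
PATTERN = re.compile(r" [a-z0-9.]*@[a-z0-9.]* ")

def match_spans(text):
    """Return half-open spans (start, end) of the non-overlapping matches."""
    return [m.span() for m in PATTERN.finditer(text)]
```

For example, on the text `"mail a3nm@a3nm.net or test@example.com now"` the first span returned is `(4, 19)`, which covers `" a3nm@a3nm.net "` including both surrounding spaces.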
Reducing to MSO
• A text is just a tree with a simpler shape
• A regular expression pattern can be expressed in MSO
  → More generally: regular expressions with variables
  → Example: P := •∗ α a∗ β b∗ γ •∗
• Translate to a word automaton (with capture variables)
  [Figure: automaton with states 1, 2, 3, 4; self-loops labeled •, a, b, • on states 1 to 4; transitions α from 1 to 2, β from 2 to 3, γ from 3 to 4]
→ The MSO result implies:
Theorem [Florenzano et al., 2018]
We can enumerate all matches of a regular expression pattern on a text with linear preprocessing and constant delay
→ The resulting set circuit is a binary decision diagram, i.e., each ×-gate has only one input which is not a variable
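The example automaton with capture variables can be simulated directly. The sketch below is a plain depth-first search over runs, written under assumptions: the tables `LOOPS` and `MARKERS` encode the four-state automaton for P := •∗ α a∗ β b∗ γ •∗, a marker transition records the current position without reading a letter, and all names are made up. It explores dead ends, so it offers no delay guarantee.

```python
# Letter read by each state's self-loop; None stands for "any letter" (•).
LOOPS = {1: None, 2: "a", 3: "b", 4: None}
# Marker transitions: state -> (capture variable, next state).
MARKERS = {1: ("α", 2), 2: ("β", 3), 3: ("γ", 4)}

def matches(text):
    """Yield one dict {variable: position} per accepting run of the automaton."""
    def explore(state, pos, marks):
        if state == 4 and pos == len(text):
            yield dict(marks)
        # Marker transition: record the current position, read no letter.
        if state in MARKERS:
            marker, nxt = MARKERS[state]
            yield from explore(nxt, pos, marks + [(marker, pos)])
        # Self-loop: read one letter if the loop allows it.
        if pos < len(text) and LOOPS[state] in (None, text[pos]):
            yield from explore(state, pos + 1, marks)
    yield from explore(1, 0, [])
```

On the text `"ab"` this yields six matches, for example α = 0, β = 1, γ = 2, where the block of a's is `text[0:1]` and the block of b's is `text[1:2]`; the other matches use empty blocks.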
Efficiency in the query
We have shown linear preprocessing and constant delay in the data; but what about the query?
• For general MSO queries: nonelementary complexity
• For regular expressions: exponential (determinization)
• However: our methods adapt to nondeterministic automata
  • Constant-delay enumeration for set circuits without assuming determinism but bounding some notion of width
Theorem [Amarilli et al., 2019a, Amarilli et al., 2019b]
We can enumerate all matches of a nondeterministic tree automaton on a tree with
• Preprocessing linear in the tree and polynomial in the automaton
• Delay constant in the tree and polynomial in the automaton
Corollary: enumeration for regular expression patterns on text
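To give some intuition for why avoiding determinization matters, here is a small Python sketch of the Boolean analogue on words: a nondeterministic automaton is simulated by tracking the set of reachable states, at a cost polynomial in the automaton per letter. The function name and the dictionary encoding of transitions are assumptions, and the sketch only decides acceptance; it does not perform the enumeration of the theorem.

```python
def nfa_accepts(transitions, initial, final, word):
    """Simulate an NFA by tracking the set of reachable states.

    transitions: dict mapping (state, letter) to a set of successor states.
    No determinization is performed, so the cost stays polynomial in the automaton.
    """
    current = set(initial)
    for letter in word:
        current = {q2 for q in current for q2 in transitions.get((q, letter), ())}
    return bool(current & set(final))

# Classic example: "the third letter from the end is an a".
# This NFA has 4 states, while an equivalent DFA needs exponentially many
# states in the distance to the end of the word.
THIRD_FROM_END = {
    (0, "a"): {0, 1}, (0, "b"): {0},
    (1, "a"): {2}, (1, "b"): {2},
    (2, "a"): {3}, (2, "b"): {3},
}
```

For instance, `nfa_accepts(THIRD_FROM_END, {0}, {3}, "abb")` holds, while the same call on `"bab"` does not.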
Implementation (ongoing internship by Rémi Dupré)
• Prototype to find matches of a regular expression in a text
• https://github.com/remi-dupre/enum-spanner-rs
• Work-in-progress
• Open questions / projects:
  • What about memory usage? (we cannot keep the whole index)
  • Output matches in streaming? (problem: duplicates)
  • Can we enumerate other notions of matches?
    → factors of maximal/minimal size
    → distinct matching strings
    → etc.
  • Which application domains need this?
  • Are there good benchmarks?