Pattern matching on trees
•
ThedataTis no longertextbut is now atree:<body>1
<section>3
<p>5
<img>7
<img>6
<h2>4
<div>2
•
ThepatternPasks about thestructureof the tree: Is thereα:
anh2header and
β:
animagein the same section?
•
Results:hα:4,β :6i,hα:4,β:7i18/36
Pattern matching on trees
•
ThedataTis no longertextbut is now atree:<body>1
<section>3
<p>5
<img>7
<img>6
<h2>4
<div>2
•
ThepatternPasks about thestructureof the tree:Is there
α:
anh2header and
β:
animagein the same section?
•
Results:hα:4,β :6i,hα:4,β:7i18/36
Pattern matching on trees
•
ThedataTis no longertextbut is now atree:<body>1
<section>3
<p>5
<img>7
<img>6
<h2>4
<div>2
•
ThepatternPasks about thestructureof the tree:Is there
α:
anh2header and
β:
animagein the same section?
•
Results:hα:4,β :6i,hα:4,β:7i
18/36
Pattern matching on trees
•
ThedataTis no longertextbut is now atree:<body>1
<section>3
<p>5
<img>7
<img>6
<h2>4
<div>2
•
ThepatternPasks about thestructureof the tree:Is thereα: anh2header andβ: animagein the same section?
•
Results:hα:4,β :6i,hα:4,β:7i
18/36
Pattern matching on trees
•
ThedataTis no longertextbut is now atree:<body>1
<section>3
<p>5
<img>7
<img>6
<h2>4
<div>2
•
ThepatternPasks about thestructureof the tree:Is thereα: anh2header andβ: animagein the same section?
•
Results:hα:4,β :6i,hα:4,β:7i18/36
Definitions and Results on Trees
•
Tree patterns can be written as atree automatonwith captures•
Like fortext, we can enumerate the matches oftree automata... Theorem [Bagan, 2006]We can find all matches on a treeTof a tree automatonA (with constantly many capture variables) with:
• PreprocessinglinearinT
(data)
• DelayconstantinT
(data)
•
Again, this is only indata complexity!•
Weconjecturethe following bounds for this task (ongoing work): Conjecture• PreprocessinglinearinT(data) andpolynomialinAandT(combined)
• DelayconstantinT(data) andpolynomialinAandT(combined)
19/36
Definitions and Results on Trees
•
Tree patterns can be written as atree automatonwith captures•
Like fortext, we can enumerate the matches oftree automata...Theorem [Bagan, 2006]
We can find all matches on a treeTof a tree automatonA (with constantly many capture variables) with:
• PreprocessinglinearinT
(data)
• DelayconstantinT
(data)
•
Again, this is only indata complexity!•
Weconjecturethe following bounds for this task (ongoing work): Conjecture• PreprocessinglinearinT(data) andpolynomialinAandT(combined)
• DelayconstantinT(data) andpolynomialinAandT(combined)
19/36
Definitions and Results on Trees
•
Tree patterns can be written as atree automatonwith captures•
Like fortext, we can enumerate the matches oftree automata...Theorem [Bagan, 2006]
We can find all matches on a treeTof a tree automatonA (with constantly many capture variables) with:
• PreprocessinglinearinT(data)
• DelayconstantinT(data)
•
Again, this is only indata complexity!•
Weconjecturethe following bounds for this task (ongoing work): Conjecture• PreprocessinglinearinT(data) andpolynomialinAandT(combined)
• DelayconstantinT(data) andpolynomialinAandT(combined)
19/36
Definitions and Results on Trees
•
Tree patterns can be written as atree automatonwith captures•
Like fortext, we can enumerate the matches oftree automata...Theorem [Bagan, 2006]
We can find all matches on a treeTof a tree automatonA (with constantly many capture variables) with:
• PreprocessinglinearinT(data)
• DelayconstantinT(data)
•
Again, this is only indata complexity!•
Weconjecturethe following bounds for this task (ongoing work): Conjecture• PreprocessinglinearinT(data) andpolynomialinAandT(combined)
• DelayconstantinT(data) andpolynomialinAandT(combined)
19/36
Definitions and Results on Trees
•
Tree patterns can be written as atree automatonwith captures•
Like fortext, we can enumerate the matches oftree automata...Theorem [Bagan, 2006]
We can find all matches on a treeTof a tree automatonA (with constantly many capture variables) with:
• PreprocessinglinearinT(data)
• DelayconstantinT(data)
•
Again, this is only indata complexity!•
Weconjecturethe following bounds for this task (ongoing work):Conjecture
• PreprocessinglinearinT(data) andpolynomialinAandT(combined)
• DelayconstantinT(data) andpolynomialinAandT(combined)
19/36
Proof idea for trees: structure
Similar structureto the previous proof, but with acircuit:
•
Preprocessing:Compute acircuit representationof the answers•
Enumeration:Apply ageneric algorithmon the circuitTree
Phase 1:
Preprocessing Data structure
Phase 2:
Enumeration
hα:4,β:6i, hα:4,β:7i
Results
∃ssection(s)∧
s α∧s β∧
h2(α)∧img(β)
Pattern
20/36
Proof idea for trees: structure
Similar structureto the previous proof, but with acircuit:
•
Preprocessing:Compute acircuit representationof the answers•
Enumeration:Apply ageneric algorithmon the circuit TreeProof idea for trees: set circuits
Aset circuitrepresents aset of answersto a patternP(α,β)
•
Singletonα:6→“the variableαis mapped to node6”•
Tuplehα:4,β:6i: tuple of singletons•
The circuit captures asetof tuples, e.g.,hα:4,β:6i,hα:4,β:7i
Three kinds ofset-valued gates:
•
Variable gate α:4 :→ captures hα:4i
•
Union gate ∪ :→ union of sets of tuples
•
Product gate × :→ relational product
21/36
Proof idea for trees: set circuits
Aset circuitrepresents aset of answersto a patternP(α,β)
•
Singletonα:6→“the variableαis mapped to node6”•
Tuplehα:4,β:6i: tuple of singletons•
The circuit captures asetof tuples, e.g.,hα:4,β:6i,hα:4,β:7i
Three kinds ofset-valued gates:
•
Variable gate α:4 :→ captures hα:4i
•
Union gate ∪ :→ union of sets of tuples
•
Product gate × :→ relational product
21/36
Proof idea for trees: set circuits
Aset circuitrepresents aset of answersto a patternP(α,β)
•
Singletonα:6→“the variableαis mapped to node6”•
Tuplehα:4,β:6i: tuple of singletons•
The circuit captures asetof tuples, e.g.,hα:4,β:6i,hα:4,β:7i
Three kinds ofset-valued gates:
•
Variable gate α:4 :→ captures hα:4i
•
Union gate ∪ :→ union of sets of tuples
•
Product gate × :→ relational product
21/36
Proof idea for trees: set circuits
Aset circuitrepresents aset of answersto a patternP(α,β)
•
Singletonα:6→“the variableαis mapped to node6”•
Tuplehα:4,β:6i: tuple of singletons•
The circuit captures asetof tuples, e.g.,hα:4,β:6i,hα:4,β:7i
Three kinds ofset-valued gates:
•
Variable gate α:4 :→ captures hα:4i
•
Union gate ∪ :→ union of sets of tuples
•
Product gate × :→ relational product
21/36
Proof idea for trees: set circuits
Aset circuitrepresents aset of answersto a patternP(α,β)
•
Singletonα:6→“the variableαis mapped to node6”•
Tuplehα:4,β:6i: tuple of singletons•
The circuit captures asetof tuples, e.g.,hα:4,β:6i,hα:4,β:7i
Three kinds ofset-valued gates:
•
Variable gate α:4 :→ captures hα:4i
•
Union gate ∪ :→ union of sets of tuples
•
Product gate × :→ relational product
21/36
Proof idea for trees: set circuits
Aset circuitrepresents aset of answersto a patternP(α,β)
•
Singletonα:6→“the variableαis mapped to node6”•
Tuplehα:4,β:6i: tuple of singletons•
The circuit captures asetof tuples, e.g.,hα:4,β:6i,hα:4,β:7i
Three kinds ofset-valued gates:
•
Variable gate α:4 :→ captures hα:4i
•
Union gate ∪ :→ union of sets of tuples
•
Product gate × :→ relational product
21/36
Proof idea for trees: set circuits
Aset circuitrepresents aset of answersto a patternP(α,β)
•
Singletonα:6→“the variableαis mapped to node6”•
Tuplehα:4,β:6i: tuple of singletons•
The circuit captures asetof tuples, e.g.,hα:4,β:6i,hα:4,β:7i
Three kinds ofset-valued gates:
•
Variable gate α:4 :→ captures hα:4i
•
Union gate ∪ :→ union of sets of tuples
•
Product gate × :→ relational product
21/36
Proof idea for trees: set circuits
Aset circuitrepresents aset of answersto a patternP(α,β)
•
Singletonα:6→“the variableαis mapped to node6”•
Tuplehα:4,β:6i: tuple of singletons•
The circuit captures asetof tuples, e.g.,hα:4,β:6i,hα:4,β:7i
Three kinds ofset-valued gates:
•
Variable gate α:4 :→ captures hα:4i
•
Union gate ∪ :→ union of sets of tuples
•
Product gate × :→ relational product
21/36
Proof idea for trees: set circuits
Aset circuitrepresents aset of answersto a patternP(α,β)
•
Singletonα:6→“the variableαis mapped to node6”•
Tuplehα:4,β:6i: tuple of singletons•
The circuit captures asetof tuples, e.g.,hα:4,β:6i,hα:4,β:7i
Three kinds ofset-valued gates:
•
Variable gate α:4 :→ captures hα:4i
•
Union gate ∪ :→ union of sets of tuples
•
Product gate × :→ relational product
21/36
Proof idea for trees: set circuits
Aset circuitrepresents aset of answersto a patternP(α,β)
•
Singletonα:6→“the variableαis mapped to node6”•
Tuplehα:4,β:6i: tuple of singletons•
The circuit captures asetof tuples, e.g.,hα:4,β:6i,hα:4,β:7i
Three kinds ofset-valued gates:
•
Variable gate α:4 :→ captures hα:4i
•
Union gate ∪ :→ union of sets of tuples
•
Product gate × :→ relational product
21/36
Proof idea for trees: set circuits
Aset circuitrepresents aset of answersto a patternP(α,β)
•
Singletonα:6→“the variableαis mapped to node6”•
Tuplehα:4,β:6i: tuple of singletons•
The circuit captures asetof tuples, e.g.,hα:4,β:6i,hα:4,β:7i
Three kinds ofset-valued gates:
•
Variable gate α:4 :→ captures hα:4i
•
Union gate ∪ :→ union of sets of tuples
•
Product gate × :→ relational product
21/36
Proof idea for trees: set circuits
Aset circuitrepresents aset of answersto a patternP(α,β)
•
Singletonα:6→“the variableαis mapped to node6”•
Tuplehα:4,β:6i: tuple of singletons•
The circuit captures asetof tuples, e.g.,hα:4,β:6i,hα:4,β:7i
Three kinds ofset-valued gates:
•
Variable gate α:4 :→ captures hα:4i
•
Union gate ∪ :→ union of sets of tuples
•
Product gate × :→ relational product
21/36
Proof idea for trees: set circuits
Aset circuitrepresents aset of answersto a patternP(α,β)
•
Singletonα:6→“the variableαis mapped to node6”•
Tuplehα:4,β:6i: tuple of singletons•
The circuit captures asetof tuples, e.g.,hα:4,β:6i,hα:4,β:7i
Three kinds ofset-valued gates:
•
Variable gate α:4 :→ captures hα:4i
•
Union gate ∪ :→ union of sets of tuples
•
Product gate × :→ relational product
21/36
Proof idea for trees: set circuit construction given atreeT, we can build inO(|T| × |A|)aset circuitcapturing exactly the set of tuples{hα1 :n1, . . . ,αk:nkiin the output ofAonT
22/36
Proof idea for trees: set circuit construction (details)
•
Automaton:“Select all node pairs(α,β)”•
States:{∅,α,β,αβ}n
α:n
∪ ∪ ∪ ∪
∅ α β αβ
∪ ∪ ∪ ∪
∅ α β αβ
∪ ∪ ∪ ∪
∅ α β αβ
×
×
23/36
Proof idea for trees: set circuit construction (details)
•
Automaton:“Select all node pairs(α,β)”•
States:{∅,α,β,αβ}n
α:n
∪ ∪ ∪ ∪
∅ α β αβ
∪ ∪ ∪ ∪
∅ α β αβ
∪ ∪ ∪ ∪
∅ α β αβ
×
×
23/36
Proof idea for trees: set circuit construction (details)
•
Automaton:“Select all node pairs(α,β)”•
States:{∅,α,β,αβ}n
α:n
∪ ∪ ∪ ∪
∅ α β αβ
∪ ∪ ∪ ∪
∅ α β αβ
∪ ∪ ∪ ∪
∅ α β αβ
×
×
23/36
Proof idea for trees: set circuit construction (details)
•
Automaton:“Select all node pairs(α,β)”•
States:{∅,α,β,αβ}n
α:n
∪ ∪ ∪ ∪
∅ α β αβ
∪ ∪ ∪ ∪
∅ α β αβ
∪ ∪ ∪ ∪
∅ α β αβ
×
×
23/36
Proof idea for trees: set circuit construction (details)
•
Automaton:“Select all node pairs(α,β)”•
States:{∅,α,β,αβ}n
α:n
∪ ∪ ∪ ∪
∅ α β αβ
∪ ∪ ∪ ∪
∅ α β αβ
∪ ∪ ∪ ∪
∅ α β αβ
×
×
23/36
Proof idea for trees: set circuit construction (details)
•
Automaton:“Select all node pairs(α,β)”•
States:{∅,α,β,αβ}n
α:n
∪ ∪ ∪ ∪
∅ α β αβ
∪ ∪ ∪ ∪
∅ α β αβ
∪ ∪ ∪ ∪
∅ α β αβ
×
×
23/36
Proof idea for trees: set circuit construction (details)
•
Automaton:“Select all node pairs(α,β)”•
States:{∅,α,β,αβ}n
α:n
∪ ∪ ∪ ∪
∅ α β αβ
∪ ∪ ∪ ∪
∅ α β αβ
∪ ∪ ∪ ∪
∅ α β αβ
×
×
23/36
Proof idea for trees: enumeration on set circuits
Given a set circuitsatisfying some conditions, we can enumerate all tuples that it captures with linear preprocessing and constant delay E.g., for
hα:4,β:6i,hα:4,β:7i : enumeratehα:4,β:6ithenhα:4,β:7i
24/36
Proof idea for trees: general enumeration approach
→ Enumerate the setT(g)captured by each gateg
→ Do it bytop-down inductionon the circuit
Base case: variable α:n : enumeratehα:niand stop
∪-gate
∪
g1 g2
Concatenation:enumerateT(g1) and then enumerateT(g2)
×-gate
×
g1 g2
Lexicographic product: for everyt1inT(g1):
for everyt2 inT(g2): outputt1+t2
25/36
Proof idea for trees: general enumeration approach
→ Enumerate the setT(g)captured by each gateg
→ Do it bytop-down inductionon the circuit
Base case: variable α:n :
enumeratehα:niand stop
∪-gate
∪
g1 g2
Concatenation:enumerateT(g1) and then enumerateT(g2)
×-gate
×
g1 g2
Lexicographic product: for everyt1inT(g1):
for everyt2 inT(g2): outputt1+t2
25/36
Proof idea for trees: general enumeration approach
→ Enumerate the setT(g)captured by each gateg
→ Do it bytop-down inductionon the circuit
Base case: variable α:n : enumeratehα:niand stop
∪-gate
∪
g1 g2
Concatenation:enumerateT(g1) and then enumerateT(g2)
×-gate
×
g1 g2
Lexicographic product: for everyt1inT(g1):
for everyt2 inT(g2): outputt1+t2
25/36
Proof idea for trees: general enumeration approach
→ Enumerate the setT(g)captured by each gateg
→ Do it bytop-down inductionon the circuit
Base case: variable α:n : enumeratehα:niand stop
∪-gate
∪
g1 g2
Concatenation:enumerateT(g1) and then enumerateT(g2)
×-gate
×
g1 g2
Lexicographic product: for everyt1inT(g1):
for everyt2 inT(g2): outputt1+t2
25/36
Proof idea for trees: general enumeration approach
→ Enumerate the setT(g)captured by each gateg
→ Do it bytop-down inductionon the circuit
Base case: variable α:n : enumeratehα:niand stop
∪-gate
∪
g1 g2
Concatenation:enumerateT(g1) and then enumerateT(g2)
×-gate
×
g1 g2
Lexicographic product:
for everyt1inT(g1): for everyt2 inT(g2):
outputt1+t2
25/36
Proof idea for trees: circuit conditions
Enumeration relies on someconditionson the input circuit (d-DNNF):
•
∪ are alldeterministic:For any two inputsg1andg2of a∪-gate, the captured setsT(g1)andT(g2)aredisjoint (they have no tuple in common)
→ Avoidsduplicate tuples
•
× are alldecomposable:For any two inputsg1andg2of a×-gate, no variable has a path to bothg1 andg2
→ Avoidsduplicate singletons
×
α:4
β:6
∪
β:7
•
Also an additionalupwards-determinismcondition•
Our circuit satisfies these thanks toautomaton determinism26/36
Proof idea for trees: circuit conditions
Enumeration relies on someconditionson the input circuit (d-DNNF):
•
∪ are alldeterministic:For any two inputsg1andg2of a∪-gate, the captured setsT(g1)andT(g2)aredisjoint (they have no tuple in common)
→ Avoidsduplicate tuples
•
× are alldecomposable:For any two inputsg1andg2of a×-gate, no variable has a path to bothg1 andg2
→ Avoidsduplicate singletons
×
α:4
β:6
∪
β:7
•
Also an additionalupwards-determinismcondition•
Our circuit satisfies these thanks toautomaton determinism26/36
Proof idea for trees: circuit conditions
Enumeration relies on someconditionson the input circuit (d-DNNF):
•
∪ are alldeterministic:For any two inputsg1andg2of a∪-gate, the captured setsT(g1)andT(g2)aredisjoint (they have no tuple in common)
→ Avoidsduplicate tuples
•
× are alldecomposable:For any two inputsg1andg2of a×-gate, no variable has a path to bothg1 andg2
→ Avoidsduplicate singletons
×
α:4
β:6
∪
β:7
•
Also an additionalupwards-determinismcondition•
Our circuit satisfies these thanks toautomaton determinism26/36
Proof idea for trees: circuit conditions
Enumeration relies on someconditionson the input circuit (d-DNNF):
•
∪ are alldeterministic:For any two inputsg1andg2of a∪-gate, the captured setsT(g1)andT(g2)aredisjoint (they have no tuple in common)
→ Avoidsduplicate tuples
•
× are alldecomposable:For any two inputsg1andg2of a×-gate, no variable has a path to bothg1 andg2
→ Avoidsduplicate singletons
×
α:4
β:6
∪
β:7
•
Also an additionalupwards-determinismcondition•
Our circuit satisfies these thanks toautomaton determinism26/36
Proof idea for trees: circuit conditions
Enumeration relies on someconditionson the input circuit (d-DNNF):
•
∪ are alldeterministic:For any two inputsg1andg2of a∪-gate, the captured setsT(g1)andT(g2)aredisjoint (they have no tuple in common)
→ Avoidsduplicate tuples
•
× are alldecomposable:For any two inputsg1andg2of a×-gate, no variable has a path to bothg1 andg2
→ Avoidsduplicate singletons
×
α:4
β:6
∪
β:7
•
Also an additionalupwards-determinismcondition•
Our circuit satisfies these thanks toautomaton determinism 26/36Proof idea for trees: enumeration subtleties
•
We must not waste time in gates capturing∅→ Labelthem during the preprocessing
•
We must not waste time because of gates capturing hi→ Homogenizationto set them aside
•
We must not waste time inhierarchies of∪-gates→ Precompute areachability index(usesupwards-determinism)
27/36
Proof idea for trees: enumeration subtleties
•
We must not waste time in gates capturing∅→ Labelthem during the preprocessing
•
We must not waste time because of gates capturing hi→ Homogenizationto set them aside
•
We must not waste time inhierarchies of∪-gates→ Precompute areachability index(usesupwards-determinism)
27/36
Proof idea for trees: enumeration subtleties
•
We must not waste time in gates capturing∅→ Labelthem during the preprocessing
•
We must not waste time because of gates capturing hi→ Homogenizationto set them aside
•
We must not waste time inhierarchies of∪-gates→ Precompute areachability index(usesupwards-determinism)
27/36
Proof idea for trees: enumeration subtleties
•
We must not waste time in gates capturing∅→ Labelthem during the preprocessing
•
We must not waste time because of gates capturing hi→ Homogenizationto set them aside
•
We must not waste time inhierarchies of∪-gates→ Precompute areachability index(usesupwards-determinism)
27/36
Proof idea for trees: enumeration subtleties
•
We must not waste time in gates capturing∅→ Labelthem during the preprocessing
•
We must not waste time because of gates capturing hi→ Homogenizationto set them aside
•
We must not waste time inhierarchies of∪-gates→ Precompute areachability index(usesupwards-determinism)
27/36
Proof idea for trees: enumeration subtleties
•
We must not waste time in gates capturing∅→ Labelthem during the preprocessing
•
We must not waste time because of gates capturing hi→ Homogenizationto set them aside
•
We must not waste time inhierarchies of∪-gates→ Precompute areachability index(usesupwards-determinism)
27/36