Extension: From Text to Trees

From the document Enumerating Pattern Matches in Words and Trees (pages 131-182)

Pattern matching on trees

The data T is no longer text but is now a tree:

[Figure: tree with numbered nodes <body> 1, <div> 2, <section> 3, <h2> 4, <p> 5, <img> 6, <img> 7]

The pattern P asks about the structure of the tree: is there α: an h2 header and β: an image in the same section?

Results: ⟨α:4, β:6⟩, ⟨α:4, β:7⟩
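To make the example concrete, here is a minimal brute-force sketch of the pattern in Python. The node tags and numbers come from the slide, but the parent links in `TREE` are an assumption (the figure's edges are not recoverable here), and all function names are hypothetical. It is a naive matcher for illustration, not the enumeration algorithm discussed later.

```python
from itertools import product

# Hypothetical encoding of the example tree: node id -> (tag, parent id).
# ASSUMPTION: the parent links below are a plausible guess, not taken from the slide.
TREE = {
    1: ("body", None),
    2: ("div", 1),
    3: ("section", 1),
    4: ("h2", 3),
    5: ("p", 3),
    6: ("img", 5),
    7: ("img", 5),
}

def ancestors(node):
    """Yield the proper ancestors of a node, closest first."""
    parent = TREE[node][1]
    while parent is not None:
        yield parent
        parent = TREE[parent][1]

def sections_of(node):
    """Set of <section> nodes that are ancestors of the given node."""
    return {a for a in ancestors(node) if TREE[a][0] == "section"}

def matches():
    """All pairs (alpha, beta): alpha an h2, beta an img, sharing a section."""
    h2s = [n for n, (tag, _) in TREE.items() if tag == "h2"]
    imgs = [n for n, (tag, _) in TREE.items() if tag == "img"]
    return sorted(
        (a, b) for a, b in product(h2s, imgs) if sections_of(a) & sections_of(b)
    )

print(matches())
```

Under the assumed tree shape, this yields the two results listed above, with α mapped to node 4 and β mapped to node 6 or 7.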

Definitions and Results on Trees

Tree patterns can be written as a tree automaton with captures

Like for text, we can enumerate the matches of tree automata...

Theorem [Bagan, 2006]

We can find all matches on a tree T of a tree automaton A (with constantly many capture variables) with:

Preprocessing linear in T (data)

Delay constant in T (data)

Again, this is only in data complexity!

We conjecture the following bounds for this task (ongoing work):

Conjecture

Preprocessing linear in T (data) and polynomial in A and T (combined)

Delay constant in T (data) and polynomial in A and T (combined)

Proof idea for trees: structure

Similar structure to the previous proof, but with a circuit:

Preprocessing: Compute a circuit representation of the answers

Enumeration: Apply a generic algorithm on the circuit

[Figure: Tree and Pattern (∃s section(s) ∧ s ⇝ α ∧ s ⇝ β ∧ h2(α) ∧ img(β)) → Phase 1: Preprocessing → Data structure → Phase 2: Enumeration → Results ⟨α:4, β:6⟩, ⟨α:4, β:7⟩]

Proof idea for trees: set circuits

A set circuit represents a set of answers to a pattern P(α, β)

Singleton α:6: "the variable α is mapped to node 6"

Tuple ⟨α:4, β:6⟩: tuple of singletons

The circuit captures a set of tuples, e.g., {⟨α:4, β:6⟩, ⟨α:4, β:7⟩}

Three kinds of set-valued gates:

Variable gate α:4: captures {⟨α:4⟩}

Union gate ∪: union of sets of tuples

Product gate ×: relational product
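The semantics of the three gate kinds can be sketched in a few lines of Python. The class names `VarGate`, `UnionGate`, `ProductGate` and the function `captured` are hypothetical, and the example circuit at the end is an assumed shape that happens to capture the two example tuples; the slide's own circuit drawing is not reproduced here. Tuples are stored as frozensets of (variable, node) singletons.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VarGate:
    variable: str   # e.g. "α"
    node: int       # e.g. 4, meaning "α is mapped to node 4"

@dataclass(frozen=True)
class UnionGate:
    inputs: tuple   # input gates

@dataclass(frozen=True)
class ProductGate:
    left: object    # input gate
    right: object   # input gate

def captured(gate):
    """Return the set of tuples captured by a gate (naive evaluation)."""
    if isinstance(gate, VarGate):
        # A variable gate captures a single tuple with a single singleton.
        return {frozenset({(gate.variable, gate.node)})}
    if isinstance(gate, UnionGate):
        # Union of the sets captured by the inputs.
        return set().union(*(captured(g) for g in gate.inputs))
    if isinstance(gate, ProductGate):
        # Relational product: combine every left tuple with every right tuple.
        return {t1 | t2 for t1 in captured(gate.left) for t2 in captured(gate.right)}
    raise TypeError(f"unknown gate: {gate!r}")

# ASSUMED example circuit: α:4 × (β:6 ∪ β:7)
circuit = ProductGate(VarGate("α", 4), UnionGate((VarGate("β", 6), VarGate("β", 7))))
print(captured(circuit))
```

Materializing the whole set like this is only meant to pin down what a circuit captures; the point of the talk is to enumerate the tuples without building the set.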

Proof idea for trees: set circuit construction

Given a tree T, we can build in O(|T| × |A|) a set circuit capturing exactly the set of tuples {⟨α1:n1, ..., αk:nk⟩} in the output of A on T

Proof idea for trees: set circuit construction (details)

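Automaton: "Select all node pairs (α, β)"

States: {∅, α, β, αβ}

[Figure: set circuit built at a tree node n, with a row of ∪-gates for the states ∅, α, β, αβ at n and at each of its children, combined through ×-gates and the variable gate α:n]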
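One possible reading of this construction, as a rough Python sketch: the state at a node is taken to be the set of variables already assigned in its subtree, and for every node and state we build a gate capturing the partial assignments that lead to that state. Everything here is an assumption made for illustration: the tree encoding `(node_id, [subtrees])`, the gate encoding as tagged tuples, the choice to let α and β fall on the same node, and the function names `build` and `captured`. It is not claimed to be the exact construction of the talk.

```python
# The four states {∅, α, β, αβ}: which variables are assigned in the subtree.
STATES = [frozenset(), frozenset({"α"}), frozenset({"β"}), frozenset({"α", "β"})]

def build(tree):
    """Return {state: gate} for the root of tree = (node_id, [subtrees])."""
    node, children = tree
    # Gates for the node alone: annotate it with exactly the variables in S.
    # The empty product captures the set containing only the empty tuple.
    gates = {S: ("prod", [("var", v, node) for v in sorted(S)]) for S in STATES}
    # Fold in the children one by one.
    for child in children:
        child_gates = build(child)
        merged = {}
        for q in STATES:
            # q is reached by splitting its variables between what was
            # processed so far (q1) and the new child's subtree (q2).
            merged[q] = ("union", [
                ("prod", [gates[q1], child_gates[q2]])
                for q1 in STATES for q2 in STATES
                if not (q1 & q2) and (q1 | q2) == q
            ])
        gates = merged
    return gates

def captured(g):
    """Naive evaluation of the set of tuples captured by gate g."""
    if g[0] == "var":
        return {frozenset({(g[1], g[2])})}
    if g[0] == "union":
        return set().union(*(captured(h) for h in g[1]))
    result = {frozenset()}  # product: pick one tuple from each input
    for h in g[1]:
        result = {t1 | t2 for t1 in result for t2 in captured(h)}
    return result

# Hypothetical three-node tree: root 1 with children 2 and 3.
root_gates = build((1, [(2, []), (3, [])]))
print(len(captured(root_gates[frozenset({"α", "β"})])))
```

On the three-node tree, the gate for state αβ at the root captures all 9 pairs of nodes. Each child contributes a constant number of gates (the automaton is fixed), so the circuit size grows linearly with the tree.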

Proof idea for trees: enumeration on set circuits

Given a set circuit satisfying some conditions, we can enumerate all tuples that it captures with linear preprocessing and constant delay

E.g., for {⟨α:4, β:6⟩, ⟨α:4, β:7⟩}: enumerate ⟨α:4, β:6⟩ then ⟨α:4, β:7⟩

Proof idea for trees: general enumeration approach

→ Enumerate the set T(g) captured by each gate g

→ Do it by top-down induction on the circuit

Base case: variable gate α:n: enumerate ⟨α:n⟩ and stop

∪-gate with inputs g1 and g2:

Concatenation: enumerate T(g1) and then enumerate T(g2)

×-gate with inputs g1 and g2:

Lexicographic product: for every t1 in T(g1): for every t2 in T(g2): output t1 + t2
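The three cases above translate almost directly into a recursive Python generator. This is a minimal sketch: gates are encoded as tagged tuples (a hypothetical encoding), and the example circuit is an assumed shape for the running example. The sketch shows the order in which tuples are output; on its own it does not give the constant-delay guarantee, which needs the circuit conditions and preprocessing.

```python
def enumerate_gate(g):
    """Yield the tuples of T(g), following the three cases above."""
    kind = g[0]
    if kind == "var":
        # Base case: output the single tuple ⟨variable:node⟩ and stop.
        yield ((g[1], g[2]),)
    elif kind == "union":
        # Concatenation: all of T(g1), then all of T(g2).
        _, g1, g2 = g
        yield from enumerate_gate(g1)
        yield from enumerate_gate(g2)
    elif kind == "prod":
        # Lexicographic product: every t1 combined with every t2.
        _, g1, g2 = g
        for t1 in enumerate_gate(g1):
            for t2 in enumerate_gate(g2):
                yield t1 + t2
    else:
        raise ValueError(f"unknown gate kind: {kind!r}")

# ASSUMED example circuit: α:4 × (β:6 ∪ β:7)
circuit = ("prod", ("var", "α", 4), ("union", ("var", "β", 6), ("var", "β", 7)))
for answer in enumerate_gate(circuit):
    print(answer)
```

On this circuit the generator outputs ⟨α:4, β:6⟩ and then ⟨α:4, β:7⟩. If two inputs of a ∪-gate captured a common tuple, the same tuple would be output twice.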

Proof idea for trees: circuit conditions

Enumeration relies on some conditions on the input circuit (d-DNNF):

∪-gates are all deterministic:

For any two inputs g1 and g2 of a ∪-gate, the captured sets T(g1) and T(g2) are disjoint (they have no tuple in common)

→ Avoids duplicate tuples

×-gates are all decomposable:

For any two inputs g1 and g2 of a ×-gate, no variable has a path to both g1 and g2

→ Avoids duplicate singletons

[Figure: example circuit with a ×-gate and the variable gates α:4, β:6, β:7]

Also an additional upwards-determinism condition

Our circuit satisfies these thanks to automaton determinism
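The two named conditions can be checked naively on small circuits, as in the Python sketch below. The tagged-tuple gate encoding and all function names are hypothetical, and "variable" in the decomposability condition is read here as "variable gate" such as α:4, which is an assumption. The upwards-determinism condition is not covered. The checks materialize captured sets, so they are for illustration only.

```python
from itertools import combinations

def inputs(g):
    """Input gates of g (variable gates have none)."""
    return () if g[0] == "var" else g[1:]

def captured(g):
    """Set T(g) of captured tuples, each a frozenset of singletons."""
    if g[0] == "var":
        return {frozenset({(g[1], g[2])})}
    if g[0] == "union":
        return set().union(*(captured(h) for h in inputs(g)))
    result = {frozenset()}  # product: one tuple from each input
    for h in inputs(g):
        result = {t1 | t2 for t1 in result for t2 in captured(h)}
    return result

def var_gates(g):
    """Variable gates, as (variable, node) pairs, that have a path to g."""
    if g[0] == "var":
        return {(g[1], g[2])}
    return set().union(*(var_gates(h) for h in inputs(g)))

def is_deterministic(g):
    """Every ∪-gate at or below g has pairwise disjoint inputs."""
    below = all(is_deterministic(h) for h in inputs(g))
    if g[0] != "union":
        return below
    return below and all(
        captured(a).isdisjoint(captured(b)) for a, b in combinations(inputs(g), 2)
    )

def is_decomposable(g):
    """No variable gate reaches two inputs of a ×-gate at or below g."""
    below = all(is_decomposable(h) for h in inputs(g))
    if g[0] != "prod":
        return below
    return below and all(
        var_gates(a).isdisjoint(var_gates(b)) for a, b in combinations(inputs(g), 2)
    )

# ASSUMED example circuit: α:4 × (β:6 ∪ β:7)
good = ("prod", ("var", "α", 4), ("union", ("var", "β", 6), ("var", "β", 7)))
print(is_deterministic(good), is_decomposable(good))
```

For instance, a ∪-gate with α:4 as both inputs fails the determinism check, and a ×-gate with α:4 as both inputs fails the decomposability check.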

Proof idea for trees: enumeration subtleties

We must not waste time in gates capturing ∅

→ Label them during the preprocessing

We must not waste time because of gates capturing ⟨⟩

→ Homogenization to set them aside

We must not waste time in hierarchies of ∪-gates

→ Precompute a reachability index (uses upwards-determinism)
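The first point, labeling gates that capture ∅, can be sketched as a single memoized pass over the circuit. This is a minimal Python sketch under assumptions: gates are tagged tuples (hypothetical encoding), a ∪-gate with no inputs stands for a gate capturing ∅, and the function name `label_empty` is made up for illustration. Homogenization and the reachability index are not shown.

```python
def label_empty(root):
    """Map id(gate) -> True if the gate captures ∅, visiting each gate once."""
    empty = {}

    def visit(g):
        if id(g) in empty:
            return empty[id(g)]
        kind = g[0]
        if kind == "var":
            # A variable gate always captures one tuple.
            result = False
        elif kind == "union":
            # Empty if and only if all inputs are empty (or there are none).
            result = all([visit(h) for h in g[1:]])
        else:
            # Product: empty as soon as one input is empty.
            result = any([visit(h) for h in g[1:]])
        empty[id(g)] = result
        return result

    visit(root)
    return empty

# Hypothetical circuit: (α:4 × ∅) ∪ β:6
bottom = ("union",)
dead = ("prod", ("var", "α", 4), bottom)
root = ("union", dead, ("var", "β", 6))
labels = label_empty(root)
print(labels[id(dead)], labels[id(root)])
```

With these labels, the enumeration can skip the dead ×-gate immediately instead of exploring it and finding nothing to output. The recursion is fine for small examples; a deep circuit would call for an explicit stack.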
