• Aucun résultat trouvé

Enumerating MSO Query Results on Trees and Words

N/A
N/A
Protected

Academic year: 2022

Partager "Enumerating MSO Query Results on Trees and Words"

Copied!
88
0
0

Texte intégral

(1)

A Circuit-Based Approach to Efficient Enumeration:

Enumerating MSO Query Results on Trees and Words

Antoine Amarilli1, Pierre Bourhis2, Louis Jachiet2, Stefan Mengel3, Matthias Niewerth4

May 20, 2019

1Télécom ParisTech

2CNRS CRIStAL

(2)

Roadmap of this talk

Stefan has presented enumeration ford-DNNF set circuits

This talk:introduce MSO query evaluation on trees and build aset circuitthat represents the answers to enumerate

Using the previous talkwe can reprove the result: Theorem [Bagan, 2006, Kazana and Segoufin, 2013]

We can enumerate the answers of MSO queries on trees with linear-time preprocessing and constant delay.

Also:new results oncombined complexity: Theorem [Amarilli et al., 2019a, Amarilli et al., 2019b]

We can enumerate the results of anautomatonon a tree with:

Preprocessinglinearin the tree andpolynomialin the automaton

Delayconstantin the tree andpolynomialin the automaton

Next talk:how to supportupdates(and provenew results)

(3)

Roadmap of this talk

Stefan has presented enumeration ford-DNNF set circuits

This talk:introduce MSO query evaluation on trees and build aset circuitthat represents the answers to enumerate

Using the previous talkwe can reprove the result: Theorem [Bagan, 2006, Kazana and Segoufin, 2013]

We can enumerate the answers of MSO queries on trees with linear-time preprocessing and constant delay.

Also:new results oncombined complexity: Theorem [Amarilli et al., 2019a, Amarilli et al., 2019b]

We can enumerate the results of anautomatonon a tree with:

Preprocessinglinearin the tree andpolynomialin the automaton

Delayconstantin the tree andpolynomialin the automaton

Next talk:how to supportupdates(and provenew results)

(4)

Roadmap of this talk

Stefan has presented enumeration ford-DNNF set circuits

This talk:introduce MSO query evaluation on trees and build aset circuitthat represents the answers to enumerate

Using the previous talkwe can reprove the result:

Theorem [Bagan, 2006, Kazana and Segoufin, 2013]

We can enumerate the answers of MSO queries on trees with linear-time preprocessing and constant delay.

Also:new results oncombined complexity: Theorem [Amarilli et al., 2019a, Amarilli et al., 2019b]

We can enumerate the results of anautomatonon a tree with:

Preprocessinglinearin the tree andpolynomialin the automaton

Delayconstantin the tree andpolynomialin the automaton

Next talk:how to supportupdates(and provenew results)

(5)

Roadmap of this talk

Stefan has presented enumeration ford-DNNF set circuits

This talk:introduce MSO query evaluation on trees and build aset circuitthat represents the answers to enumerate

Using the previous talkwe can reprove the result:

Theorem [Bagan, 2006, Kazana and Segoufin, 2013]

We can enumerate the answers of MSO queries on trees with linear-time preprocessing and constant delay.

Also:new results oncombined complexity: Theorem [Amarilli et al., 2019a, Amarilli et al., 2019b]

We can enumerate the results of anautomatonon a tree with:

Preprocessinglinearin the tree andpolynomialin the automaton

Next talk:how to supportupdates(and provenew results)

(6)

Roadmap of this talk

Stefan has presented enumeration ford-DNNF set circuits

This talk:introduce MSO query evaluation on trees and build aset circuitthat represents the answers to enumerate

Using the previous talkwe can reprove the result:

Theorem [Bagan, 2006, Kazana and Segoufin, 2013]

We can enumerate the answers of MSO queries on trees with linear-time preprocessing and constant delay.

Also:new results oncombined complexity: Theorem [Amarilli et al., 2019a, Amarilli et al., 2019b]

We can enumerate the results of anautomatonon a tree with:

Preprocessinglinearin the tree andpolynomialin the automaton

Delayconstantin the tree andpolynomialin the automaton

(7)

Boolean MSO on trees

(8)

Boolean query evaluation on trees

Data: a treeTwhere nodes have a color from an alphabet

?

QueryQ: asentencein monadic

second-order logic (MSO)

P (x)means “xis blue”

xymeans “xis the parent ofy

“Is there both a pink and a blue node?”

∃x y P (x)∧P (y)

i

Result: TRUE/FALSE indicating if the treeTsatisfies the queryQ

Computational complexityas a function ofT (the queryQisfixed)

(9)

Boolean query evaluation on trees

Data: a treeTwhere nodes have a color from an alphabet

?

QueryQ: asentencein monadic

second-order logic (MSO)

P (x)means “xis blue”

xymeans “xis the parent ofy

“Is there both a pink and a blue node?”

∃x y P (x)∧P (y)

i

Result: TRUE/FALSE indicating if the treeTsatisfies the queryQ

Computational complexityas a function ofT (the queryQisfixed)

(10)

Boolean query evaluation on trees

Data: a treeTwhere nodes have a color from an alphabet

?

QueryQ: asentencein monadic

second-order logic (MSO)

P (x)means “xis blue”

xymeans “xis the parent ofy

“Is there both a pink and a blue node?”

∃x y P (x)∧P (y)

i

Result: TRUE/FALSE indicating if the treeTsatisfies the queryQ

Computational complexityas a function ofT (the queryQisfixed)

(11)

Boolean query evaluation on trees

Data: a treeTwhere nodes have a color from an alphabet

?

QueryQ: asentencein monadic

second-order logic (MSO)

P (x)means “xis blue”

xymeans “xis the parent ofy

“Is there both a pink and a blue node?”

∃x y P (x)∧P (y)

i

Result: TRUE/FALSE indicating if the treeTsatisfies the queryQ

Computational complexityas a function ofT

(12)

Monadic second-order logic (MSO)

P (x)means “xis blue”; alsoP (x),P (x)

xymeans “xis the parent ofy

Propositional logic:formulas withAND∧,OR∨,NOT¬

P (x)P (y)means“Nodexis pink and nodeyis blue”

First-order logic:addsexistential quantifier∃and universal quantifier∀

∃x y P (x)P (y)means“There is both a pink and a blue node”

Monadic second-order logic (MSO):addsquantifiers over sets

∃S∀x S(x)means“there is a setScontaining every elementx”

• Can expresstransitive closurexy, i.e.,“xis an ancestor ofy”

∀x P (x)⇒ ∃y P (y)xy

means“There is a blue node below every pink node”

(13)

Monadic second-order logic (MSO)

P (x)means “xis blue”; alsoP (x),P (x)

xymeans “xis the parent ofy

Propositional logic:formulas withAND∧,OR∨,NOT¬

P (x)P (y)means“Nodexis pink and nodeyis blue”

First-order logic:addsexistential quantifier∃and universal quantifier∀

∃x y P (x)P (y)means“There is both a pink and a blue node”

Monadic second-order logic (MSO):addsquantifiers over sets

∃S∀x S(x)means“there is a setScontaining every elementx”

• Can expresstransitive closurexy, i.e.,“xis an ancestor ofy”

∀x P (x)⇒ ∃y P (y)xy

means“There is a blue node below every pink node”

(14)

Monadic second-order logic (MSO)

P (x)means “xis blue”; alsoP (x),P (x)

xymeans “xis the parent ofy

Propositional logic:formulas withAND∧,OR∨,NOT¬

P (x)P (y)means“Nodexis pink and nodeyis blue”

First-order logic:addsexistential quantifier∃and universal quantifier∀

∃x y P (x)P (y)means“There is both a pink and a blue node”

Monadic second-order logic (MSO):addsquantifiers over sets

∃S∀x S(x)means“there is a setScontaining every elementx”

• Can expresstransitive closurexy, i.e.,“xis an ancestor ofy”

∀x P (x)⇒ ∃y P (y)xy

means“There is a blue node below every pink node”

(15)

Monadic second-order logic (MSO)

P (x)means “xis blue”; alsoP (x),P (x)

xymeans “xis the parent ofy

Propositional logic:formulas withAND∧,OR∨,NOT¬

P (x)P (y)means“Nodexis pink and nodeyis blue”

First-order logic:addsexistential quantifier∃and universal quantifier∀

∃x y P (x)P (y)means“There is both a pink and a blue node”

Monadic second-order logic (MSO):addsquantifiers over sets

∃S∀x S(x)means“there is a setScontaining every elementx”

• Can expresstransitive closurexy, i.e.,“xis an ancestor ofy”

(16)

Tree automata

Tree alphabet:

Bottom-up deterministictree automaton

“Is there both a pink and a blue node?”

States:{⊥,B,P,>}

Final states: {>}

Initial function: ⊥ P B

Transitions(examples): P

P

> B P

(17)

Tree automata

Tree alphabet:

Bottom-up deterministictree automaton

“Is there both a pink and a blue node?”

States:{⊥,B,P,>}

Final states: {>}

Initial function: ⊥ P B

Transitions(examples): P

P

> B P

(18)

Tree automata

Tree alphabet:

Bottom-up deterministictree automaton

“Is there both a pink and a blue node?”

States:{⊥,B,P,>}

Final states: {>}

Initial function: ⊥ P B

Transitions(examples): P

P

> B P

(19)

Tree automata

Tree alphabet:

Bottom-up deterministictree automaton

“Is there both a pink and a blue node?”

States:{⊥,B,P,>}

Final states: {>}

Initial function: ⊥ P B

Transitions(examples): P

P

> B P

(20)

Tree automata

Tree alphabet:

Bottom-up deterministictree automaton

“Is there both a pink and a blue node?”

States:{⊥,B,P,>}

Final states: {>}

Initial function: ⊥ P B

Transitions(examples): P

P

> B P

(21)

Tree automata

Tree alphabet:

B

P

Bottom-up deterministictree automaton

“Is there both a pink and a blue node?”

States:{⊥,B,P,>}

Final states: {>}

Initial function: ⊥ P B

Transitions(examples): P

P

> B P

(22)

Tree automata

Tree alphabet:

B

P

Bottom-up deterministictree automaton

“Is there both a pink and a blue node?”

States:{⊥,B,P,>}

Final states: {>}

Initial function: ⊥ P B

Transitions(examples):

P P

>

B P

(23)

Tree automata

Tree alphabet:

B

B

P

P

Bottom-up deterministictree automaton

“Is there both a pink and a blue node?”

States:{⊥,B,P,>}

Final states: {>}

Initial function: ⊥ P B

Transitions(examples):

P P

>

B P

(24)

Tree automata

Tree alphabet:

>

B

B

P

P

Bottom-up deterministictree automaton

“Is there both a pink and a blue node?”

States:{⊥,B,P,>}

Final states: {>}

Initial function: ⊥ P B

Transitions(examples):

P P

>

B P

(25)

Boolean MSO query evaluation via automata

Theorem [Thatcher and Wright, 1968]

MSOandtree automatahave the sameexpressive poweron trees

Given a Boolean MSO query, we can compute a tree automaton that accepts precisely the trees on which the query holds

Complexity (in the query) is generally nonelementary Corollary

Evaluating a Boolean MSO query on a tree is inlinear timein the tree

(26)

Boolean MSO query evaluation via automata

Theorem [Thatcher and Wright, 1968]

MSOandtree automatahave the sameexpressive poweron trees

Given a Boolean MSO query, we can compute a tree automaton that accepts precisely the trees on which the query holds

Complexity (in the query) is generally nonelementary

Corollary

Evaluating a Boolean MSO query on a tree is inlinear timein the tree

(27)

Boolean MSO query evaluation via automata

Theorem [Thatcher and Wright, 1968]

MSOandtree automatahave the sameexpressive poweron trees

Given a Boolean MSO query, we can compute a tree automaton that accepts precisely the trees on which the query holds

Complexity (in the query) is generally nonelementary Corollary

Evaluating a Boolean MSO query on a tree is inlinear timein the tree

(28)

Set Circuits for

Non-Boolean MSO Queries

(29)

Overall idea

Query:Q(x1, . . . ,xn)withfree variablesx1, . . . ,xn

Goal:find all tuplesa1, . . . ,ansuch thatQ(a1, . . . ,an)holds

→ Addspecial nodes: for each nodenand variablexi, add a nodeniwhich is coloredrediffxi is the noden

n n0 n00

n

n0 n00 n1 n2

Rewrite the query to aBoolean querywhich uses the new nodesni to read the valuation ofxi

This can be done inlinear timein the input tree

Remark: same construction for freesecond-order variables

(30)

Overall idea

Query:Q(x1, . . . ,xn)withfree variablesx1, . . . ,xn

Goal:find all tuplesa1, . . . ,ansuch thatQ(a1, . . . ,an)holds

→ Addspecial nodes: for each nodenand variablexi, add a nodeniwhich is coloredrediffxi is the noden

n n0 n00

n

n0 n00 n1 n2

Rewrite the query to aBoolean querywhich uses the new nodesni to read the valuation ofxi

This can be done inlinear timein the input tree

Remark: same construction for freesecond-order variables

(31)

Overall idea

Query:Q(x1, . . . ,xn)withfree variablesx1, . . . ,xn

Goal:find all tuplesa1, . . . ,ansuch thatQ(a1, . . . ,an)holds

→ Addspecial nodes: for each noden and variablexi, add a nodeniwhich is coloredrediffxi is the noden

n n0 n00

n

n0 n00 n1 n2

Rewrite the query to aBoolean querywhich uses the new nodesni to read the valuation ofxi

This can be done inlinear timein the input tree

Remark: same construction for freesecond-order variables

(32)

Overall idea

Query:Q(x1, . . . ,xn)withfree variablesx1, . . . ,xn

Goal:find all tuplesa1, . . . ,ansuch thatQ(a1, . . . ,an)holds

→ Addspecial nodes: for each noden and variablexi, add a nodeniwhich is coloredrediffxi is the noden

n n0 n00

n

n0 n00 n1 n2

Rewrite the query to aBoolean querywhich uses the new nodesni to read the valuation ofxi

This can be done inlinear timein the input tree

Remark: same construction for freesecond-order variables

(33)

Overall idea

Query:Q(x1, . . . ,xn)withfree variablesx1, . . . ,xn

Goal:find all tuplesa1, . . . ,ansuch thatQ(a1, . . . ,an)holds

→ Addspecial nodes: for each noden and variablexi, add a nodeniwhich is coloredrediffxi is the noden

n n0 n00

n

n0 n00 n1 n2

Rewrite the query to aBoolean querywhich uses the new nodesni to read the valuation ofxi

This can be done inlinear timein the input tree

(34)

Uncertain trees

Now: Boolean query on a tree where the color of nodes isuncertain 1

5

6 7 2

4 3

Avaluationof a tree decides whether to keep(1) ordiscard(0) node labels Valuation: {2,3,77→1, ∗ 7→0}

A:“Is there both a pink and a blue node?”

(35)

Uncertain trees

Now: Boolean query on a tree where the color of nodes isuncertain 1

5

6 7 2

4 3

Avaluationof a tree decides whether to keep(1) ordiscard(0) node labels

Valuation: {2,3,77→1, ∗ 7→0}

A:“Is there both a pink and a blue node?”

(36)

Uncertain trees

Now: Boolean query on a tree where the color of nodes isuncertain 1

5

6 7 2

4 3

Avaluationof a tree decides whether to keep(1) ordiscard(0) node labels Valuation: {2,3,77→1, ∗ 7→0}

A:“Is there both a pink and a blue node?”

(37)

Uncertain trees

Now: Boolean query on a tree where the color of nodes isuncertain 1

5

6 7 2

4 3

Avaluationof a tree decides whether to keep(1) ordiscard(0) node labels Valuation: {27→1, ∗ 7→0}

A:“Is there both a pink and a blue node?”

(38)

Uncertain trees

Now: Boolean query on a tree where the color of nodes isuncertain 1

5

6 7 2

4 3

Avaluationof a tree decides whether to keep(1) ordiscard(0) node labels Valuation: {2,77→1, ∗ 7→0}

A:“Is there both a pink and a blue node?”

(39)

Uncertain trees

Now: Boolean query on a tree where the color of nodes isuncertain 1

5

6 7 2

4 3

Avaluationof a tree decides whether to keep(1) ordiscard(0) node labels Valuation: {2,77→1, ∗ 7→0}

A:“Is there both a pink and a blue node?”

(40)

Uncertain trees

Now: Boolean query on a tree where the color of nodes isuncertain 1

5

6 7 2

4 3

Avaluationof a tree decides whether to keep(1) ordiscard(0) node labels Valuation: {2,3,77→1, ∗ 7→0}

A:“Is there both a pink and a blue node?”

The tree automatonAaccepts

(41)

Uncertain trees

Now: Boolean query on a tree where the color of nodes isuncertain 1

5

6 7 2

4 3

Avaluationof a tree decides whether to keep(1) ordiscard(0) node labels Valuation: {27→1, ∗ 7→0}

A:“Is there both a pink and a blue node?”

The tree automatonArejects

(42)

Uncertain trees

Now: Boolean query on a tree where the color of nodes isuncertain 1

5

6 7 2

4 3

Avaluationof a tree decides whether to keep(1) ordiscard(0) node labels Valuation: {2,77→1, ∗ 7→0}

A:“Is there both a pink and a blue node?”

The tree automatonAaccepts

(43)

Set circuit 1

5 6 7 2

4 3

Set circuit:

Tree automatonA, uncertain treeT, circuitC

Variable gatesofC: nodes ofT

Condition:Letνbe a valuation ofT, thenA acceptsν(T)iff the setS(g0)of the output gateg0 contains{g∈C|ν(g) =1}.

Query:Is there both a pink and a blue node?

×

∪ 7

2 3

×

×

× ×

∪ ∪ ∪ ∪

> 1 > 4 > 5 > 6

(44)

Set circuit 1

5 6 7 2

4 3

Set circuit:

Tree automatonA, uncertain treeT, circuitC

Variable gatesofC: nodes ofT

Condition:Letνbe a valuation ofT, thenA acceptsν(T)iff the setS(g0)of the output gateg0 contains{g∈C|ν(g) =1}.

Query:Is there both a pink and a blue node?

×

∪ 7

2 3

×

×

× ×

∪ ∪ ∪ ∪

> 1 > 4 > 5 > 6

(45)

Set circuit 1

5 6 7 2

4 3

Set circuit:

Tree automatonA, uncertain treeT, circuitC

Variable gatesofC: nodes ofT

Condition:Letνbe a valuation ofT, thenA acceptsν(T)iff the setS(g0)of the output gateg0 contains{g∈C|ν(g) =1}.

Query:Is there both a pink and a blue node?

×

∪ 7

2 3

×

×

× ×

∪ ∪ ∪ ∪

> 1 > 4 > 5 > 6

(46)

Set circuit 1

5 6 7 2

4 3

Set circuit:

Tree automatonA, uncertain treeT, circuitC

Variable gatesofC: nodes ofT

Condition:Letνbe a valuation ofT, thenA acceptsν(T)iff the setS(g0)of the output gateg0 contains{g∈C|ν(g) =1}.

Query:Is there both a pink and a blue node?

×

∪ 7

2 3

×

×

× ×

∪ ∪ ∪ ∪

> 1 > 4 > 5 > 6

(47)

Set circuit 1

5 6 7 2

4 3

Set circuit:

Tree automatonA, uncertain treeT, circuitC

Variable gatesofC: nodes ofT

Condition:Letνbe a valuation ofT, thenA acceptsν(T)iff the setS(g0)of the output gateg0 contains{g∈C|ν(g) =1}.

Query:Is there both a pink and a blue node?

×

∪ 7

2 3

×

×

× ×

∪ ∪ ∪ ∪

(48)

Building provenance circuits on trees

Theorem

For any bottom-uptree automatonAand inputtreeT, we can build ad-DNNF set circuitofAonTinO(|A| × |T|)

Alphabet:

Automaton:“Is there both a pink and a blue node?”

States:

{⊥,B,P,>}

Final: {>}

Transitions:

> P

P P

n

n

∪ ∪ ∪ ∪

⊥ B P >

∪ ∪ ∪ ∪

⊥ B P >

∪ ∪ ∪ ∪

⊥ B P >

×

×

(49)

Building provenance circuits on trees

Theorem

For any bottom-uptree automatonAand inputtreeT, we can build ad-DNNF set circuitofAonTinO(|A| × |T|)

Alphabet:

Automaton:“Is there both a pink and a blue node?”

States:

{⊥,B,P,>}

Final: {>}

Transitions:

>

P

P P

n

n

∪ ∪ ∪ ∪

⊥ B P >

∪ ∪ ∪ ∪

⊥ B P >

∪ ∪ ∪ ∪

⊥ B P >

×

×

(50)

Building provenance circuits on trees

Theorem

For any bottom-uptree automatonAand inputtreeT, we can build ad-DNNF set circuitofAonTinO(|A| × |T|)

Alphabet:

Automaton:“Is there both a pink and a blue node?”

States:

{⊥,B,P,>}

Final: {>}

Transitions:

>

P

P P

n

n

∪ ∪ ∪ ∪

⊥ B P >

∪ ∪ ∪ ∪

⊥ B P >

∪ ∪ ∪ ∪

⊥ B P >

×

×

(51)

Building provenance circuits on trees

Theorem

For any bottom-uptree automatonAand inputtreeT, we can build ad-DNNF set circuitofAonTinO(|A| × |T|)

Alphabet:

Automaton:“Is there both a pink and a blue node?”

States:

{⊥,B,P,>}

Final: {>}

Transitions:

>

P

P P

n

n

∪ ∪ ∪ ∪

⊥ B P >

∪ ∪ ∪ ∪

⊥ B P >

∪ ∪ ∪ ∪

⊥ B P >

×

×

(52)

Building provenance circuits on trees

Theorem

For any bottom-uptree automatonAand inputtreeT, we can build ad-DNNF set circuitofAonTinO(|A| × |T|)

Alphabet:

Automaton:“Is there both a pink and a blue node?”

States:

{⊥,B,P,>}

Final: {>}

Transitions:

>

P

P P

n

n

∪ ∪ ∪ ∪

⊥ B P >

∪ ∪ ∪ ∪

⊥ B P >

∪ ∪ ∪ ∪

⊥ B P >

×

×

(53)

Building provenance circuits on trees

Theorem

For any bottom-uptree automatonAand inputtreeT, we can build ad-DNNF set circuitofAonTinO(|A| × |T|)

Alphabet:

Automaton:“Is there both a pink and a blue node?”

States:

{⊥,B,P,>}

Final: {>}

Transitions:

>

P

P P

n

n

∪ ∪ ∪ ∪

⊥ B P >

∪ ∪ ∪ ∪

⊥ B P >

∪ ∪ ∪ ∪

⊥ B P >

×

×

(54)

Building provenance circuits on trees

Theorem

For any bottom-uptree automatonAand inputtreeT, we can build ad-DNNF set circuitofAonTinO(|A| × |T|)

Alphabet:

Automaton:“Is there both a pink and a blue node?”

States:

{⊥,B,P,>}

Final: {>}

Transitions:

>

P

P P

n

n

∪ ∪ ∪ ∪

⊥ B P >

∪ ∪ ∪ ∪

⊥ B P >

∪ ∪ ∪ ∪

⊥ B P >

×

×

(55)

Building provenance circuits on trees

Theorem

For any bottom-uptree automatonAand inputtreeT, we can build ad-DNNF set circuitofAonTinO(|A| × |T|)

Alphabet:

Automaton:“Is there both a pink and a blue node?”

States:

{⊥,B,P,>}

Final: {>}

Transitions:

>

P

P P

n

n

∪ ∪ ∪ ∪

⊥ B P >

×

×

(56)

Circuits as factorized representations of query results

→ Theset circuitofQis now afactorized representation which describes all the tuples that makeQtrue

Example query:

Q(X1,X2) :P (x)∧P (y) Data:

1

2 3

Results: X1 X2

1 2

1 3

Provenance circuit:

× X1(1) ∪

X2(2) X2(3)

{X2(2),X2(3)}

{(X1(1),X2(2)),(X1(1),X2(3))}

Theorem [Bagan, 2006, Kazana and Segoufin, 2013]

We can enumerate the answers of MSO queries on trees with linear-time preprocessing and constant delay.

Semi-open question:what about memory usage?

(57)

Circuits as factorized representations of query results

→ Theset circuitofQis now afactorized representation which describes all the tuples that makeQtrue Example query:

Q(X1,X2) :P (x)∧P (y)

Data: 1

2 3

Results: X1 X2

1 2

1 3

Provenance circuit:

× X1(1) ∪

X2(2) X2(3)

{X2(2),X2(3)}

{(X1(1),X2(2)),(X1(1),X2(3))}

Theorem [Bagan, 2006, Kazana and Segoufin, 2013]

We can enumerate the answers of MSO queries on trees with linear-time preprocessing and constant delay.

Semi-open question:what about memory usage?

(58)

Circuits as factorized representations of query results

→ Theset circuitofQis now afactorized representation which describes all the tuples that makeQtrue Example query:

Q(X1,X2) :P (x)∧P (y) Data:

1

2 3

Results: X1 X2

1 2

1 3

Provenance circuit:

× X1(1) ∪

X2(2) X2(3)

{X2(2),X2(3)}

{(X1(1),X2(2)),(X1(1),X2(3))}

Theorem [Bagan, 2006, Kazana and Segoufin, 2013]

We can enumerate the answers of MSO queries on trees with linear-time preprocessing and constant delay.

Semi-open question:what about memory usage?

(59)

Circuits as factorized representations of query results

→ Theset circuitofQis now afactorized representation which describes all the tuples that makeQtrue Example query:

Q(X1,X2) :P (x)∧P (y) Data:

1

2 3

Results:

X1 X2

1 2

1 3

Provenance circuit:

× X1(1) ∪

X2(2) X2(3)

{X2(2),X2(3)}

{(X1(1),X2(2)),(X1(1),X2(3))}

Theorem [Bagan, 2006, Kazana and Segoufin, 2013]

We can enumerate the answers of MSO queries on trees with linear-time preprocessing and constant delay.

Semi-open question:what about memory usage?

(60)

Circuits as factorized representations of query results

→ Theset circuitofQis now afactorized representation which describes all the tuples that makeQtrue Example query:

Q(X1,X2) :P (x)∧P (y) Data:

1

2 3

Results:

X1 X2

1 2

1 3

Provenance circuit:

× X1(1) ∪

X2(2) X2(3)

{X2(2),X2(3)}

{(X1(1),X2(2)),(X1(1),X2(3))}

Theorem [Bagan, 2006, Kazana and Segoufin, 2013]

We can enumerate the answers of MSO queries on trees with linear-time preprocessing and constant delay.

Semi-open question:what about memory usage?

(61)

Circuits as factorized representations of query results

→ Theset circuitofQis now afactorized representation which describes all the tuples that makeQtrue Example query:

Q(X1,X2) :P (x)∧P (y) Data:

1

2 3

Results:

X1 X2

1 2

1 3

Provenance circuit:

× X1(1) ∪

X2(2) X2(3)

{X2(2),X2(3)}

{(X1(1),X2(2)),(X1(1),X2(3))}

Theorem [Bagan, 2006, Kazana and Segoufin, 2013]

We can enumerate the answers of MSO queries on trees with linear-time preprocessing and constant delay.

Semi-open question:what about memory usage?

(62)

Circuits as factorized representations of query results

→ Theset circuitofQis now afactorized representation which describes all the tuples that makeQtrue Example query:

Q(X1,X2) :P (x)∧P (y) Data:

1

2 3

Results:

X1 X2

1 2

1 3

Provenance circuit:

× X1(1) ∪

X2(2) X2(3)

{X2(2),X2(3)}

{(X1(1),X2(2)),(X1(1),X2(3))}

Theorem [Bagan, 2006, Kazana and Segoufin, 2013]

We can enumerate the answers of MSO queries on trees with linear-time preprocessing and constant delay.

Semi-open question:what about memory usage?

(63)

Circuits as factorized representations of query results

→ Theset circuitofQis now afactorized representation which describes all the tuples that makeQtrue Example query:

Q(X1,X2) :P (x)∧P (y) Data:

1

2 3

Results:

X1 X2

1 2

1 3

Provenance circuit:

× X1(1) ∪

X2(2) X2(3)

{X2(2),X2(3)}

{(X1(1),X2(2)),(X1(1),X2(3))}

Theorem [Bagan, 2006, Kazana and Segoufin, 2013]

We can enumerate the answers of MSO queries on trees with linear-time preprocessing and constant delay.

Semi-open question:what about memory usage?

(64)

Circuits as factorized representations of query results

→ Theset circuitofQis now afactorized representation which describes all the tuples that makeQtrue Example query:

Q(X1,X2) :P (x)∧P (y) Data:

1

2 3

Results:

X1 X2

1 2

1 3

Provenance circuit:

× X1(1) ∪

X2(2) X2(3)

{X2(2),X2(3)}

{(X1(1),X2(2)),(X1(1),X2(3))}

Theorem [Bagan, 2006, Kazana and Segoufin, 2013]

We can enumerate the answers of MSO queries on trees with linear-time preprocessing and constant delay.

(65)

Application to

Pattern Matching in Texts

(66)

Problem statement: Pattern matching in texts Data:atextT

Antoine Amarilli Description Name Antoine Amarilli. Handle: a3nm. Identity Born 1990-02-07.

French national. Appearance as of 2017. Auth OpenPGP. OpenId. Bitcoin. Contact Email and XMPP a3nm@a3nm.net Affiliation Associate professor of computer science (office C201-4) in the DIG team of Télécom ParisTech, 46 rue Barrault, F-75634 Paris Cedex 13, France. Studies PhD in computer science awarded by Télécom ParisTech on March 14, 2016. Former student of the École normale supérieure.

test@example.com More Résumé Location Other sites Blogging: a3nm.net/blog Git: a3nm.net/git ...

?

Query:apatternPgiven as a regular expression P:= ␣ [a-z0-9.] @ [a-z0-9.]

i

Output:the list ofsubstringsofT that matchP: [186,200i, [483,500i, . . .

Goal:

bevery efficientinT (constant-delay)

bereasonably efficientinP(polynomial-time)

(67)

Problem statement: Pattern matching in texts Data:atextT

Antoine Amarilli Description Name Antoine Amarilli. Handle: a3nm. Identity Born 1990-02-07.

French national. Appearance as of 2017. Auth OpenPGP. OpenId. Bitcoin. Contact Email and XMPP a3nm@a3nm.net Affiliation Associate professor of computer science (office C201-4) in the DIG team of Télécom ParisTech, 46 rue Barrault, F-75634 Paris Cedex 13, France. Studies PhD in computer science awarded by Télécom ParisTech on March 14, 2016. Former student of the École normale supérieure.

test@example.com More Résumé Location Other sites Blogging: a3nm.net/blog Git: a3nm.net/git ...

?

Query:apatternPgiven as a regular expression P:= ␣ [a-z0-9.] @ [a-z0-9.]

i

Output:the list ofsubstringsofT that matchP: [186,200i, [483,500i, . . .

Goal:

bevery efficientinT (constant-delay)

bereasonably efficientinP(polynomial-time)

(68)

Problem statement: Pattern matching in texts Data:atextT

Antoine Amarilli Description Name Antoine Amarilli. Handle: a3nm. Identity Born 1990-02-07.

French national. Appearance as of 2017. Auth OpenPGP. OpenId. Bitcoin. Contact Email and XMPP a3nm@a3nm.net Affiliation Associate professor of computer science (office C201-4) in the DIG team of Télécom ParisTech, 46 rue Barrault, F-75634 Paris Cedex 13, France. Studies PhD in computer science awarded by Télécom ParisTech on March 14, 2016. Former student of the École normale supérieure.

test@example.com More Résumé Location Other sites Blogging: a3nm.net/blog Git: a3nm.net/git ...

?

Query:apatternPgiven as a regular expression P:= ␣ [a-z0-9.] @ [a-z0-9.]

i

Output:the list ofsubstringsofT that matchP: [186,200i, [483,500i, . . .

Goal:

bevery efficientinT (constant-delay)

bereasonably efficientinP(polynomial-time)

(69)

Problem statement: Pattern matching in texts Data:atextT

Antoine Amarilli Description Name Antoine Amarilli. Handle: a3nm. Identity Born 1990-02-07.

French national. Appearance as of 2017. Auth OpenPGP. OpenId. Bitcoin. Contact Email and XMPP a3nm@a3nm.net Affiliation Associate professor of computer science (office C201-4) in the DIG team of Télécom ParisTech, 46 rue Barrault, F-75634 Paris Cedex 13, France. Studies PhD in computer science awarded by Télécom ParisTech on March 14, 2016. Former student of the École normale supérieure.

test@example.com More Résumé Location Other sites Blogging: a3nm.net/blog Git: a3nm.net/git ...

?

Query:apatternPgiven as a regular expression P:= ␣ [a-z0-9.] @ [a-z0-9.]

i

Output:the list ofsubstringsofT that matchP: [186,200i, [483,500i, . . . Goal:

bevery efficientinT (constant-delay)

(70)

Reducing to MSO

Atextis just atreewith a simpler shape

Aregular expression patterncan be expressed in MSO

More generally: regular expressions withvariables

Example:P:= α a β b γ

Translate to aword automaton(withcapture variables)

1 2 3 4

• α

a β

b γ

→ The MSO result implies: Theorem [Florenzano et al., 2018]

We can enumerate all matches of a regular expression pattern on a tree with linear preprocessing and constant delay

→ The resulting set circuit is abinary decision diagram, i.e., each×-gate has only one input which is not a variable

(71)

Reducing to MSO

Atextis just atreewith a simpler shape

Aregular expression patterncan be expressed in MSO

More generally: regular expressions withvariables

Example:P:= α a β b γ

Translate to aword automaton(withcapture variables)

1 2 3 4

• α

a β

b γ

→ The MSO result implies: Theorem [Florenzano et al., 2018]

We can enumerate all matches of a regular expression pattern on a tree with linear preprocessing and constant delay

→ The resulting set circuit is abinary decision diagram, i.e., each×-gate has only one input which is not a variable

(72)

Reducing to MSO

Atextis just atreewith a simpler shape

Aregular expression patterncan be expressed in MSO

More generally: regular expressions withvariables

Example:P:= α a β b γ

Translate to aword automaton(withcapture variables)

1 2 3 4

• α

a β

b γ

→ The MSO result implies: Theorem [Florenzano et al., 2018]

We can enumerate all matches of a regular expression pattern on a tree with linear preprocessing and constant delay

→ The resulting set circuit is abinary decision diagram, i.e., each×-gate has only one input which is not a variable

(73)

Reducing to MSO

Atextis just atreewith a simpler shape

Aregular expression patterncan be expressed in MSO

More generally: regular expressions withvariables

Example:P:= α a β b γ

Translate to aword automaton(withcapture variables)

1 2 3 4

• α

a β

b γ

→ The MSO result implies: Theorem [Florenzano et al., 2018]

We can enumerate all matches of a regular expression pattern on a tree with linear preprocessing and constant delay

→ The resulting set circuit is abinary decision diagram, i.e., each×-gate has only one input which is not a variable

(74)

Reducing to MSO

Atextis just atreewith a simpler shape

Aregular expression patterncan be expressed in MSO

More generally: regular expressions withvariables

Example:P:= α a β b γ

Translate to aword automaton(withcapture variables)

1 2 3 4

• α

a β

b γ

→ The MSO result implies:

Theorem [Florenzano et al., 2018]

We can enumerate all matches of a regular expression pattern on a tree with linear preprocessing and constant delay

→ The resulting set circuit is abinary decision diagram, i.e., each×-gate has only one input which is not a variable

(75)

Reducing to MSO

Atextis just atreewith a simpler shape

Aregular expression patterncan be expressed in MSO

More generally: regular expressions withvariables

Example:P:= α a β b γ

Translate to aword automaton(withcapture variables)

1 2 3 4

• α

a β

b γ

→ The MSO result implies:

Theorem [Florenzano et al., 2018]

We can enumerate all matches of a regular expression pattern on a tree with linear preprocessing and constant delay

(76)

Efficiency in the query

We have shown linear preprocessing and constant delay in thedata; but what about thequery?

For general MSO queries:nonelementarycomplexity

For regular expressions:exponential(determinization)

However: our methods adapt tonondeterministic automata

• Constant-delay enumeration for set circuits without assuming determinismbut bounding some notion ofwidth

Theorem [Amarilli et al., 2019a, Amarilli et al., 2019b] We can enumerate all matches of a nondeterministic tree automaton on a tree with

Preprocessinglinearin the tree andpolynomialin the automaton

Delayconstantin the tree andpolynomialin the automaton Corollary: enumeration for regular expression patterns on text

(77)

Efficiency in the query

We have shown linear preprocessing and constant delay in thedata; but what about thequery?

For general MSO queries:nonelementarycomplexity

For regular expressions:exponential(determinization)

However: our methods adapt tonondeterministic automata

• Constant-delay enumeration for set circuits without assuming determinismbut bounding some notion ofwidth

Theorem [Amarilli et al., 2019a, Amarilli et al., 2019b] We can enumerate all matches of a nondeterministic tree automaton on a tree with

Preprocessinglinearin the tree andpolynomialin the automaton

Delayconstantin the tree andpolynomialin the automaton Corollary: enumeration for regular expression patterns on text

(78)

Efficiency in the query

We have shown linear preprocessing and constant delay in thedata; but what about thequery?

For general MSO queries:nonelementarycomplexity

For regular expressions:exponential(determinization)

However: our methods adapt tonondeterministic automata

• Constant-delay enumeration for set circuits without assuming determinismbut bounding some notion ofwidth

Theorem [Amarilli et al., 2019a, Amarilli et al., 2019b] We can enumerate all matches of a nondeterministic tree automaton on a tree with

Preprocessinglinearin the tree andpolynomialin the automaton

Delayconstantin the tree andpolynomialin the automaton Corollary: enumeration for regular expression patterns on text

(79)

Efficiency in the query

We have shown linear preprocessing and constant delay in thedata; but what about thequery?

For general MSO queries:nonelementarycomplexity

For regular expressions:exponential(determinization)

However: our methods adapt tonondeterministic automata

• Constant-delay enumeration for set circuits without assuming determinismbut bounding some notion ofwidth

Theorem [Amarilli et al., 2019a, Amarilli et al., 2019b]

We can enumerate all matches of a nondeterministic tree automaton on a tree with

Preprocessinglinearin the tree andpolynomialin the automaton

Delayconstantin the tree andpolynomialin the automaton

(80)

Implementation (ongoing internship by Rémi Dupré)

Prototype to find matches of aregular expressionin atext

https://github.com/remi-dupre/enum-spanner-rs

Work-in-progress

Open questions / projects:

• What aboutmemory usage? (we cannot keep the whole index)

• Output matchesin streaming? (problem: duplicates)

• Can we enumerateother notions of matches?

factors ofmaximal/minimal size

distinct matchingstrings

etc.

• Whichapplication domainsneed this?

• Are there goodbenchmarks?

(81)

Implementation (ongoing internship by Rémi Dupré)

Prototype to find matches of aregular expressionin atext

https://github.com/remi-dupre/enum-spanner-rs

Work-in-progress

Open questions / projects:

• What aboutmemory usage? (we cannot keep the whole index)

• Output matchesin streaming? (problem: duplicates)

• Can we enumerateother notions of matches?

factors ofmaximal/minimal size

distinct matchingstrings

etc.

• Whichapplication domainsneed this?

• Are there goodbenchmarks?

Références

Documents relatifs

We give a simple proof of the “tree-width duality theorem” of Seymour and Thomas that the tree-width of a finite graph is exactly one less than the largest order of its brambles..

— If G is a strongly connected finite directed graph, the set T G of rooted directed spanning trees of G is naturally equipped with a structure of directed graph: there is a

J’ai déjà une forte expérience de mobilité avec plus d’un an de séjours de recherche à l’étranger (dans quatre pays) avant et pendant ma thèse, trois

Given a graph, it is possible to enumerate in incremental polynomial time:. - The proper tree decompositions - The

Main results Theorem Given a d-DNNF circuit C with a v-tree T, we can enumerate its satisfying assignments with preprocessing linear in |C| + |T| and delay linear in each

• Extending the results from text to trees • Supporting updates on the input data • Understanding the connections with circuit classes • Enumerating results in a relevant

Any countable and scattered linear ordering is the frontier of a straight, locally finite and complete

as a bottom-up tree automaton running on dags; and the language of a DA was defined as the set of dags that get accepted under (bottom-up) runs, defined in the usual sense;