Enumerating Pattern Matches in Words and Trees

Antoine Amarilli¹, Pierre Bourhis², Stefan Mengel³, Matthias Niewerth⁴

October 4th, 2018

¹ Télécom ParisTech

² CNRS CRIStAL

³ CNRS CRIL

⁴ Universität Bayreuth

Problem: Finding patterns in text

We have a long text T:

Antoine Amarilli Description Name Antoine Amarilli. Handle: a3nm. Identity Born 1990-02-07. French national. Appearance as of 2017. Auth OpenPGP. OpenId. Bitcoin. Contact Email and XMPP a3nm@a3nm.net Affiliation Associate professor of computer science (office C201-4) in the DIG team of Télécom ParisTech, 46 rue Barrault, F-75634 Paris Cedex 13, France. Studies PhD in computer science awarded by Télécom ParisTech on March 14, 2016. Former student of the École normale supérieure. More Résumé Location Other sites Blogging: a3nm.net/blog Git: a3nm.net/git ...

We want to find a pattern P in the text T:

Example: find email addresses

• Write the pattern as a regular expression:

P := ␣ [a-z0-9.]* @ [a-z0-9.]* ␣

→ How to find the pattern P efficiently in the text T?

Solution: automata

Convert the pattern from a regular expression to an automaton:

P := ␣ [a-z0-9.]* @ [a-z0-9.]* ␣

[Figure: automaton with states 1 (start), 2, 3 and 4; two transitions are labeled [a-z0-9.] and one is labeled @]

Then, evaluate the automaton on the text T:

... Email and XMPP a3nm@a3nm.net Affiliation Associate professor of computer ...

How efficient is this?

Data complexity in the text T: linear, i.e., O(|T|)

Combined complexity in T and P: polynomial
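
The evaluation step can be sketched in Python as a set-of-states simulation. This is only an illustration: the transition table below is an assumption about the four-state automaton of the figure (spaces leading from state 1 to 2 and from 3 to 4, loops on [a-z0-9.] at states 2 and 3), and the names `step` and `contains_pattern` are hypothetical.

```python
# Assumed character class of the pattern: lowercase letters, digits and the dot.
CLASS = set("abcdefghijklmnopqrstuvwxyz0123456789.")


def step(state, ch):
    """Return the set of states reachable from `state` by reading `ch`.

    Assumed transitions: 1 -space-> 2, 2 -class-> 2, 2 -@-> 3,
    3 -class-> 3, 3 -space-> 4.
    """
    targets = set()
    if state == 1 and ch == " ":
        targets.add(2)
    if state == 2 and ch in CLASS:
        targets.add(2)
    if state == 2 and ch == "@":
        targets.add(3)
    if state == 3 and ch in CLASS:
        targets.add(3)
    if state == 3 and ch == " ":
        targets.add(4)
    return targets


def contains_pattern(text):
    """Scan the text once, keeping the set of currently active states."""
    current = {1}
    for ch in text:
        # State 1 stays active because a match may start at any position.
        nxt = {1}
        for state in current:
            nxt |= step(state, ch)
        if 4 in nxt:  # state 4 is the accepting state
            return True
        current = nxt
    return False


print(contains_pattern("... Email and XMPP a3nm@a3nm.net Affiliation ..."))
```

Each character is processed once and the work per character is bounded by the size of the automaton, which is where the linear data complexity comes from.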

Actual problem: Extracting all patterns

Problem: This only tells us if the pattern is in the text!

“YES”

We want to actually find all pattern matches!

Write the pattern P as a regular expression with capture variables:

P := ␣ α [a-z0-9.]* @ [a-z0-9.]* β ␣

Semantics: a match of P maps α and β to positions of T

... Email and XMPP a3nm@a3nm.net Affiliation Associate professor of computer ...

→ One match: ⟨α: 20, β: 32⟩
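
To make the semantics concrete, here is a small sketch using Python's `re` module, with a parenthesized group standing in for the pair of capture variables α and β. This is an assumption made for illustration: Python's engine uses backtracking and returns only the leftmost match, so it is not the automaton-based approach of the talk. The helper name `first_email_match` is hypothetical, and the position convention (counting from 1, with β on the last matched character) is a guess that happens to agree with the match shown above.

```python
import re

# A group plays the role of the two capture variables (assumption for illustration).
PATTERN = re.compile(r" ([a-z0-9.]*@[a-z0-9.]*) ")


def first_email_match(text):
    """Return (alpha, beta, matched_text) for the leftmost match, or None."""
    m = PATTERN.search(text)
    if m is None:
        return None
    # Python offsets are 0-based with an exclusive end. Shifting the start by
    # one gives 1-based positions of the first and last matched characters.
    return (m.start(1) + 1, m.end(1), m.group(1))


text = "... Email and XMPP a3nm@a3nm.net Affiliation Associate professor of computer ..."
print(first_email_match(text))
```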

Formal problem statement

Problem description:

Input:

• A text T

Antoine Amarilli Description Name Antoine Amarilli. Handle: a3nm. Identity Born 1990-02-07. French national. Appearance as of 2017. Auth OpenPGP. OpenId. Bitcoin. Contact Email and XMPP a3nm@a3nm.net Affiliation Associate professor of computer science (office C201-4) in the DIG team of Télécom ParisTech, 46 rue Barrault, F-75634 Paris Cedex 13, France. Studies PhD in computer science awarded by Télécom ParisTech on March 14, 2016. Former student of the École normale supérieure. More Résumé Location Other sites Blogging: a3nm.net/blog Git: a3nm.net/git ...

• A pattern P given as a regular expression with capture variables:

P := ␣ α [a-z0-9.]* @ [a-z0-9.]* β ␣

Output: the list of matches of P on T: ⟨α: 187, β: 199⟩, ...

We measure the complexity of the problem:

• In data complexity, as a function of T

• In combined complexity, as a function of P and T

Measuring the complexity

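Naive algorithm: Consider all ways to assign capture variables and test for each of them if it satisfies the pattern

[Figure: the capture variables α and β placed at each possible pair of positions in the word "lol"]

For k capture variables, data complexity... O(|T|^{k+1}), since there are about |T|^k assignments and each one is tested in time O(|T|)

Hope: If T is big, we want data complexity to be in O(|T|)

Challenge: This is impossible, there can be too many matches:

• Consider the text T: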

aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

• Consider the regexp with captures P := α a* β

• The number of matches is quadratic, i.e., Θ(|T|^2)

→ We need a different way to measure complexity
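
The naive algorithm and the quadratic blow-up can be checked on a small instance. The sketch below assumes k = 2 capture variables, takes positions to be 0-based boundaries between characters, and uses Python's `re.fullmatch` as the per-assignment test; the function name `naive_matches` is hypothetical.

```python
import re
from itertools import combinations_with_replacement


def naive_matches(text, inner=r"a*"):
    """Try every pair of positions alpha <= beta and test the span in between."""
    matches = []
    positions = range(len(text) + 1)
    for alpha, beta in combinations_with_replacement(positions, 2):
        # Each test reads the candidate span, so it costs up to O(|T|).
        if re.fullmatch(inner, text[alpha:beta]):
            matches.append((alpha, beta))
    return matches


for n in (4, 8, 16):
    # On a text of n letters 'a', every pair alpha <= beta is a match.
    print(n, len(naive_matches("a" * n)))
```

On a text made of n letters a, every pair of positions is a match, so the count is (n+1)(n+2)/2. Any algorithm that writes out the full list therefore needs quadratic time.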

Enumeration algorithms

Idea: In real life, we do not want to compute all the answers; we just need to be able to enumerate answers quickly

→ Formalization: enumeration algorithms

Formalizing enumeration algorithms

Text T:

Antoine Amarilli Description Name Antoine Amarilli. Handle: a3nm. Identity Born 1990-02-07. French national. Appearance as of 2017. Auth OpenPGP. OpenId. Bitcoin. Contact Email and XMPP a3nm@a3nm.net Affiliation Associate professor ...

Pattern P:

P := ␣ α [a-z0-9.]* @ [a-z0-9.]* β ␣

Phase 1: Preprocessing of the text T and the pattern P → Data structure

Phase 2: Enumeration from the data structure, maintaining a state → Results: ⟨α: 42, β: 57⟩, ⟨α: 1337, β: 1351⟩

Two ways to measure performance:

• Total time for phase 1

• Delay between two results in phase 2

... in combined and data complexity
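
The two phases can be illustrated on a toy case. The sketch below handles only the fixed pattern α a* β, with positions taken as 0-based boundaries between characters; it is not the general algorithm of the talk, and the names `preprocess` and `enumerate_matches` are hypothetical.

```python
def preprocess(text):
    """Phase 1: one right-to-left pass, linear in len(text).

    run_end[i] is the largest j such that text[i:j] consists only of 'a'.
    """
    n = len(text)
    run_end = [0] * (n + 1)
    run_end[n] = n
    for i in range(n - 1, -1, -1):
        run_end[i] = run_end[i + 1] if text[i] == "a" else i
    return run_end


def enumerate_matches(run_end):
    """Phase 2: yield the matches one at a time.

    Every alpha yields at least the empty match (alpha, alpha), so only a
    bounded amount of work separates two consecutive results.
    """
    for alpha in range(len(run_end)):
        for beta in range(alpha, run_end[alpha] + 1):
            yield (alpha, beta)


index = preprocess("a" * 1000)       # phase 1: total time linear in the text
results = enumerate_matches(index)   # phase 2: nothing is computed yet
print(next(results), next(results))  # first results, without listing all matches
```

Because the second phase is a generator, a caller that stops after a few results never pays for the full quadratic output.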
