Enumerating Pattern Matches in Words and Trees
Antoine Amarilli1, Pierre Bourhis2, Stefan Mengel3, Matthias Niewerth4 October 4th, 2018
1Télécom ParisTech
2CNRS CRIStAL
3CNRS CRIL
4Universität Bayreuth
1/18
Problem: Finding patterns in text
•
We have aAntoine Amarilli Description Name Antoine Amarilli.long textT: Handle: a3nm. Identity Born 1990-02-07.French national. Appearance as of 2017. Auth OpenPGP. OpenId. Bitcoin. Contact Email and XMPP a3nm@a3nm.net Affiliation Associate professor of computer science (office C201-4) in the DIG team of Télécom ParisTech, 46 rue Barrault, F-75634 Paris Cedex 13, France. Studies PhD in computer science awarded by Télécom ParisTech on March 14, 2016. Former student of the École normale supérieure.
More Résumé Location Other sites Blogging: a3nm.net/blog Git: a3nm.net/git ...
•
We want to find apatternPin the textT:→ Example: findemail addresses
• Write the pattern as aregular expression:
P:= ␣+ [a-z0-9.]∗ @ [a-z0-9.]∗ ␣+
→ How to find the patternPefficiently in the textT?
2/18
Problem: Finding patterns in text
•
We have aAntoine Amarilli Description Name Antoine Amarilli.long textT: Handle: a3nm. Identity Born 1990-02-07.French national. Appearance as of 2017. Auth OpenPGP. OpenId. Bitcoin. Contact Email and XMPP a3nm@a3nm.net Affiliation Associate professor of computer science (office C201-4) in the DIG team of Télécom ParisTech, 46 rue Barrault, F-75634 Paris Cedex 13, France. Studies PhD in computer science awarded by Télécom ParisTech on March 14, 2016. Former student of the École normale supérieure.
More Résumé Location Other sites Blogging: a3nm.net/blog Git: a3nm.net/git ...
•
We want to find apatternPin the textT:→ Example: findemail addresses
• Write the pattern as aregular expression:
P:= ␣+ [a-z0-9.]∗ @ [a-z0-9.]∗ ␣+
→ How to find the patternPefficiently in the textT?
2/18
Problem: Finding patterns in text
•
We have aAntoine Amarilli Description Name Antoine Amarilli.long textT: Handle: a3nm. Identity Born 1990-02-07.French national. Appearance as of 2017. Auth OpenPGP. OpenId. Bitcoin. Contact Email and XMPP a3nm@a3nm.net Affiliation Associate professor of computer science (office C201-4) in the DIG team of Télécom ParisTech, 46 rue Barrault, F-75634 Paris Cedex 13, France. Studies PhD in computer science awarded by Télécom ParisTech on March 14, 2016. Former student of the École normale supérieure.
More Résumé Location Other sites Blogging: a3nm.net/blog Git: a3nm.net/git ...
•
We want to find apatternPin the textT:→ Example: findemail addresses
• Write the pattern as aregular expression:
P:= ␣+ [a-z0-9.]∗ @ [a-z0-9.]∗ ␣+
→ How to find the patternPefficiently in the textT?
2/18
Problem: Finding patterns in text
•
We have aAntoine Amarilli Description Name Antoine Amarilli.long textT: Handle: a3nm. Identity Born 1990-02-07.French national. Appearance as of 2017. Auth OpenPGP. OpenId. Bitcoin. Contact Email and XMPP a3nm@a3nm.net Affiliation Associate professor of computer science (office C201-4) in the DIG team of Télécom ParisTech, 46 rue Barrault, F-75634 Paris Cedex 13, France. Studies PhD in computer science awarded by Télécom ParisTech on March 14, 2016. Former student of the École normale supérieure.
More Résumé Location Other sites Blogging: a3nm.net/blog Git: a3nm.net/git ...
•
We want to find apatternPin the textT:→ Example: findemail addresses
• Write the pattern as aregular expression:
P:= ␣+ [a-z0-9.]∗ @ [a-z0-9.]∗ ␣+
→ How to find the patternPefficiently in the textT?
2/18
Solution: automata
•
Convert the pattern from aregular expressionto anautomatonP:= ␣+ [a-z0-9.]∗ @ [a-z0-9.]∗ ␣+
1
start 2 3 4
[a-z0-9.] [a-z0-9.]
@
•
Then, evaluate the automaton on thetextT... Email and XMPP a3nm@a3nm.net Affiliation Associate professor of computer ...
•
How• efficientis this?Data complexityin the textT:linear, i.e.,O(|T|)
• Combined complexityinTandP:polynomial
3/18
Solution: automata
•
Convert the pattern from aregular expressionto anautomaton P:= ␣+ [a-z0-9.]∗ @ [a-z0-9.]∗ ␣+1
start 2 3 4
[a-z0-9.] [a-z0-9.]
@
•
Then, evaluate the automaton on thetextT... Email and XMPP a3nm@a3nm.net Affiliation Associate professor of computer ...
•
How• efficientis this?Data complexityin the textT:linear, i.e.,O(|T|)
• Combined complexityinTandP:polynomial
3/18
Solution: automata
•
Convert the pattern from aregular expressionto anautomaton P:= ␣+ [a-z0-9.]∗ @ [a-z0-9.]∗ ␣+start 1 2 3 4
[a-z0-9.] [a-z0-9.]
@
•
Then, evaluate the automaton on thetextT... Email and XMPP a3nm@a3nm.net Affiliation Associate professor of computer ...
•
How• efficientis this?Data complexityin the textT:linear, i.e.,O(|T|)
• Combined complexityinTandP:polynomial
3/18
Solution: automata
•
Convert the pattern from aregular expressionto anautomaton P:= ␣+ [a-z0-9.]∗ @ [a-z0-9.]∗ ␣+start 1 2 3 4
[a-z0-9.] [a-z0-9.]
@
•
Then, evaluate the automaton on thetextT... Email and XMPP a3nm@a3nm.net Affiliation Associate professor of computer ...
•
How• efficientis this?Data complexityin the textT:linear, i.e.,O(|T|)
• Combined complexityinTandP:polynomial
3/18
Solution: automata
•
Convert the pattern from aregular expressionto anautomaton P:= ␣+ [a-z0-9.]∗ @ [a-z0-9.]∗ ␣+start 1 2 3 4
[a-z0-9.] [a-z0-9.]
@
•
Then, evaluate the automaton on thetextT... Email and XMPP a3nm@a3nm.net Affiliation Associate professor of computer ...
•
Howefficientis this?• Data complexityin the textT:linear, i.e.,O(|T|)
• Combined complexityinTandP:polynomial
3/18
Solution: automata
•
Convert the pattern from aregular expressionto anautomaton P:= ␣+ [a-z0-9.]∗ @ [a-z0-9.]∗ ␣+start 1 2 3 4
[a-z0-9.] [a-z0-9.]
@
•
Then, evaluate the automaton on thetextT... Email and XMPP a3nm@a3nm.net Affiliation Associate professor of computer ...
•
How• efficientis this?Data complexityin the textT:linear, i.e.,O(|T|)
• Combined complexityinTandP:polynomial
3/18
Solution: automata
•
Convert the pattern from aregular expressionto anautomaton P:= ␣+ [a-z0-9.]∗ @ [a-z0-9.]∗ ␣+start 1 2 3 4
[a-z0-9.] [a-z0-9.]
@
•
Then, evaluate the automaton on thetextT... Email and XMPP a3nm@a3nm.net Affiliation Associate professor of computer ...
•
How• efficientis this?Data complexityin the textT:linear, i.e.,O(|T|)
• Combined complexityinTandP:polynomial
3/18
Actual problem: Extracting all patterns
•
Problem:This only tells usifthe pattern is in the text!→ “YES”
•
We want toactually findall pattern matches!•
Write the patternPas a regular expression withcapture variables P:= ␣+ α [a-z0-9.]∗ @ [a-z0-9.]∗ β ␣+•
Semantics:amatchofPmapsαandβ topositionsofT... Email and XMPP a3nm@a3nm.net Affiliation Associate professor of computer ...
→One match:hα:20,β :32i
4/18
Actual problem: Extracting all patterns
•
Problem:This only tells usifthe pattern is in the text!→ “YES”
•
We want toactually findall pattern matches!•
Write the patternPas a regular expression withcapture variables P:= ␣+ α [a-z0-9.]∗ @ [a-z0-9.]∗ β ␣+•
Semantics:amatchofPmapsαandβ topositionsofT... Email and XMPP a3nm@a3nm.net Affiliation Associate professor of computer ...
→One match:hα:20,β :32i
4/18
Actual problem: Extracting all patterns
•
Problem:This only tells usifthe pattern is in the text!→ “YES”
•
We want toactually findall pattern matches!•
Write the patternPas a regular expression withcapture variables P:= ␣+ α [a-z0-9.]∗ @ [a-z0-9.]∗ β ␣+•
Semantics:amatchofPmapsαandβ topositionsofT... Email and XMPP a3nm@a3nm.net Affiliation Associate professor of computer ...
→One match:hα:20,β :32i
4/18
Actual problem: Extracting all patterns
•
Problem:This only tells usifthe pattern is in the text!→ “YES”
•
We want toactually findall pattern matches!•
Write the patternPas a regular expression withcapture variables P:= ␣+ α [a-z0-9.]∗ @ [a-z0-9.]∗ β ␣+•
Semantics:amatchofPmaps αandβ topositionsofT... Email and XMPP a3nm@a3nm.net Affiliation Associate professor of computer ...
→One match:hα:20,β :32i
4/18
Actual problem: Extracting all patterns
•
Problem:This only tells usifthe pattern is in the text!→ “YES”
•
We want toactually findall pattern matches!•
Write the patternPas a regular expression withcapture variables P:= ␣+ α [a-z0-9.]∗ @ [a-z0-9.]∗ β ␣+•
Semantics:amatchofPmaps αandβ topositionsofT... Email and XMPP a3nm@a3nm.net Affiliation Associate professor of computer ...
→One match:hα:20,β :32i
4/18
Actual problem: Extracting all patterns
•
Problem:This only tells usifthe pattern is in the text!→ “YES”
•
We want toactually findall pattern matches!•
Write the patternPas a regular expression withcapture variables P:= ␣+ α [a-z0-9.]∗ @ [a-z0-9.]∗ β ␣+•
Semantics:amatchofPmaps αandβ topositionsofT... Email and XMPP a3nm@a3nm.net Affiliation Associate professor of computer ...
→One match:hα:20,β :32i
4/18
Formal problem statement
•
Problem description:• Input:
• AtextT
Antoine Amarilli Description Name Antoine Amarilli. Handle: a3nm. Identity Born 1990-02-07. French national. Appearance as of 2017. Auth OpenPGP. OpenId. Bitcoin. Contact Email and XMPP a3nm@a3nm.net Affiliation Associate professor of computer science (office C201-4) in the DIG team of Télécom ParisTech, 46 rue Barrault, F-75634 Paris Cedex 13, France. Studies PhD in computer science awarded by Télécom ParisTech on March 14, 2016. Former student of the École normale supérieure. More Résumé Location Other sites Blogging: a3nm.net/blog Git: a3nm.net/git ...
• ApatternPgiven as a regular expression with capture variables P:= ␣+ α [a-z0-9.]∗ @ [a-z0-9.]∗ β ␣+
• Output:the list ofmatchesofPonT hα:187,β:199i, ...
•
We measure thecomplexityof the problem:• Indata complexity, as a function ofT
• Incombined complexity, as a function ofPandT
5/18
Formal problem statement
•
Problem description:• Input:
• AtextT
Antoine Amarilli Description Name Antoine Amarilli. Handle: a3nm. Identity Born 1990-02-07. French national. Appearance as of 2017. Auth OpenPGP. OpenId. Bitcoin. Contact Email and XMPP a3nm@a3nm.net Affiliation Associate professor of computer science (office C201-4) in the DIG team of Télécom ParisTech, 46 rue Barrault, F-75634 Paris Cedex 13, France. Studies PhD in computer science awarded by Télécom ParisTech on March 14, 2016. Former student of the École normale supérieure. More Résumé Location Other sites Blogging: a3nm.net/blog Git: a3nm.net/git ...
• ApatternPgiven as a regular expression with capture variables P:= ␣+ α [a-z0-9.]∗ @ [a-z0-9.]∗ β ␣+
• Output:the list ofmatchesofPonT hα:187,β:199i, ...
•
We measure thecomplexityof the problem:• Indata complexity, as a function ofT
• Incombined complexity, as a function ofPandT
5/18
Formal problem statement
•
Problem description:• Input:
• AtextT
Antoine Amarilli Description Name Antoine Amarilli. Handle: a3nm. Identity Born 1990-02-07. French national. Appearance as of 2017. Auth OpenPGP. OpenId. Bitcoin. Contact Email and XMPP a3nm@a3nm.net Affiliation Associate professor of computer science (office C201-4) in the DIG team of Télécom ParisTech, 46 rue Barrault, F-75634 Paris Cedex 13, France. Studies PhD in computer science awarded by Télécom ParisTech on March 14, 2016. Former student of the École normale supérieure. More Résumé Location Other sites Blogging: a3nm.net/blog Git: a3nm.net/git ...
• ApatternPgiven as a regular expression with capture variables P:= ␣+ α [a-z0-9.]∗ @ [a-z0-9.]∗ β ␣+
• Output:the list ofmatchesofPonT hα:187,β:199i, ...
•
We measure thecomplexityof the problem:• Indata complexity, as a function ofT
• Incombined complexity, as a function ofPandT
5/18
Formal problem statement
•
Problem description:• Input:
• AtextT
Antoine Amarilli Description Name Antoine Amarilli. Handle: a3nm. Identity Born 1990-02-07. French national. Appearance as of 2017. Auth OpenPGP. OpenId. Bitcoin. Contact Email and XMPP a3nm@a3nm.net Affiliation Associate professor of computer science (office C201-4) in the DIG team of Télécom ParisTech, 46 rue Barrault, F-75634 Paris Cedex 13, France. Studies PhD in computer science awarded by Télécom ParisTech on March 14, 2016. Former student of the École normale supérieure. More Résumé Location Other sites Blogging: a3nm.net/blog Git: a3nm.net/git ...
• ApatternPgiven as a regular expression with capture variables P:= ␣+ α [a-z0-9.]∗ @ [a-z0-9.]∗ β ␣+
• Output:the list ofmatchesofPonT hα:187,β:199i, ...
•
We measure thecomplexityof the problem:• Indata complexity, as a function ofT
• Incombined complexity, as a function ofPandT
5/18
Formal problem statement
•
Problem description:• Input:
• AtextT
Antoine Amarilli Description Name Antoine Amarilli. Handle: a3nm. Identity Born 1990-02-07. French national. Appearance as of 2017. Auth OpenPGP. OpenId. Bitcoin. Contact Email and XMPP a3nm@a3nm.net Affiliation Associate professor of computer science (office C201-4) in the DIG team of Télécom ParisTech, 46 rue Barrault, F-75634 Paris Cedex 13, France. Studies PhD in computer science awarded by Télécom ParisTech on March 14, 2016. Former student of the École normale supérieure. More Résumé Location Other sites Blogging: a3nm.net/blog Git: a3nm.net/git ...
• ApatternPgiven as a regular expression with capture variables P:= ␣+ α [a-z0-9.]∗ @ [a-z0-9.]∗ β ␣+
• Output:the list ofmatchesofPonT hα:187,β:199i, ...
•
We measure thecomplexityof the problem:• Indata complexity, as a function ofT
• Incombined complexity, as a function ofPandT
5/18
Measuring the complexity
•
Naive algorithm:Considerall waysto assign capture variables andtestfor each of them if it satisfies the patternαβ
l
αβ
o
αβ
l
αβ
→ Forkcapture variables,data complexity... O(|T|k+1)
•
Hope:IfTis big, we want data complexity to be inO(|T|)•
Challenge:This isimpossible, there can betoo many matches:• Consider thetextT:
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
• Consider theregexp with capturesP:= α a∗ β
• Thenumber of matchesisO(|T|2)
→ We need adifferent wayto measure complexity
6/18
Measuring the complexity
•
Naive algorithm:Considerall waysto assign capture variables andtestfor each of them if it satisfies the patternαβ l
αβ
o
αβ
l
αβ
→ Forkcapture variables,data complexity... O(|T|k+1)
•
Hope:IfTis big, we want data complexity to be inO(|T|)•
Challenge:This isimpossible, there can betoo many matches:• Consider thetextT:
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
• Consider theregexp with capturesP:= α a∗ β
• Thenumber of matchesisO(|T|2)
→ We need adifferent wayto measure complexity
6/18
Measuring the complexity
•
Naive algorithm:Considerall waysto assign capture variables andtestfor each of them if it satisfies the patternα
β
l
α
β o
αβ
l
αβ
→ Forkcapture variables,data complexity... O(|T|k+1)
•
Hope:IfTis big, we want data complexity to be inO(|T|)•
Challenge:This isimpossible, there can betoo many matches:• Consider thetextT:
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
• Consider theregexp with capturesP:= α a∗ β
• Thenumber of matchesisO(|T|2)
→ We need adifferent wayto measure complexity
6/18
Measuring the complexity
•
Naive algorithm:Considerall waysto assign capture variables andtestfor each of them if it satisfies the patternα
β
l
αβ
o
α
β l
αβ
→ Forkcapture variables,data complexity... O(|T|k+1)
•
Hope:IfTis big, we want data complexity to be inO(|T|)•
Challenge:This isimpossible, there can betoo many matches:• Consider thetextT:
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
• Consider theregexp with capturesP:= α a∗ β
• Thenumber of matchesisO(|T|2)
→ We need adifferent wayto measure complexity
6/18
Measuring the complexity
•
Naive algorithm:Considerall waysto assign capture variables andtestfor each of them if it satisfies the patternα
β
l
αβ
o
αβ
l
α
β
→ Forkcapture variables,data complexity... O(|T|k+1)
•
Hope:IfTis big, we want data complexity to be inO(|T|)•
Challenge:This isimpossible, there can betoo many matches:• Consider thetextT:
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
• Consider theregexp with capturesP:= α a∗ β
• Thenumber of matchesisO(|T|2)
→ We need adifferent wayto measure complexity
6/18
Measuring the complexity
•
Naive algorithm:Considerall waysto assign capture variables andtestfor each of them if it satisfies the patternα
β l α
β
o
αβ
l
αβ
→ Forkcapture variables,data complexity... O(|T|k+1)
•
Hope:IfTis big, we want data complexity to be inO(|T|)•
Challenge:This isimpossible, there can betoo many matches:• Consider thetextT:
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
• Consider theregexp with capturesP:= α a∗ β
• Thenumber of matchesisO(|T|2)
→ We need adifferent wayto measure complexity
6/18
Measuring the complexity
•
Naive algorithm:Considerall waysto assign capture variables andtestfor each of them if it satisfies the patternαβ
l αβ o
αβ
l
αβ
→ Forkcapture variables,data complexity... O(|T|k+1)
•
Hope:IfTis big, we want data complexity to be inO(|T|)•
Challenge:This isimpossible, there can betoo many matches:• Consider thetextT:
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
• Consider theregexp with capturesP:= α a∗ β
• Thenumber of matchesisO(|T|2)
→ We need adifferent wayto measure complexity
6/18
Measuring the complexity
•
Naive algorithm:Considerall waysto assign capture variables andtestfor each of them if it satisfies the patternαβ
l α
β
o
α
β l
αβ
→ Forkcapture variables,data complexity... O(|T|k+1)
•
Hope:IfTis big, we want data complexity to be inO(|T|)•
Challenge:This isimpossible, there can betoo many matches:• Consider thetextT:
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
• Consider theregexp with capturesP:= α a∗ β
• Thenumber of matchesisO(|T|2)
→ We need adifferent wayto measure complexity
6/18
Measuring the complexity
•
Naive algorithm:Considerall waysto assign capture variables andtestfor each of them if it satisfies the patternαβ
l α
β
o
αβ
l
α
β
→ Forkcapture variables,data complexity... O(|T|k+1)
•
Hope:IfTis big, we want data complexity to be inO(|T|)•
Challenge:This isimpossible, there can betoo many matches:• Consider thetextT:
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
• Consider theregexp with capturesP:= α a∗ β
• Thenumber of matchesisO(|T|2)
→ We need adifferent wayto measure complexity
6/18
Measuring the complexity
•
Naive algorithm:Considerall waysto assign capture variables andtestfor each of them if it satisfies the patternα
β l
αβ
o α
β
l
αβ
→ Forkcapture variables,data complexity... O(|T|k+1)
•
Hope:IfTis big, we want data complexity to be inO(|T|)•
Challenge:This isimpossible, there can betoo many matches:• Consider thetextT:
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
• Consider theregexp with capturesP:= α a∗ β
• Thenumber of matchesisO(|T|2)
→ We need adifferent wayto measure complexity
6/18
Measuring the complexity
•
Naive algorithm:Considerall waysto assign capture variables andtestfor each of them if it satisfies the patternαβ
l
α
β o α
β
l
αβ
→ Forkcapture variables,data complexity... O(|T|k+1)
•
Hope:IfTis big, we want data complexity to be inO(|T|)•
Challenge:This isimpossible, there can betoo many matches:• Consider thetextT:
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
• Consider theregexp with capturesP:= α a∗ β
• Thenumber of matchesisO(|T|2)
→ We need adifferent wayto measure complexity
6/18
Measuring the complexity
•
Naive algorithm:Considerall waysto assign capture variables andtestfor each of them if it satisfies the patternαβ
l
αβ
o αβ l
αβ
→ Forkcapture variables,data complexity... O(|T|k+1)
•
Hope:IfTis big, we want data complexity to be inO(|T|)•
Challenge:This isimpossible, there can betoo many matches:• Consider thetextT:
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
• Consider theregexp with capturesP:= α a∗ β
• Thenumber of matchesisO(|T|2)
→ We need adifferent wayto measure complexity
6/18
Measuring the complexity
•
Naive algorithm:Considerall waysto assign capture variables andtestfor each of them if it satisfies the patternαβ
l
αβ
o α
β
l
α
β
→ Forkcapture variables,data complexity... O(|T|k+1)
•
Hope:IfTis big, we want data complexity to be inO(|T|)•
Challenge:This isimpossible, there can betoo many matches:• Consider thetextT:
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
• Consider theregexp with capturesP:= α a∗ β
• Thenumber of matchesisO(|T|2)
→ We need adifferent wayto measure complexity
6/18
Measuring the complexity
•
Naive algorithm:Considerall waysto assign capture variables andtestfor each of them if it satisfies the patternα
β l
αβ
o
αβ
l α
β
→ Forkcapture variables,data complexity... O(|T|k+1)
•
Hope:IfTis big, we want data complexity to be inO(|T|)•
Challenge:This isimpossible, there can betoo many matches:• Consider thetextT:
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
• Consider theregexp with capturesP:= α a∗ β
• Thenumber of matchesisO(|T|2)
→ We need adifferent wayto measure complexity
6/18
Measuring the complexity
•
Naive algorithm:Considerall waysto assign capture variables andtestfor each of them if it satisfies the patternαβ
l
α
β o
αβ
l α
β
→ Forkcapture variables,data complexity... O(|T|k+1)
•
Hope:IfTis big, we want data complexity to be inO(|T|)•
Challenge:This isimpossible, there can betoo many matches:• Consider thetextT:
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
• Consider theregexp with capturesP:= α a∗ β
• Thenumber of matchesisO(|T|2)
→ We need adifferent wayto measure complexity
6/18
Measuring the complexity
•
Naive algorithm:Considerall waysto assign capture variables andtestfor each of them if it satisfies the patternαβ
l
αβ
o
α
β l α
β
→ Forkcapture variables,data complexity... O(|T|k+1)
•
Hope:IfTis big, we want data complexity to be inO(|T|)•
Challenge:This isimpossible, there can betoo many matches:• Consider thetextT:
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
• Consider theregexp with capturesP:= α a∗ β
• Thenumber of matchesisO(|T|2)
→ We need adifferent wayto measure complexity
6/18
Measuring the complexity
•
Naive algorithm:Considerall waysto assign capture variables andtestfor each of them if it satisfies the patternαβ
l
αβ
o
αβ
l αβ
→ Forkcapture variables,data complexity... O(|T|k+1)
•
Hope:IfTis big, we want data complexity to be inO(|T|)•
Challenge:This isimpossible, there can betoo many matches:• Consider thetextT:
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
• Consider theregexp with capturesP:= α a∗ β
• Thenumber of matchesisO(|T|2)
→ We need adifferent wayto measure complexity
6/18
Measuring the complexity
•
Naive algorithm:Considerall waysto assign capture variables andtestfor each of them if it satisfies the patternαβ
l
αβ
o
αβ
l
αβ
→ Forkcapture variables,data complexity...
O(|T|k+1)
•
Hope:IfTis big, we want data complexity to be inO(|T|)•
Challenge:This isimpossible, there can betoo many matches:• Consider thetextT:
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
• Consider theregexp with capturesP:= α a∗ β
• Thenumber of matchesisO(|T|2)
→ We need adifferent wayto measure complexity
6/18
Measuring the complexity
•
Naive algorithm:Considerall waysto assign capture variables andtestfor each of them if it satisfies the patternαβ
l
αβ
o
αβ
l
αβ
→ Forkcapture variables,data complexity... O(|T|k+1)
•
Hope:IfTis big, we want data complexity to be inO(|T|)•
Challenge:This isimpossible, there can betoo many matches:• Consider thetextT:
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
• Consider theregexp with capturesP:= α a∗ β
• Thenumber of matchesisO(|T|2)
→ We need adifferent wayto measure complexity
6/18
Measuring the complexity
•
Naive algorithm:Considerall waysto assign capture variables andtestfor each of them if it satisfies the patternαβ
l
αβ
o
αβ
l
αβ
→ Forkcapture variables,data complexity... O(|T|k+1)
•
Hope:IfTis big, we want data complexity to be inO(|T|)•
Challenge:This isimpossible, there can betoo many matches:• Consider thetextT:
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
• Consider theregexp with capturesP:= α a∗ β
• Thenumber of matchesisO(|T|2)
→ We need adifferent wayto measure complexity
6/18
Measuring the complexity
•
Naive algorithm:Considerall waysto assign capture variables andtestfor each of them if it satisfies the patternαβ
l
αβ
o
αβ
l
αβ
→ Forkcapture variables,data complexity... O(|T|k+1)
•
Hope:IfTis big, we want data complexity to be inO(|T|)•
Challenge:This isimpossible, there can betoo many matches:• Consider thetextT:
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
• Consider theregexp with capturesP:= α a∗ β
• Thenumber of matchesisO(|T|2)
→ We need adifferent wayto measure complexity
6/18
Measuring the complexity
•
Naive algorithm:Considerall waysto assign capture variables andtestfor each of them if it satisfies the patternαβ
l
αβ
o
αβ
l
αβ
→ Forkcapture variables,data complexity... O(|T|k+1)
•
Hope:IfTis big, we want data complexity to be inO(|T|)•
Challenge:This isimpossible, there can betoo many matches:• Consider thetextT:
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
• Consider theregexp with capturesP:= α a∗ β
• Thenumber of matchesisO(|T|2)
→ We need adifferent wayto measure complexity
6/18
Measuring the complexity
•
Naive algorithm:Considerall waysto assign capture variables andtestfor each of them if it satisfies the patternαβ
l
αβ
o
αβ
l
αβ
→ Forkcapture variables,data complexity... O(|T|k+1)
•
Hope:IfTis big, we want data complexity to be inO(|T|)•
Challenge:This isimpossible, there can betoo many matches:• Consider thetextT:
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
• Consider theregexp with capturesP:= α a∗ β
• Thenumber of matchesisO(|T|2)
→ We need adifferent wayto measure complexity
6/18
Measuring the complexity
•
Naive algorithm:Considerall waysto assign capture variables andtestfor each of them if it satisfies the patternαβ
l
αβ
o
αβ
l
αβ
→ Forkcapture variables,data complexity... O(|T|k+1)
•
Hope:IfTis big, we want data complexity to be inO(|T|)•
Challenge:This isimpossible, there can betoo many matches:• Consider thetextT:
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
• Consider theregexp with capturesP:= α a∗ β
• Thenumber of matchesisO(|T|2)
→ We need adifferent wayto measure complexity
6/18
Measuring the complexity
•
Naive algorithm:Considerall waysto assign capture variables andtestfor each of them if it satisfies the patternαβ
l
αβ
o
αβ
l
αβ
→ Forkcapture variables,data complexity... O(|T|k+1)
•
Hope:IfTis big, we want data complexity to be inO(|T|)•
Challenge:This isimpossible, there can betoo many matches:• Consider thetextT:
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
• Consider theregexp with capturesP:= α a∗ β
• Thenumber of matchesisO(|T|2)
→ We need adifferent wayto measure complexity
6/18
Enumeration algorithms
Idea:In real life, we do not want to computeall the answers we just need to be able toenumerateanswers quickly
→ Formalization:enumeration algorithms
7/18
Enumeration algorithms
Idea:In real life, we do not want to computeall the answers we just need to be able toenumerateanswers quickly
→ Formalization:enumeration algorithms
7/18
Enumeration algorithms
Idea:In real life, we do not want to computeall the answers we just need to be able toenumerateanswers quickly
→ Formalization:enumeration algorithms
7/18
Enumeration algorithms
Idea:In real life, we do not want to computeall the answers we just need to be able toenumerateanswers quickly
→ Formalization:enumeration algorithms
7/18
Enumeration algorithms
Idea:In real life, we do not want to computeall the answers we just need to be able toenumerateanswers quickly
→ Formalization:enumeration algorithms
7/18
Enumeration algorithms
Idea:In real life, we do not want to computeall the answers we just need to be able toenumerateanswers quickly
→ Formalization:enumeration algorithms
7/18
Formalizing enumeration algorithms
Antoine Amarilli Description Name Antoine Amarilli.Handle:a3nm.Identity Born 1990-02-07.French national.Appearance as of 2017.Auth OpenPGP. OpenId. Bitcoin.
Contact Email and XMPP a3nm@a3nm.net Affiliation Associate professor ...
TextT
␣+ α [a-z0-9.]∗ @
[a-z0-9.]∗ β ␣+ PatternP
Phase 1: Preprocessing
Data structure
Phase 2: Enumeration
hα:42,β :57i, hα:1337,β:1351i
Results State
8/18
Formalizing enumeration algorithms
Antoine Amarilli Description Name Antoine Amarilli.Handle:a3nm.Identity Born 1990-02-07.French national.Appearance as of 2017.Auth OpenPGP. OpenId. Bitcoin.
Contact Email and XMPP a3nm@a3nm.net Affiliation Associate professor ...
TextT
␣+ α [a-z0-9.]∗ @
[a-z0-9.]∗ β ␣+ PatternP
Phase 1:
Preprocessing
Data structure
Phase 2: Enumeration
hα:42,β :57i, hα:1337,β:1351i
Results State
8/18
Formalizing enumeration algorithms
Antoine Amarilli Description Name Antoine Amarilli.Handle:a3nm.Identity Born 1990-02-07.French national.Appearance as of 2017.Auth OpenPGP. OpenId. Bitcoin.
Contact Email and XMPP a3nm@a3nm.net Affiliation Associate professor ...
TextT
␣+ α [a-z0-9.]∗ @
[a-z0-9.]∗ β ␣+ PatternP
Phase 1:
Preprocessing
Data structure
Phase 2: Enumeration
hα:42,β :57i, hα:1337,β:1351i
Results State
8/18
Formalizing enumeration algorithms
Antoine Amarilli Description Name Antoine Amarilli.Handle:a3nm.Identity Born 1990-02-07.French national.Appearance as of 2017.Auth OpenPGP. OpenId. Bitcoin.
Contact Email and XMPP a3nm@a3nm.net Affiliation Associate professor ...
TextT
␣+ α [a-z0-9.]∗ @
[a-z0-9.]∗ β ␣+ PatternP
Phase 1:
Preprocessing
Data structure
Phase 2:
Enumeration
hα:42,β :57i, hα:1337,β:1351i
Results State
8/18
Formalizing enumeration algorithms
Antoine Amarilli Description Name Antoine Amarilli.Handle:a3nm.Identity Born 1990-02-07.French national.Appearance as of 2017.Auth OpenPGP. OpenId. Bitcoin.
Contact Email and XMPP a3nm@a3nm.net Affiliation Associate professor ...
TextT
␣+ α [a-z0-9.]∗ @
[a-z0-9.]∗ β ␣+ PatternP
Phase 1:
Preprocessing
Data structure
Phase 2:
Enumeration
hα :42,β :57i,
hα:1337,β:1351i
Results
State
8/18
Formalizing enumeration algorithms
Antoine Amarilli Description Name Antoine Amarilli.Handle:a3nm.Identity Born 1990-02-07.French national.Appearance as of 2017.Auth OpenPGP. OpenId. Bitcoin.
Contact Email and XMPP a3nm@a3nm.net Affiliation Associate professor ...
TextT
␣+ α [a-z0-9.]∗ @
[a-z0-9.]∗ β ␣+ PatternP
Phase 1:
Preprocessing
Data structure
Phase 2:
Enumeration
hα :42,β :57i,
hα:1337,β:1351i
Results State
0011
8/18
Formalizing enumeration algorithms
Antoine Amarilli Description Name Antoine Amarilli.Handle:a3nm.Identity Born 1990-02-07.French national.Appearance as of 2017.Auth OpenPGP. OpenId. Bitcoin.
Contact Email and XMPP a3nm@a3nm.net Affiliation Associate professor ...
TextT
␣+ α [a-z0-9.]∗ @
[a-z0-9.]∗ β ␣+ PatternP
Phase 1:
Preprocessing
Data structure
Phase 2:
Enumeration
hα :42,β :57i,
hα:1337,β:1351i
Results State
0011
8/18
Formalizing enumeration algorithms
Antoine Amarilli Description Name Antoine Amarilli.Handle:a3nm.Identity Born 1990-02-07.French national.Appearance as of 2017.Auth OpenPGP. OpenId. Bitcoin.
Contact Email and XMPP a3nm@a3nm.net Affiliation Associate professor ...
TextT
␣+ α [a-z0-9.]∗ @
[a-z0-9.]∗ β ␣+ PatternP
Phase 1:
Preprocessing
Data structure
Phase 2:
Enumeration
hα :42,β :57i, hα:1337,β:1351i
Results State
⊥
8/18
Formalizing enumeration algorithms
Antoine Amarilli Description Name Antoine Amarilli.Handle:a3nm.Identity Born 1990-02-07.French national.Appearance as of 2017.Auth OpenPGP. OpenId. Bitcoin.
Contact Email and XMPP a3nm@a3nm.net Affiliation Associate professor ...
TextT
␣+ α [a-z0-9.]∗ @
[a-z0-9.]∗ β ␣+ PatternP
Phase 1:
Preprocessing
Data structure
Phase 2:
Enumeration
hα :42,β :57i, hα:1337,β:1351i
Results State
⊥
Two waysto measure performance:
•
Total timefor phase 1•
Delay between two resultsin phase 2... incombinedanddata complexity 8/18