• Aucun résultat trouvé

Try It Out Match Zero to Two Occurrences

Dans le document Beginning Regular Expressions (Page 97-102)

1.

Open OpenOffice.org Writer, and open the sample file Parts.txt.

2.

Use Ctrl+F to open the Find and Replace dialog box.

3.

Check the Regular Expressions and Match Case check boxes.

4.

Enter the regular expression pattern ABC[0-9]{0,2} in the Search For text box; click the Find All button; and inspect the matches that are displayed in highlighted text, as shown in Figure 3-19.

Figure 3-19

Notice that on some lines, only parts of a part number are matched. If you are puzzled as to why that is, refer back to the problem definition. You are to match a specified sequence of characters. You haven’t specified that you want to match a part number, simply a sequence of characters.

How It Works

How does it work with the match for the part number ABC? When the regular expression engine is at the position immediately before the uppercase Aof the part number ABC, it attempts to match an uppercase A. That matches. Next, an attempt is made to match an uppercase B. That too matches. Next, an attempt is made to match an uppercase C. That too matches. At that stage, the first three characters in the regular expression pattern have been matched. Finally, an attempt is made to match the pattern [0-9]{0,2}, which means “Match a minimum of zero and a maximum of two numeric characters.” Zero numeric digits follow the uppercase Cin ABC. Because there are exactly zero numeric digits after the uppercase C of ABC, there is a match (zero numeric digits matches the criterion “a minimum of zero numeric digits”

specified by the minimum-occurrence specifier of the {0,2}quantifier). Because the final component of the pattern matches, the whole pattern matches.

What happens when matching is attempted on the line that contains the part number ABC8899? Why do the first five characters of the part number ABC8899match? When the regular expression engine is at the position immediately before the Aof ABC8899, it attempts to match the next character in the part number with an uppercase Aand finds is a match. Next, an attempt is made to match an uppercase B. That too matches. Then an attempt is made to match an uppercase C, which also matches. At that stage, the first three characters in the regular expression pattern have been matched. Finally, an attempt is made to match the pattern [0-9]{0,2}, which means “Match a minimum of zero and a maximum of two numeric characters.” Four numeric digits follow the uppercase C. Only two of those numeric digits are needed for a successful match. Because there are four numeric digits after the uppercase Cof ABC, there is a match (of two numeric digits, which meets the criterion “a maximum of two numeric digits”), but the final two numeric digits of ABC8899are not needed to form a match, so they are not highlighted.

Because all components of the pattern match, the whole pattern matches.

{n,m}

The minimum-occurrence specifier in the curly-brace syntax doesn’t have to be 0. It can be any number you like, provided it is not larger than the maximum-occurrence specifier.

Let’s look for one to three occurrences of a numeric digit. You can specify this in a problem definition as follows:

Match an uppercase A. If there is a match, attempt to match an uppercase B. If there is a match, attempt to match an uppercase C. If all three uppercase characters match, attempt to match a mini-mum of one and a maximini-mum of three numeric digits.

So if you wanted to match one to three occurrences of a numeric digit in Parts.txt, you would use the following pattern:

ABC[0-9]{1,3}

Figure 3-20 shows the matches in OpenOffice.org Writer. Notice that the part number ABC does not match, because it has zero numeric digits, and you are looking for matches that have one through three numeric digits. Notice, too, that only the first three numeric digits of ABC8899form part of the match.

The How It Works explanation in the preceding section for the {0,m} syntax should be sufficient to help you understand what is happening in this example.

69

Simple Regular Expressions

Figure 3-20

{n,}

Sometimes, you will want there to be an unlimited number of occurrences. You can specify an unlimited maximum number of occurrences by omitting the maximum-occurrence specifier inside the curly braces.

To specify at least two occurrences and an unlimited maximum, you could use the following problem definition:

Match an uppercase A. If there is a match, attempt to match an uppercase B. If there is a match, attempt to match an uppercase C. If all three uppercase characters match, attempt to match a mini-mum of two occurrences and an unlimited maximini-mum occurrences of three numeric digits.

You can express that using the following pattern:

ABC[0-9]{2,}

Figure 3-21 shows the appearance in OpenOffice.org Writer. Notice that now all four numeric digits in ABC8899form part of the match, because the maximum occurrences that can form part of a match are unlimited.

Figure 3-21

Exercises

These exercises allow you to test your understanding of the regular expression syntax covered in this chapter.

1.

Using DoubledR.txtas a sample file, try out regular expression patterns that match other dou-bled letters in the file. For example, there are doudou-bled lowercase s, m, and l. Use different syntax options to match exactly two occurrences of a character.

2.

Create a regular expression pattern that tests for part numbers that have two alphabetic charac-ters in sequence — uppercase Afollowed by uppercase Bfollowed by two numeric digits.

3.

Modify the file UpperL.htmlso that the regular expression pattern to be matched is the. Open the file in a browser, and test various pieces of text against the specified regular expression pattern.

71

Simple Regular Expressions

4

Metacharacter s and

Dans le document Beginning Regular Expressions (Page 97-102)

Documents relatifs