Try It Out Matching Multiple Optional Characters

From the document Beginning Regular Expressions (pages 88-91)

Use the sample file Colors2.txt to explore this example:

These colors are bright.

Some colors feel warm. Other colours feel cold.

A color’s temperature can be important in creating reaction to an image.

These colours’ temperatures are important in this discussion.

Red is a vivid colour.

Simple Regular Expressions

To test the regular expression, follow these steps:

1. Open OpenOffice.org Writer, and open the file Colors2.txt.

2. Use the keyboard shortcut Ctrl+F to open the Find and Replace dialog box.

3. Check the Regular Expressions check box and the Match Case check box.

4. In the Search for text box, enter the regular expression pattern colou?r’?s?’? and click the Find All button. If all has gone well, you should see the matches shown in Figure 3-16.

Figure 3-16

As you can see, all the sample forms of the word of interest have been matched.
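If you don't have OpenOffice.org Writer at hand, the same check can be sketched in Python. This is only an illustration, assuming that Python's re module treats the ? quantifier the same way Writer does for this simple pattern; the names SAMPLE_TEXT and PATTERN are made up for the example. Note that the sample text and the pattern both use the typographic apostrophe (’), so the two must agree.

```python
import re

# Contents of Colors2.txt as listed above (typographic apostrophes, U+2019).
SAMPLE_TEXT = """These colors are bright.
Some colors feel warm. Other colours feel cold.
A color’s temperature can be important in creating reaction to an image.
These colours’ temperatures are important in this discussion.
Red is a vivid colour.
"""

# The pattern from step 4. Python matches case-sensitively by default,
# which corresponds to checking the Match Case check box.
PATTERN = re.compile(r"colou?r’?s?’?")

# findall returns every non-overlapping match, much like Find All.
matches = PATTERN.findall(SAMPLE_TEXT)
print(matches)
```

Each of the six occurrences in the sample file should appear in the printed list, in document order.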

How It Works

In this description, I will focus initially on matching the forms of the word colour/color. How does the pattern colou?r’?s?’? match the word color? Assume that the regular expression engine is at the position immediately before the first letter of color. It first attempts to match lowercase c, because one lowercase c must be matched. That matches. Attempts are then made to match a subsequent lowercase o, l, and o. These all also match. Then an attempt is made to match an optional lowercase u. In other words, zero or one occurrence of the lowercase character u is needed. Because there are zero occurrences of lowercase u, there is a match. Next, an attempt is made to match lowercase r. The lowercase r in color matches. Then an attempt is made to match an optional apostrophe. Because there is no occurrence of an apostrophe, there is a match. Next, the regular expression engine attempts to match an optional lowercase s; in other words, to match zero or one occurrence of lowercase s. Because there is no occurrence of lowercase s, again, there is a match. Finally, an attempt is made to match an optional apostrophe. Because there is no occurrence of an apostrophe, another match is found. Because a match exists for all the components of the regular expression pattern, there is a match for the whole regular expression pattern colou?r’?s?’?.

Now, how does the pattern colou?r’?s?’? match the word colour? Assume that the regular expression engine is at the position immediately before the first letter of colour. It first attempts to match lowercase c, because one lowercase c must be matched. That matches. Next, attempts are made to match a subsequent lowercase o, l, and another o. These also match. Then an attempt is made to match an optional lowercase u. In other words, zero or one occurrence of the lowercase character u is needed. Because there is one occurrence of lowercase u, there is a match. Next, an attempt is made to match lowercase r. The lowercase r in colour matches. Next, the engine attempts to match an optional apostrophe. Because there is no occurrence of an apostrophe, there is a match. Next, the regular expression engine attempts to match an optional lowercase s; in other words, to match zero or one occurrence of lowercase s. Because there is no occurrence of lowercase s, a match exists. Finally, an attempt is made to match an optional apostrophe. Because there is no occurrence of an apostrophe, there is a match. All the components of the regular expression pattern have a match; therefore, the entire regular expression pattern colou?r’?s?’? matches.
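The component-by-component walk described above can be mimicked with a small Python sketch. It is a deliberate simplification: it walks left to right, takes an optional character whenever it is present, and never backtracks, which is enough for this pattern but is not how a real regular expression engine is implemented. The names COMPONENTS and trace_match are hypothetical.

```python
# Each component of colou?r’?s?’? as (character, is_optional).
COMPONENTS = [
    ("c", False), ("o", False), ("l", False), ("o", False),
    ("u", True), ("r", False), ("’", True), ("s", True), ("’", True),
]

def trace_match(text, start=0):
    """Walk the components in order; return (matched_text or None, steps)."""
    pos = start
    steps = []
    for char, optional in COMPONENTS:
        present = pos < len(text) and text[pos] == char
        if present:
            steps.append(f"{char}: one occurrence, match")
            pos += 1
        elif optional:
            steps.append(f"{char}: zero occurrences, still a match")
        else:
            steps.append(f"{char}: required but absent, no match")
            return None, steps
    return text[start:pos], steps

for word in ("color", "colour"):
    matched, steps = trace_match(word)
    print(word, "->", matched)
    for step in steps:
        print("   ", step)
```

For color the trace reports zero occurrences for u, both apostrophes, and s; for colour only the u line changes.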

Work through the other six word forms shown earlier, and you’ll find that each of the word forms does, in fact, match the regular expression pattern.

The pattern colou?r’?s?’? matches all eight of the word forms that were listed earlier, but will the pattern match the following sequence of characters?

colour’s’

Can you see that it does match? Can you see why it matches the pattern? If every optional element of the pattern (the u, the s, and both apostrophes) is present, the preceding sequence of characters matches. That rather odd sequence of characters likely won’t exist in your sample document, so the possibility of false matches (reduced specificity) won’t be an issue for you.
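A quick way to confirm the false match is to test the odd sequence directly. The sketch below assumes Python's re module; fullmatch succeeds only when the entire string fits the pattern. The second test string, with two consecutive apostrophes, is an extra example that is not in the original text.

```python
import re

PATTERN = re.compile(r"colou?r’?s?’?")

# Odd sequences that are unlikely to be intended word forms.
odd_forms = ["colour’s’", "color’’"]

# Keep every form that the pattern accepts as a whole.
accepted = [form for form in odd_forms if PATTERN.fullmatch(form)]
print(accepted)
```

Both sequences are accepted, because nothing in the pattern ties the second optional apostrophe to the absence of the first.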

How can you avoid the problem caused by such odd sequences of characters as colour’s’? What you want to be able to express is something like this:

Match a lowercase c. If a match is present, attempt to match a lowercase o. If that match is present, attempt to match a lowercase l. If there is a match, attempt to match a lowercase o. If a match exists, attempt to match an optional lowercase u. If there is a match, attempt to match a lowercase r. If there is a match, attempt to match an optional apostrophe. And if a match exists here, attempt to match an optional lowercase s. If the earlier optional apostrophe was not present, attempt to match an optional apostrophe.

With the techniques that you have seen so far, you aren’t able to express ideas such as “match something only if it is not preceded by something else.” That sort of approach might help achieve higher specificity at the expense of increased complexity. Techniques where matching depends on such issues are presented in Chapter 9.
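To give a flavor of what such a technique might look like, here is a sketch using negative lookbehind in Python's re module. It is an illustration only, not necessarily the approach the later chapter takes, and the STRICTER pattern and the word lists are assumptions made for this example. The final apostrophe is wrapped in an optional group that may match only when the text immediately before it is neither ’ nor ’s.

```python
import re

ORIGINAL = re.compile(r"colou?r’?s?’?")

# Illustrative pattern (not from the source text): (?<!’) and (?<!’s) are
# negative lookbehinds that block the trailing apostrophe when an earlier
# apostrophe has already been matched.
STRICTER = re.compile(r"colou?r’?s?(?:(?<!’)(?<!’s)’)?")

# Assumed list of plausible word forms, plus two odd sequences.
plausible_forms = ["color", "colour", "colors", "colours",
                   "color’s", "colour’s", "colors’", "colours’"]
odd_forms = ["colour’s’", "colour’’"]

for form in plausible_forms + odd_forms:
    print(form,
          "original:", bool(ORIGINAL.fullmatch(form)),
          "stricter:", bool(STRICTER.fullmatch(form)))
```

The stricter pattern accepts the plausible forms and rejects the two odd sequences as whole strings, but it is noticeably harder to read, which is the trade-off between specificity and complexity mentioned above.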
