• Aucun résultat trouvé

Try It Out Matching Triple Numeric Digits

Dans le document Beginning Regular Expressions (Page 79-83)

Suppose that you want to match a sequence of three numeric digits. In plain English, you might say that you want to match a three-digit number. A slightly more formal way to express what you want to do is this: Match a numeric digit. If the first character is a numeric digit, attempt to match the next character as a numeric digit. If both the characters are numeric digits, attempt to match a third successive numeric digit.

The metacharacter \dmatches a single numeric digit; therefore, as described a little earlier, you could use the pattern:

\d\d\d

to match three successive numeric digits.

If all three matches are successful, a match for the regular expression pattern has been found.

The test file, ABC123.txt, is shown here:

ABC123

For the moment, let’s aim to match only the numeric digits using the pattern \d\d\dshown earlier.

For this example, we will use JavaScript, for reasons that will be explained in a moment.

1.

Navigate to the directory that contains the file ABC123.txtand ThreeDigits.html. Open ThreeDigits.htmlin a Web browser.

2.

Click the button labeled Click Here to Enter Text.

3.

When the prompt box opens, enter a string to test. Enter a string copied from ABC123.txt.

4.

Click the OK button and inspect the alert box to see if the string that you entered contained a match for the pattern \d\d\d.

Figure 3-7 shows the result after entering the string Part Number RRG417.

Figure 3-7

Try each of the strings from ABC123.txt. You can also create your own test string. Notice that the pat-tern \d\d\dwill match any sequence of three successive numeric digits, but single numeric digits or pairs of numeric digits are not matched.

How It Works

The regular expression engine looks for a numeric digit. If the first character that it tests is not a numeric digit, it moves one character through the test string and then tests whether that character matches a numeric digit. If not, it moves one character further and tests again.

If a match is found for the first occurrence of \d, the regular expression engine tests if the next character is also a numeric digit. If that matches, a third character is tested to determine if it matches the \d metacharacter for a numeric digit. If three successive characters are each a numeric digit, there is a match for the regular expression pattern \d\d\d.

You can see this matching process in action by using the Komodo Regular Expressions Toolkit. Open the Komodo Regular Expression Toolkit, and clear any existing regular expression and test string. Enter the test string A234BC; then, in the area for the regular expression pattern, enter the pattern \d. You will see that the first numeric digit, 2, is highlighted as a match. Add a second \d to the regular expression area, and you will see that 23is highlighted as a match. Finally, add a third \d to give a final regular expres-sion pattern \d\d\d, and you will see that 234is highlighted as a match. See Figure 3-8.

You can try this with other test text from ABC123.txt. I suggest that you also try this out with your own test text that includes numeric digits and see which test strings match. You may find that you need to add a space character after the test string for matching to work correctly in the Komodo Regular Expression Toolkit.

Why did we use JavaScript for the preceding example? Because we can’t use OpenOffice.org Writer to test matches for the \dmetacharacter.

51

Simple Regular Expressions

Figure 3-8

Matching numeric digits can pose difficulties. Figure 3-9 shows the result of an attempted match in ABC123.txtwhen using OpenOffice.org Writer with the pattern \d\d\d.

As you can see in Figure 3-9, no match is found in OpenOffice.org Writer. Numeric digits in OpenOffice.org Writer use nonstandard syntax in that OpenOffice.org Writer lacks support for the

\dmetacharacter.

One solution to this type of problem in OpenOffice.org Writer is to use character classes, which are described in detail in Chapter 5. For now, it is sufficient to note that the regular expression pattern:

[0-9][0-9][0-9]

gives the same results as the pattern \d\d\d, because the meaning of [0-9][0-9][0-9]is the same as

\d\d\d. The use of that character class to match three successive numeric digits in the file ABC123.txt is shown in Figure 3-10.

Figure 3-10

Another syntax in OpenOffice.org Writer, which uses POSIX metacharacters, is described in Chapter 12.

The findstrutility also lacks the \dmetacharacter, so if you want to use it to find matches, you must use the preceding character class shown in the command line, as follows:

findstr /N [0-9][0-9][0-9] ABC123.txt

53

Simple Regular Expressions

You will find matches on four lines, as shown in Figure 3-11. The preceding command line will work cor-rectly only if the ABC123.txtfile is in the current directory. If it is in a different directory, you will need to reflect that in the path for the file that you enter at the command line.

Figure 3-11

The next section will combine the techniques that you have seen so far to find a combination of literally expressed characters and a sequence of characters.

Dans le document Beginning Regular Expressions (Page 79-83)

Documents relatifs