• Aucun résultat trouvé

Special-Character Precedence

Dans le document Teach Yourself Perl 5 in 21 days (Page 179-183)

Perl defines rules of precedence to determine the order in which special characters in patterns are interpreted. For example, the pattern

/x|y+/

matches either x or one or more occurrences of y, because + has higher precedence than | and is therefore interpreted first.

Table 7.3 lists the special characters that can appear in patterns in order of precedence (highest to lowest). Special characters with higher precedence are always interpreted before those of lower precedence.

Table 7.3. The precedence of pattern-matching special characters.

Special character Description

() Pattern memory

+ * ? {} Number of occurrences

^ $ \b \B Pattern anchors

| Alternatives

Because the pattern-memory special characters () have the highest precedence, you can use them to force other special characters to be evaluated first. For example, the pattern

(ab|cd)+

matches one or more occurrences of either ab or cd. This matches, for example, abcdab.

Remember that when you use parentheses to force the order of precedence, you also are storing into pattern memory. For example, in the sequence

/(ab|cd)+(.)(ef|gh)+\1/

the \1 refers to what ab|cd matched, not to what the . special character matched.

Now that you know all of the special-pattern characters and their precedence, look at a program that does more complex pattern matching. Listing 7.9 uses the various special-pattern characters, including the parentheses, to check whether a given input string is a valid twentieth-century date.

Listing 7.9. A date-validation program.

1: #!/usr/local/bin/perl 2:

3: print ("Enter a date in the format YYYY-MM-DD:\n");

4: $date = <STDIN>;

5: chop ($date);

6:

7: # Because this pattern is complicated, we split it 8: # into parts, assign the parts to scalar variables, 9: # then substitute them in later.

10:

11: # handle 31-day months

12: $md1 = "(0[13578]|1[02])\\2(0[1-9]|[12]\\d|3[01])";

13: # handle 30-day months

14: $md2 = "(0[469]|11)\\2(0[1-9]|[12]\\d|30)";

15: # handle February, without worrying about whether it's 16: # supposed to be a leap year or not

17: $md3 = "02\\2(0[1-9]|[12]\\d)";

18:

19: # check for a twentieth-century date

20: $match = $date =~ /^(19)?\d\d(.)($md1|$md2|$md3)$/;

21: # check for a valid but non-20th century date

22: $olddate = $date =~ /^(\d{1,4})(.)($md1|$md2|$md3)$/;

23: if ($match) {

24: print ("$date is a valid date\n");

25: } elsif ($olddate) {

26: print ("$date is not in the 20th century\n");

27: } else {

28: print ("$date is not a valid date\n");

29: }

$ program7_9

Enter a date in the format YYYY-MM-DD:

1991-04-31

1991-04-31 is not a valid date

$

Don't worry: this program is a lot less complicated than it looks! Basically, this program does the following:

It checks whether the date is in the format YYYY-MM-DD. (It allows YY-MM-DD, and also enables you to use a character other than a hyphen to separate the year, month, and date.)

1.

It checks whether the year is in the twentieth century or not.

2.

It checks whether the month is between 01 and 12.

3.

Finally, it checks whether the date field is a legal date for that month. Legal date fields are between 01 and either 29, 30, or 31, depending on the number of days in that month.

4.

If the date is legal, the program tells you so. If the date is not a twentieth-century date but is legal, the program informs you of this also.

Because the pattern to be matched is too long to fit on one line, this program breaks it into pieces and assigns the pieces to scalar variables. This is possible because scalar-variable substitution is supported in patterns.

Line 12 is the pattern to match for months with 31 days. Note that the escape sequences (such as \d) are preceded by another backslash (producing \\d). This is because the program actually wants to store a backslash in the scalar variable.

(Recall that backslashes in double-quoted strings are treated as escape sequences.) The pattern

(0[13578]|1[02])\2(0[1-9]|[12]\d|3[01])

which is assigned to $md1, consists of the following components:

The sequence (0[13578]|1[02]), which matches the month values 01, 03, 05, 07, 08, 10, and 12 (the 31-day months)

\2, which matches the character that separates the day, month, and year

The sequence (0[1-9]|[12]\d|3[01]), which matches any two-digit number between 01 and 31

Note that \2 matches the separator character because the separator character will eventually be the second pattern sequence stored in memory (when the pattern is finally assembled).

Line 14 is similar to line 12 and handles 30-day months. The only differences between this subpattern and the one in line 12 are as follows:

The month values accepted are 04, 06, 09, and 11.

The valid date fields are 01 through 30, not 01 through 31.

Line 17 is another similar pattern that checks whether the month is 02 (February) and the date field is between 01 and 29.

Line 20 does the actual pattern match that checks whether the date is a valid twentieth-century date. This pattern is divided into three parts.

^(19)?\d\d, which matches any two-digit number at the beginning of a line, or any four-digit number starting with 19

The separator character, which is the second item in parentheses-the second item stored in memory-and thus can be retrieved using \2

($md1|$md2|$md3)$, which matches any of the valid month-day combinations defined in lines 12, 14, and 17, provided it appears at the end of the line

The result of the pattern match, either true or false, is stored in the scalar variable $match.

Line 22 checks whether the date is a valid date in any century. The only difference between this pattern and the one in line 20 is that the year can be any one-to-four-digit number. The result of the pattern match is stored in $olddate.

Lines 23-29 check whether either $match or $olddate is true and print the appropriate message.

As you can see, the pattern-matching facility in Perl is quite powerful. This program is less than 30 lines long, including comments; the equivalent program in almost any other programming language would be substantially longer and much more difficult to write.

Dans le document Teach Yourself Perl 5 in 21 days (Page 179-183)