The -match operator - PATTERN MATCHING AND TEXT MANIPULATION

Operators and expressions

4.4 PATTERN MATCHING AND TEXT MANIPULATION

4.4.3 The -match operator

The PowerShell version 1 operators that work with regular expressions are -match and -replace. These operators are shown in table 4.7 along with a description and some examples. PowerShell v2 introduced an additional -split operator, which we’ll cover a bit later.

The -match operator is similar to the -like operator in that it matches a pattern and returns a result. Along with that result, though, it also sets the $matches variable.

This variable contains the portions of the string that are matched by individual parts of the regular expression. The only way to clearly explain this is with an example:

PS (1) > "abc" -match "(a)(b)(c)"

True

Here, the string on the left side of the -match operator is matched against the pattern on the right side. In the pattern string, you can see three sets of parentheses. Figure 4.7 shows this expression in more detail.

You can see on the right side of the -match operator that each of the components in parentheses is a “submatch.” We’ll get to why this is important in the next section.

Table 4.7 PowerShell regular expression -match and -replace operators

| Operator | Description | Example | Result |
|---|---|---|---|
| -match -cmatch -imatch | Do a pattern match using regular expressions. | "Hello" -match "[jkl]" | $true |
| -notmatch -cnotmatch -inotmatch | Do a regex pattern match; return true if the pattern doesn’t match. | "Hello" -notmatch "[jkl]" | $false |
| -replace -creplace -ireplace | Do a regular expression substitution on the string on the left side and return the modified string. | "Hello" -replace "ello","i" | "Hi" |
| | Delete the portion of the string matching the regular expression. | | |
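The three spellings in each row of the table differ only in how they treat case. The following is a small sketch rather than an exhaustive tour: the plain and i-prefixed forms ignore case, and the c-prefixed forms are case sensitive. The variable names ($r1 through $r6) are just placeholders for this illustration.

```powershell
# Plain -match ignores case; -imatch says so explicitly.
$r1 = "Hello" -match  "hello"        # True
$r2 = "Hello" -imatch "HELLO"        # True

# -cmatch is the case-sensitive form, so the lowercase pattern fails.
$r3 = "Hello" -cmatch "hello"        # False

# -notmatch inverts the result: "l" is in the set [jkl], so this is False.
$r4 = "Hello" -notmatch "[jkl]"      # False

# -replace substitutes the matched text; -creplace only does so
# when the case matches, otherwise the string comes back unchanged.
$r5 = "Hello" -replace  "ello","i"   # Hi
$r6 = "Hello" -creplace "ELLO","i"   # Hello

$r1, $r2, $r3, $r4, $r5, $r6
```

If you need a script to behave the same way no matter what the input looks like, spelling out -cmatch or -imatch makes the intent obvious to the next reader.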

Figure 4.7 The anatomy of a regular expression match operation where the pattern contains submatches. Each of the bracketed elements of the pattern corresponds to a submatch pattern.


The result of this expression was true, which means that the match succeeded. It also means that $matches should be set, so let’s look at what it contains:

PS (2) > $matches

Key                            Value
---                            -----
3                              c
2                              b
1                              a
0                              abc

$matches contains a hashtable where the keys of the hashtable are indexes that correspond to parts of the pattern that matched. The values are the substrings of the target string that matched. Note that even though you only specified three subpatterns, the hashtable contains four elements. This is because there’s always a default element that represents the entire string that matched. Here’s a more complex example that shows multiple nested matches:
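Because $matches is an ordinary hashtable, you can index into it instead of printing the whole thing. Here is a minimal sketch using the same three-group pattern. It wraps the match in an if statement because $matches is only updated when a match succeeds. The variable names ($whole, $first, and so on) are placeholders chosen for this illustration.

```powershell
if ("abc" -match "(a)(b)(c)") {
    $whole = $matches[0]                  # the entire matched string: abc
    $first = $matches[1]                  # the first submatch: a
    $last  = $matches[3]                  # the third submatch: c
    $count = $matches.Count               # 4: three submatches plus element 0
    $type  = $matches.GetType().Name      # Hashtable
}

$whole, $first, $last, $count, $type
```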

PS (4) > "abcdef" -match "(a)(((b)(c))de)f"

True

PS (5) > $matches

Key                            Value
---                            -----
5                              c
4                              b
3                              bc
2                              bcde
1                              a
0                              abcdef

Now you have the outermost match in index 0, which matches the whole string. Next you have a top-level match at the beginning of the pattern that matches “a” at index 1. At index 2, you have the complete string matched by the next top-level part, which is “bcde”. Index 3 is the first nested match in that top-level match, which is “bc”.

This match also has two nested matches: b at element 4 and c at element 5.
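One way to see the numbering rule at work is to list the entries in key order: each group gets its index from the position of its opening parenthesis, counting from the left. The sketch below assumes nothing beyond the pattern already shown; $pattern and $lines are names made up for this illustration.

```powershell
$pattern = "(a)(((b)(c))de)f"
$lines = @()

if ("abcdef" -match $pattern) {
    # Sort the numeric keys so the groups appear in the order
    # their opening parentheses occur in the pattern.
    $lines = foreach ($key in ($matches.Keys | Sort-Object)) {
        "{0} = {1}" -f $key, $matches[$key]
    }
}

$lines
# 0 = abcdef
# 1 = a
# 2 = bcde
# 3 = bc
# 4 = b
# 5 = c
```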

Matching using named captures

Calculating these indexes is fine if the pattern is simple. If it’s complex, as in the previous example, it’s hard to figure out what goes where, and even if you do, when you look at what you’ve written a month later, you’ll have to figure it out all over again.

The .NET regular expression library provides a way to solve this problem by using named captures. You specify a named capture by placing the sequence ?<name> immediately inside the parentheses that indicate the match group. This allows you to reference the capture by name instead of by number, making complex expressions easier to deal with. Here’s what this looks like:

PS (10) > "abcdef" -match "(?<o1>a)(?<o2>((?<e3>b)(?<e4>c))de)f"

True
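After a match like this, the named groups show up in $matches under their names rather than under numbers. The sketch below reads them back. Note that the pattern still contains one unnamed group, the parentheses around the e3 and e4 captures, and that group keeps a numeric key. The result variables ($o1, $o2, and so on) are placeholders for this illustration.

```powershell
if ("abcdef" -match "(?<o1>a)(?<o2>((?<e3>b)(?<e4>c))de)f") {
    $o1    = $matches.o1       # a
    $o2    = $matches.o2       # bcde
    $e3    = $matches['e3']    # b   (index syntax works as well as property syntax)
    $e4    = $matches.e4       # c
    $inner = $matches[1]       # bc  (the single unnamed group)
    $all   = $matches[0]       # abcdef
}

$o1, $o2, $e3, $e4, $inner, $all
```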

Now let’s look at a more realistic example.

Parsing command output using regular expressions

Existing utilities for Windows produce text output, so you have to parse the text to extract information. (As you may remember, avoiding this kind of parsing was one of the reasons PowerShell was created. But it still needs to interoperate with the rest of the world.) For example, the net.exe utility can return some information about your computer configuration. The second line of this output contains the name of the computer. Your task is to extract the name and domain for this computer from that string. One way to do this is to calculate the offsets and then extract substrings from the output. This is tedious and error prone (since the offsets might change). Here’s how to do it using the $matches variable. First let’s look at the form of this string:

PS (1) > (net config workstation)[1]

Full Computer name brucepay64.redmond.corp.microsoft.com

It begins with a well-known pattern, Full Computer name, so start by matching against that to make sure there are no errors. You’ll see that there’s a space before the name, and that the parts of the name itself are separated by periods. You’re pretty safe in ignoring the intervening characters, so here’s the pattern you’ll use:

PS (2) > $p='^Full Computer.* (?<computer>[^.]+)\.(?<domain>[^.]+)'

Figure 4.8 shows this pattern in more detail.

Figure 4.8 This is an example of a regular expression pattern that uses the named submatch capability. When this expression is used with the -match operator, instead of using simple numeric indexes in the $matches variable for the substrings, the names will be used. In the figure, the \. element is labeled as matching the literal . character.

You check the string at the beginning, and then allow any sequence of characters that ends with a space, followed by two fields of non-period characters with a literal dot between them. Notice that you don’t say that the fields can contain any character. Instead, you say that they can contain anything but a period. This is because regular expressions are greedy: they match the longest possible pattern, and because an unescaped period in a pattern matches any character, a field written to accept any character wouldn’t stop at the first period. Now let’s apply this pattern:
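To see why the character class matters, it can help to compare it with a greedy version of the same pattern. The sketch below runs both against the sample line from earlier, stored in a variable so it doesn't depend on net.exe. The $greedy pattern is a deliberately wrong variant written for this comparison, and the other variable names are also placeholders.

```powershell
$line = 'Full Computer name brucepay64.redmond.corp.microsoft.com'

# Deliberately greedy variant: each field may contain ANY character,
# so the first field swallows everything up to the last period.
$greedy = '^Full Computer.* (?<computer>.+)\.(?<domain>.+)'
$null = $line -match $greedy
$greedyComputer = $matches.computer   # brucepay64.redmond.corp.microsoft
$greedyDomain   = $matches.domain     # com

# The pattern from the text: each field stops at the first period.
$p = '^Full Computer.* (?<computer>[^.]+)\.(?<domain>[^.]+)'
$null = $line -match $p
$computer = $matches.computer         # brucepay64
$domain   = $matches.domain           # redmond

$greedyComputer, $greedyDomain, $computer, $domain
```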

PS (3) > (net config workstation)[1] -match $p

True

It matches, so you know that the output string was well formed. Now let’s look at what you captured from the string:

PS (4) > $matches.computer

brucepay64

PS (5) > $matches.domain

redmond

You see that you’ve extracted the computer name and domain as desired. This approach is significantly more robust than using exact indexing for the following reasons. First, you checked with a guard string instead of assuming that the string at index 1 was correct. In fact, you could have written a loop that went through all the strings and stopped when the match succeeded. In that case, it wouldn’t matter which line contained the information; you’d find it anyway. You also didn’t care about where in the line the data actually appeared, only that it followed a basic well-formed pattern. With a pattern-based approach, output format can vary significantly, and this pattern would still retrieve the correct data. By using techniques like this, you can write more change-tolerant scripts than you would otherwise do.
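The loop mentioned above might look something like the following sketch. To keep it self-contained, it searches a small array of sample lines instead of calling net.exe. The contents of $output are an assumption standing in for real output, and the user name in it is made up. On a real system you could assign the result of net config workstation to $output instead.

```powershell
# Sample lines standing in for the output of `net config workstation`
# (hypothetical values; only the "Full Computer name" line comes from the text).
$output = @(
    'Computer name                        \\BRUCEPAY64'
    'Full Computer name                   brucepay64.redmond.corp.microsoft.com'
    'User name                            someuser'
)

$p = '^Full Computer.* (?<computer>[^.]+)\.(?<domain>[^.]+)'
$computer = $null
$domain = $null

foreach ($line in $output) {
    # The guard string in the pattern picks out the right line,
    # wherever it appears in the output.
    if ($line -match $p) {
        $computer = $matches.computer
        $domain   = $matches.domain
        break    # stop at the first line that matches
    }
}

$computer    # brucepay64
$domain      # redmond
```

If no line matches, both variables stay $null, which gives the calling script something simple to test for.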

The -match operator lets you match text; now let’s look at how to go about making changes to text. This is what the -replace operator is for, so we’ll explore that next.

4.4.4 The -replace operator

From the document: Bruce Payette, Second Edition (pages 165-168)