
Editing with scripts

From the document Minimal Perl (pages 140-144)

Perl as a (better) sed command

4.7 EDITING FILES

4.7.2 Editing with scripts

It’s tedious to remember and retype commands frequently—even if they’re one-liners—so soon you’ll see a scriptified version of a generic file-changing program.

11 The exception is, of course, GNU sed, which has appropriated several useful features from Perl in recent years.

12 This rosy scenario assumes you remembered to delete the *.bak files after confirming that they were no longer needed and before the OCDE could spot any “pants” within them!

But first, let’s look at some sample runs so you can appreciate the program’s user interface, which lets you specify the search string and its replacement with a convenient -old='old' and -new='new' syntax:

$ change_file -old='\bALE\b' -new='LONDON-STYLE ALE' items

$ change_file -old='\bHEMP\b' -new='TUFF FIBER' items

You can’t see the results, because they went back into the items file. Note the use of the \b metacharacters in the old strings to require word boundaries at the appropriate points in the input. This prevents undesirable results, such as changing “WHITER SHADE OF PALE” into “WHITER SHADE OF PLONDON-STYLE ALE”.
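To make the effect of \b concrete, here is a minimal sketch that applies the same kind of substitution to strings in memory rather than to a file. The helper name change_text and the sample strings are assumptions made for illustration; they are not part of change_file.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the book): performs the same s/$old/$new/g
# substitution that change_file applies, but on a copy of a string.
sub change_text {
    my ($text, $old, $new) = @_;
    (my $copy = $text) =~ s/$old/$new/g;
    return $copy;
}

for my $line ('WHITER SHADE OF PALE', 'ALE') {
    # With \b, "ALE" must stand alone as a word to be replaced
    my $bounded   = change_text($line, '\bALE\b', 'LONDON-STYLE ALE');
    # Without \b, the "ALE" inside "PALE" is replaced too
    my $unbounded = change_text($line, 'ALE', 'LONDON-STYLE ALE');
    print "with \\b:    $bounded\n";
    print "without \\b: $unbounded\n";
}
```

The single quotes around '\bALE\b' keep the backslashes intact until the regex is compiled, which mirrors what the Shell's single quotes do on the command line.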

The change_file script is very simple:

#! /usr/bin/perl -s -i.bak -wpl

# Usage: change_file -old='old' -new='new' [f1 f2 ...]

s/$old/$new/g;

The s option on the shebang line requests the automatic switch processing that handles the command-line specifications of the old and new strings and loads the associated $old and $new variables with their contents. The omission of the our declarations for those variables (as detailed in table 2.5) marks both switches as mandatory.
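For readers curious about what the s option spares them, the following is a simplified sketch of that kind of switch processing written out by hand. The sub name parse_switches and its return convention are assumptions for illustration; Perl's own -s handling is built into the interpreter and stores the values in package variables such as $old and $new rather than in a hash.

```perl
use strict;
use warnings;

# Hypothetical helper: a simplified imitation of -s. Leading arguments of the
# form -name or -name=value are consumed; processing stops at "--" or at the
# first argument that does not look like a switch.
sub parse_switches {
    my @args = @_;
    my %switch;
    while (@args) {
        my $arg = $args[0];
        if ($arg eq '--') { shift @args; last; }
        last unless $arg =~ /^-(\w+)(?:=(.*))?\z/s;
        $switch{$1} = defined $2 ? $2 : 1;   # a bare -name is stored as 1
        shift @args;
    }
    return (\%switch, \@args);
}

# The Shell has already removed the quotes by the time Perl sees the arguments
my ($switch, $files) =
    parse_switches('-old=\bALE\b', '-new=LONDON-STYLE ALE', 'items');
print "old=$switch->{old} new=$switch->{new} files=@$files\n";
```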

In part 2 you’ll see more elaborate scripts of this type, which provide the additional benefits of allowing case insensitivity, paragraph mode, and in-place editing to be controlled through command line switches.

Next, we’ll examine a script that would make a handy addition to any programmer’s toolkit.

The insert_contact_info script

Scripts written on the job that serve a useful purpose tend to become popular, which means somewhere down the line somebody will have an idea for a useful extension, or find a bug. Accordingly, to facilitate contact between users and authors, it’s considered a good practice for each script to provide its author’s contact information.

Willy has written a program that inserts this information into scripts that don’t already have it, so let’s watch as he demonstrates its usage:

$ cd ~/bin # go to personal bin directory

$ insert_contact_info -author='Willy Nilly, willy@acme.com' change_file

$ cat change_file # 2nd line just added by above command

#! /usr/bin/perl -s -i.bak -wpl

# Author: Willy Nilly, willy@acme.com

# Usage: change_file -old='old' -new='new' [f1 f2...]

s/$old/$new/g;

For added user friendliness, Willy has arranged for the script to generate a helpful “Usage” message when it’s invoked without the required -author switch:

$ insert_contact_info some_script

Usage: insert_contact_info -author='Author info' f1 [f2...]

The script tests the $author variable for emptiness in a BEGIN block, rather than in the body of the program, so that improper invocation can be detected before input processing (via the implicit loop) begins:

#! /usr/bin/perl -s -i.bak -wpl

# Inserts contact info for script author after shebang line

BEGIN {
    $author or
        warn "Usage: $0 -author='Author info' f1 [f2 ...]\n" and exit 255;
}

# Append contact-info line to shebang line

$. == 1 and
    s|^#!.*/bin/.+$|$&\n# Author: $author|g;

Willy made the substitution conditional on the current line being the first and having a shebang sequence, because he doesn’t want to modify files that aren’t scripts. If the line-number test yields a True result, the substitution is attempted on the line.
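One caveat applies when several files are named in a single invocation: as the documentation for Perl's eof function notes, “$.” keeps counting across the files read through the implicit loop unless ARGV is explicitly closed at the end of each file. The sketch below illustrates the difference. The helper first_lines, its reset flag, and the throwaway demo files are assumptions for illustration, and the close ARGV idiom is an addition, not something the original script contains.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: returns the lines for which "$. == 1" was true,
# optionally resetting "$." at the end of each file.
sub first_lines {
    my ($reset, @files) = @_;
    my @found;
    local @ARGV = @files;
    while (<>) {
        chomp;
        push @found, $_ if $. == 1;
    }
    continue {
        # eof without parentheses is true at the end of each individual file
        close ARGV if $reset and eof;
    }
    return @found;
}

# Two throwaway scripts to scan (the file names are made up for the demo)
my $dir = tempdir(CLEANUP => 1);
my @files;
for my $shell (qw(sh ksh)) {
    my $path = "$dir/demo_$shell";
    open my $out, '>', $path or die "Cannot write $path: $!";
    print {$out} "#!/bin/$shell\necho hello\n";
    close $out or die "Cannot close $path: $!";
    push @files, $path;
}

my @with_reset    = first_lines(1, @files);
my @without_reset = first_lines(0, @files);
print "with reset:    @with_reset\n";
print "without reset: @without_reset\n";
```

Without the reset, only the first file's shebang line satisfies the line-number test, so this is worth keeping in mind when passing more than one filename.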

Because the pathname he’s searching for (/bin/) contains slashes, using the customary slash also as the field-delimiter would require those interior slashes to be backslashed. So, Willy wisely chose to avoid that complication by using the vertical bar as the delimiter instead.
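As a small side-by-side sketch of that trade-off, the two substitutions below do the same work; only the delimiters differ. The variable names and the sample shebang line are assumptions for illustration.

```perl
use strict;
use warnings;

my $author = 'Willy Nilly, willy@acme.com';
my $line   = '#! /usr/bin/perl -w';

# Vertical bars as delimiters: the slashes in /bin/ need no escaping
(my $with_bars = $line) =~ s|^#!.*/bin/.+$|$&\n# Author: $author|;

# Slashes as delimiters: every interior slash must be backslashed
(my $with_slashes = $line) =~ s/^#!.*\/bin\/.+$/$&\n# Author: $author/;

print "Same result\n" if $with_bars eq $with_slashes;
print "$with_bars\n";
```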

The regex looks for the shebang sequence (#!) at the beginning of the line, followed by the longest sequence of anything (.*; see table 3.10) leading up to /bin/. Willy wrote it that way because on most systems, whitespace is optional after the “!” character, and all command interpreters reside in a bin directory. This regex will match a variety of paths—including the commonplace /bin/, /local/bin/, and /usr/local/bin/—as desired.

After matching /bin/ (and whatever’s before it), the regex grabs the longest sequence of something (.+; see table 3.10) leading up to the line’s end ($). The “+” quantifier is used here rather than the earlier “*” because there must be at least one additional character after /bin/ to represent the filename of the interpreter.

If the entire first line of the script has been successfully matched by the regex, it’s replaced by itself (through use of $&; see table 3.4) followed by a newline and then a comment incorporating the contents of the $author switch variable. The result is that the author’s information is inserted on a new line after the script’s shebang line.
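The behavior described in the last few paragraphs can be tried out on individual strings. In this minimal sketch, the helper add_author and the sample lines are assumptions for illustration; the regex itself is the one from the script.

```perl
use strict;
use warnings;

# Hypothetical helper: applies the script's substitution to one line
# (already stripped of its newline, as the l option would do).
sub add_author {
    my ($line, $author) = @_;
    $line =~ s|^#!.*/bin/.+$|$&\n# Author: $author|;
    return $line;
}

my @samples = (
    '#!/bin/sh',                   # no whitespace after "!"
    '#! /usr/local/bin/perl -w',   # whitespace, longer path, an option
    '#!/usr/bin/',                 # nothing after /bin/, so .+ cannot match
    '# just a comment',            # not a shebang line at all
);

for my $line (@samples) {
    my $result = add_author($line, 'Willy Nilly, willy@acme.com');
    print $result eq $line ? "unchanged: $line\n" : "$result\n";
}
```

The third sample shows why the “+” quantifier matters: with nothing after /bin/ to serve as the interpreter's filename, the line is left alone.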

Apart from performing the substitution properly, it’s also important that all the lines of the original file are sent out to the new version, whether modified or not.

Willy handles this chore by using the p option to automate that process. He also uses the -i.bak option cluster to ensure that the original version is saved in a file having a .bak extension, as a precautionary measure.
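To show roughly what those options automate, here is a longhand sketch that sets the in-place editing variable and writes the read-and-print loop explicitly. The sub name insert_author_in_place is an assumption for illustration, and the sketch approximates the option cluster rather than reproducing what the interpreter does internally.

```perl
use strict;
use warnings;

# Hypothetical longhand version of the script body; the sub name is made up.
sub insert_author_in_place {
    my ($author, @files) = @_;
    return unless @files;                    # avoid falling back to STDIN

    local ($^I, @ARGV) = ('.bak', @files);   # like -i.bak: edit in place, keep backups
    while (<>) {                             # like -p: read every input line ...
        chomp;                               # like -l: strip the trailing newline
        $. == 1 and
            s|^#!.*/bin/.+$|$&\n# Author: $author|;
        print "$_\n";                        # ... and print it, changed or not
    }
    return;
}

# Example call (commented out so that running this sketch changes no files):
# insert_author_in_place('Willy Nilly, willy@acme.com', 'change_file');
```

While the in-place variable is set, print inside the loop writes to the replacement file instead of the screen, which is why every line must be printed, modified or not.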

We’ll look next at a way to make regexes more readable.

Adding commentary to a regex

The insert_contact_info script is a valuable tool, and it shows one way to make practical use of Perl’s editing capabilities. But I wouldn’t blame you for thinking that the regex we just scrutinized was a bit hard on the eyes! Fortunately, Perl programmers can alleviate this condition through judicious use of the x modifier (see table 4.3), which allows arbitrary whitespace and comments to be included in the search field to make the regex more understandable.

As a case in point, insert_contact_info2 rephrases the substitution operator of the original version, illustrating the benefits of embedding commentary within the regex field. Because the substitution operator is spread over several lines in this new version, the delimiters are shown in bold, to help you spot them:

# Rewrite shebang line to append contact info

$. == 1 and
    # The expanded version of this substitution operator follows below:
    #   s|^#!.*/bin/.+$|$&\n# Author: $author|g;
    s|
        ^       # start match at beginning of line
        \#!     # shebang characters
        .*      # optionally followed by anything; including nothing
        /bin/   # followed by a component of the interpreter path
        .+      # followed by the rest of the interpreter path
        $       # up to the end of line
    |$&\n\# Author: $author|gx;   # replace by match, \n, author stuff

Note that the “#” in the “#!” shebang sequence needs to be backslashed to remove its x-modifier-endowed meaning as a comment character, as does the “#” symbol before the word “Author” in the replacement field.

It’s important to understand that the x modifier relaxes the syntax rules for the search field only of the substitution operator—the one where the regex resides. That means you must take care to avoid the mistake of inserting whitespace or comments in the replacement field in an effort to enhance its readability, because they’ll be taken as literal characters there.13
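The following sketch checks both points on strings in memory: the compact and expanded forms of the substitution give the same result, and whitespace in the replacement field is taken literally even when the x modifier is present. The sub names and sample strings are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical helpers wrapping the two forms of the substitution
sub add_author_compact {
    my ($line, $author) = @_;
    $line =~ s|^#!.*/bin/.+$|$&\n# Author: $author|;
    return $line;
}

sub add_author_expanded {
    my ($line, $author) = @_;
    $line =~ s|
        ^       # start match at beginning of line
        \#!     # shebang characters
        .*      # anything, including nothing
        /bin/   # a component of the interpreter path
        .+      # the rest of the interpreter path
        $       # up to the end of line
    |$&\n\# Author: $author|x;
    return $line;
}

my $shebang = '#! /usr/bin/perl -w';
my $author  = 'Willy Nilly, willy@acme.com';
print "Both forms agree\n"
    if add_author_compact($shebang, $author) eq add_author_expanded($shebang, $author);

# The spaces in the search field are ignored under x, so " - " matches a bare
# hyphen; the spaces in the replacement field are literal characters.
(my $padded = 'a-b') =~ s| - | + |x;
print "$padded\n";
```

Comments inside the search field should avoid the delimiter character and the “$” and “@” symbols, since the delimiters are located and variables are interpolated before the x modifier's rules come into play.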

Before we leave the insert_contact_info script, we should consider whether sed could do its job. The answer is yes, but sed would need help from the Shell, and the result wouldn’t be as straightforward as the Perl solution. Why? Because you’d have to work around sed’s lack of the following features: the “+” metacharacter, automatic switch processing, in-place editing, and the enhanced regex format.

As useful as the -i.bak option is, there’s a human foible that can undermine the integrity of its backup files. You’ll learn how to compensate for it next.

13 An exception is discussed in section 4.9: when the e modifier is used, the replacement field contains Perl statements, whose readability can be enhanced through arbitrary use of whitespace.
