Perl essentials

2.1 PERL’S INVOCATION OPTIONS

An invocation option is a character (usually a letter) that’s preceded by a hyphen and presented as one of the initial arguments to a perl command. Its purpose is to enable special features for the execution of a program.

Table 2.1 lists the most important invocation options.

Although each of the invocation options shown in table 2.1 is described under its own heading in the sections that follow, it’s not necessary to memorize what each one does, because they’re commonly used in only a handful of combinations. These combinations, which we call option clusters, consist of a hyphen followed by one or more options (e.g., -wnl).

Toward the end of this chapter, you’ll learn how to select the appropriate options for your programs, using a procedure for choosing standard option clusters that takes the guesswork out of this important task.

First, we’ll describe what the individual options do.

2 A few of these options were discussed in chapter 1’s comparisons of easy and hard ways to write cat-like commands. To enhance the reference value of this chapter, these options are also included here.

Table 2.1 Effects of Perl’s most essential invocation options

| Option | Provides | Explanation |
| --- | --- | --- |
| -e 'code' | Execution of code | Causes Perl to execute code as a program. Used to avoid the overhead of a script’s file with tiny programs. |
| -w | Warnings | Enables warning messages, which is generally advisable. |
| -n | Reading but no printing | Requests an implicit input-reading loop that stores records in $_. |
| -p | Reading and printing | Requests an implicit input-reading loop that stores records in $_ and automatically prints that variable after optional processing of its contents. |
| -l | Line-end processing | Automatically inserts an output record separator at the end of print’s output. When used with -n or -p, additionally does automatic chomping (removal of the input record separator from input records). |
| -0digits | Setting of input record separator | Defines the character that marks the end of an input record, using octal digits. The special case of -00 enables paragraph mode, in which empty lines mark ends of input records; -0777 enables file mode, in which each file constitutes a single record. |

2.1.1 One-line programming: -e

The purpose of Perl’s -e invocation option is to identify the next argument as the program to be executed. This allows a simple program to be conveyed to perl as an interactively typed command rather than as a specially prepared file called a script.

As an example, here’s a one-line command that calculates and prints the result of dividing 42 by 3:

$ perl -wl -e 'print 42/3;'
14

The division of 42 by 3 is processed first, and then the print function receives 14 as its argument, which it writes to the output.

We’ll discuss the -w option used in that command’s invocation next and the -l option shortly thereafter.

2.1.2 Enabling warnings: -w

Wouldn’t it be great if you could have Larry and his colleagues discreetly critique your Perl programs for you? That would give you an opportunity to learn from the masters with every execution of every program. That’s effectively what happens when you use the -w option to enable Perl’s extensive warning system. In fact, Perl’s warnings are generally so insightful, helpful, and educational that most programmers use the -w option all the time.

As a practical example, consider which of the following runs of this program provides the more useful output:

$ perl -l -e 'print $HOME;' # Is Shell variable known to Perl?

(no output)

$ perl -wl -e 'print $HOME;' # Apparently not!

Name "main::HOME" used only once: possible typo at -e line 1.

Use of uninitialized value in print at -e line 1.

The messages indicate that Perl was unable to print the value of the variable $HOME (because it was neither inherited from the Shell nor set in the program). Because there’s usually one appearance of a variable when its value is assigned and another when the value is retrieved, it’s unusual for a variable name to appear only once in a program. As a convenience to the programmer, Perl detects this condition and warns that the variable’s name may have been mistyped (“possible typo”).³

You’d be wise to follow the example of professional Perl programmers. They use the -w option routinely, so they hear about their coding problems in the privacy of their own cubicles, rather than having them flare up during high-pressure customer demos!

The option we’ll cover next is also extremely valuable.

3 You could say the variable’s name was grossly mistyped, because in Perl this Shell variable is accessed as a member of an associative array (a.k.a. a hash) using $ENV{HOME}, as detailed in chapter 9.

2.1.3 Processing input: -n

Many Unix utilities (grep, sed, sort, and so on) are typically used as filter programs: they read input and then write some variation on it to the output.

Here’s an example of the Unix sed command inserting spaces at the beginning of each input line using its substitution facility, which typically appears in the form s/search-string/replacement-string/g:⁴

$ cat seattleites
Torbin Ulrich 98107
Yeshe Dolma 98117

$ sed 's/^/    /g' seattleites
    Torbin Ulrich 98107
    Yeshe Dolma 98117

The ^ symbol in the search-string field represents the beginning of the line, causing the spaces in the replacement-string to be inserted there before the modified line is sent to the output.

Here’s the Perl counterpart to that sed command, which uses a sed-like substitution operator (described in chapter 4). Notice the need for an explicit request to print the resulting line, which isn’t needed with sed:

$ perl -wnl -e 's/^/    /g; print;' seattleites
    Torbin Ulrich 98107
    Yeshe Dolma 98117

This command works like the sed command does, by processing one line at a time, taken from files named as arguments or from STDIN, using an implicit loop (provided by the -n option). (For a more detailed explanation, see section 10.2.4.)

This example also provides an opportunity to review an important component of Perl syntax. The semicolons at the ends of the sed-like substitution operator and the print function identify each of them as constituting a complete statement, and that’s important! If the semicolon preceding print were missing, for example, that word would be associated with the substitution operator rather than being recognized as an invocation of the print function, and a fatal syntax error would result.

Because sed-like processing is so commonly needed in Perl, there’s a provision for obtaining it more easily, as shown next.

2.1.4 Processing input with automatic printing: -p

You request input processing with automatic printing after (optional) processing by using the -p option in place of -n:

$ perl -wpl -e 's/^/    /g;' seattleites    # "p" does the printing
    Torbin Ulrich 98107
    Yeshe Dolma 98117

4 Are you wondering about the use of the “global” replacement modifier (/g)? Because it’s needed much more often than not, it’s used routinely in Minimal Perl and removed only in the rare cases where it spoils the results. It’s shown here for both the sed and perl commands for uniformity.

This coding style makes it easier to concentrate on the primary activity of the command (the editing operation), and it’s no coincidence that it makes the command look more like the equivalent sed command shown earlier. That’s because Larry modeled the substitution operator on the syntax of sed (and vi) to make Perl easier for UNIX users to learn.

Like the Shell’s echo command, Perl’s print can automatically generate newlines, as you’ll see next.

2.1.5 Processing line-endings: -l

Before discussing how automatic processing of record separators works in Perl, we first need to define some terms.

A record is a collection of characters that’s read or written as a unit, and a file is a collection of records. When you’re dealing with text files, each individual line is considered to be a separate record by default. The particular character, or sequence of characters, that marks the end of the record being read is called the input record separator. On Unix systems, that’s the linefeed character by default; but for portability and convenience, Perl lets you refer to the OS-specific default input record separator (whatever it may be) as \n, which is called newline.

Perl normally retains the input record separator as part of each record that’s read, so it’s still there if that record is printed later. However, with certain kinds of programs, it’s a great convenience to have the separators automatically stripped off as input is read, and then to have them automatically replaced when output is written by print. This effect is enabled by adding the -l option to -n or -p with perl’s invocation.

To see what difference that option makes, we’ll compare the outputs of the following two commands, which print the number of each input line (but not the input lines themselves). The numbers are provided by the special variable “$.” (covered in table 2.2), which automatically counts records as they’re processed.

First, here’s a command that omits the -l option and features a special Shell prompt (go$) for added clarity:

go$ perl -wn -e 'print $.;' three_line_file    # file has three lines
123go$

The output lines are scrunched together, because the “$.” variable doesn’t contain a newline, and nothing else in the program causes one to be issued after each print. Notice also that the Shell’s prompt for the next command isn’t where it should be, at the beginning of a fresh line. That’s about as unnerving as a minor earthquake to the average Unix user!

In contrast, when the -l option is used, a newline is automatically added at the end of print’s output:

go$ perl -wnl -e 'print $.;' three_line_file
1
2
3
go$

For comparison, here’s how you’d achieve the same result without using -l:

$ perl -wn -e 'print $. , "\n";' three_line_file
...

This approach requires an explicit request to print the newline, which you make by adding the "\n" argument to print. Doing that doesn’t require a substantial amount of extra typing in this tiny program, but the extra work would be considerable in programs having many print statements. To avoid the effort that would be wasted in routinely typing "\n" arguments for print statements, Minimal Perl normally uses the -l option.

Of course, in some situations it’s desirable to omit the newline from the end of an output line, as we’ll discuss next.

2.1.6 Printing without newlines: printf

In most programs that read input, using the -l option offers a significant benefit, and in the others, it usually doesn’t hurt. However, there is a situation where the processing provided by this option is undesirable. Specifically, any program that outputs a string using print that should not be terminated with a newline will be affected adversely, because the -l option will dutifully ensure that it gets one.

This situation occurs most commonly in programs that prompt for input. Here’s an example, based on the (unshown) script called order, which writes its output using print but doesn’t use the -l option:

$ order
How many robotic tooth flossers? [1-200]: 42
We'll ship 42 tomorrow.

Here’s another version of that script, which uses the -l option:

$ order2 # using -l option

How many robotic tooth flossers? [1-200]:

We'll ship 42 tomorrow.

As computer users know, it’s conventional for input to be accepted at the end of a prompt, not on the line below it, as in the case of order2. This can be accomplished by using the printf function rather than print, because printf is immune to the effects of the -l option.⁵

5 The name printf refers to this function’s ability to do formatted printing when multiple arguments are provided (as opposed to the single-argument case we’re using for prompting).

Accordingly, the order2 script can be adjusted to suppress the prompt’s newline by adding one letter to its output statement, as shown in the second line that follows:

print "How many robotic flossers? [1-200]:";     # -l INcompatible
printf "How many robotic flossers? [1-200]:";    # -l compatible

In summary, the -l option is routinely used in Minimal Perl, and printf is used in place of print when an automatic newline isn’t desired at the output’s end (as in the case of prompts).

Tip on using printf for prompting

In its typical (non-prompting) usage, printf’s first argument contains % symbols that are interpreted in a special way. For this reason, if your prompt string contains any % symbols, you must double each one (%%) to get them to print correctly.

2.1.7 Changing the input record separator: -0digits

When the -n or -p option is used, Perl reads one line at a time by default, using an OS-appropriate definition of the input record separator to find each line’s end. But it’s not always desirable to use an individual line as the definition of a record, so Perl (unlike most of its UNIX predecessors) has provisions for changing that behavior.

The most common alternate record definitions are for units of paragraphs and files, with a paragraph being defined as a chunk of text separated by one or more empty lines from the next chunk. The input record separator of choice is specified using special sequences of digits with the -0digits option, shown earlier in table 2.1.

Look how the behavior of the indenting program you saw earlier is altered as the record definition is changed from a line, to a paragraph, to a file (F represents the space character):

$ perl -wpl -e 's/^/FFFF/g;' memo    # default is line mode
FFFFThis is the file "memo", which has these
FFFFlines spread out like so.

FFFFAnd then it continues to a
FFFFsecond paragraph.

$ perl -00 -wpl -e 's/^/FFFF/g;' memo    # paragraph mode
FFFFThis is the file "memo", which has these
lines spread out like so.

FFFFAnd then it continues to a
second paragraph.

$ perl -0777 -wpl -e 's/^/FFFF/g;' memo    # file mode
FFFFThis is the file "memo", which has these
lines spread out like so.

And then it continues to a
second paragraph.

In all cases, indentation is inserted at the beginning of each record (signified by ^ in the search-string field), although it may not look that way at first glance.

The first command uses the default definition of a line as a record, so every line is indented at its beginning. The second defines each paragraph as a record, so each of the two paragraphs is indented at its beginning. The last command defines the entire file as one record, so the single file is indented at its beginning.

Apart from these commonly used alternate record definitions, it’s also possible to change the input record separator to an arbitrary character string by assigning that string to the special variable “$/”. That technique is documented in table 2.7 for reference purposes and demonstrated in later chapters (e.g., listings 9.3 and 9.4).

Now we’ll turn our attention to Perl’s fastest and easiest-to-use mechanism for storing and retrieving data: the variable.

2.2 USING VARIABLES

In part 1 of this book, most of the variables you’ll see are the simplest kind: scalar variables, which can store only one value. All Perl variables have identifying marks, and for scalars, it’s a leading dollar sign. That’s different from AWK variables, which lack identifying marks, and from Shell variables, whose names must be typed either with or without a leading dollar sign, depending on the context.

Consider a scalar variable called $ring. It could store the text “size 10”, or a JPEG image of a gold wedding ring, or even an MPEG movie clip of the climactic meltdown of the “ring to rule them all” from the movie The Return of the King.

As these examples illustrate, the scalar variable’s restriction about storing only one value doesn’t affect the amount or type of data it can store. But whatever is stored in a scalar variable is treated as one indivisible unit.

In contrast, list variables provide for the storage (and retrieval) of multiple individually identifiable values, allowing you to make requests like “give me the MPEG for the Lord of the Rings movie number N,” where N can be 1, 2, or 3, or even “one”, “two”, or “three”. We’ll discuss this class of variables in chapter 9.

Both types of Perl variables come in two varieties: special variables that are provided automatically by the language, such as “$.”, and user-defined variables that are created by programmers as needed, such as $ring.

We’ll look first at the special variables.

2.2.1 Using special variables

For the convenience of the programmer, Perl provides a number of special variables that are built into the language. Table 2.2 describes those that are most useful in the widest variety of small programs. These variables are derived from predecessors in the AWK language, so their AWKish names are shown as well—for the benefit of readers who are AWKiologists.⁶

6 As part of the AWK-like environment provided in Perl, programmers are even allowed to use the original AWK names for the special variables. Type man English for more information.

Next, we’ll turn our attention to Perl’s hardest-working yet most inconspicuous variable.

2.2.2 Using the data variable: $_

The $_ variable has the honor of being Perl’s default custodian of data. For this reason, the -n and -p options store each input record they read in $_, and that variable acts as the default data donor for argument-free print statements.

Consider this example of a cat-like program that shows each line of its input:

$ perl -wnl -e 'print;' marx_bros
Groucho
Harpo
Chico

The implicit input-reading loop provided by the -n option reads one line at a time and stores it in the $_ variable before executing the supplied program. Because print lacks any explicit arguments, it turns to the default data source, $_, to get the data it needs. In this way, each input line is printed in turn, without the need for any explicit reference to the location of the data.

Here’s a more explicit way to write this program, which requires more typing to achieve the same effect:

$ perl -wnl -e 'print $_;' marx_bros    # $_ can be omitted
Groucho
...

The next section shows a case where there’s a good reason to refer explicitly to $_.

2.2.3 Using the record-number variable: $.

When you use the -n or -p option, the record-number variable, “$.”, holds the number of the current input record (as opposed to that record’s contents, which reside in $_). This variable is used in a variety of ways, such as to number lines, to select lines by number for processing, and to report the total number of lines that were processed.

Table 2.2 The data and record-number variables

| Variable | Name | Nickname | Usage notes |
| --- | --- | --- | --- |
| $_ | Dollar underscore | Data variable | When the -n or -p option is used, $_ contains the most recently read input record. It’s also the default data source for many built-in functions, such as print. AWK: $0 |
| $. | Dollar dot | Record number variable | When the -n or -p option is used, or the input operator, “$.” contains the number of the current input record. AWK: NR |

Here’s a program that prints each line’s number, a colon, and a space, followed by the line’s contents:

$ perl -wnl -e 'printf "$.: "; print;' marx_bros
1: Groucho
...

Notice that double quotes are used to allow the variable substitution request for “$.” to be recognized (as in the Shell) while simultaneously causing other enclosed characters, such as “:”, to be stripped of any special meanings. (More details on quoting techniques are presented in section 2.4.1.)

The printf function is used to output the line-number prefix, to avoid shifting
