

In the document Programming Languages: (Pages 85-89)

Programming with Laziness


The core interpreter is defined by cases:

interp :: WAE -> Env -> Value
interp (Num n) env = n
interp (Add lhs rhs) env = interp lhs env + interp rhs env
interp (Id i) env = lookup i env
interp (With bound_id named_expr bound_body) env =
  interp bound_body (extend env bound_id (interp named_expr env))

The helper functions are equally straightforward:

lookup :: Identifier -> Env -> Value
lookup var ((i,v):r)
  | (eqString var i) = v
  | otherwise = lookup var r

extend :: Env -> Identifier -> Value -> Env
extend env i v = (i,v):env

This definition of lookup uses Haskell’s pattern-matching notation as an alternative to writing an explicit conditional. Finally, testing these yields the expected results:

CS173> interp (Add (Num 3) (Num 5)) []

8

CS173> interp (With "x" (Add (Num 3) (Num 5)) (Add (Id "x") (Id "x"))) []

16

We can comment out the type declaration for interp (a line beginning with two dashes (--) is treated as a comment), reload the file, and ask Haskell for the type of interp:

interp :: WAE -> Env -> Int

Remarkably, Haskell infers the same type as the one we ascribed, differing only in the use of the type alias.

Exercise 7.1.5 Extend the Haskell interpreter to implement functions using Haskell functions to represent functions in the interpreted language. Does the resulting interpreted language have eager or lazy application? How would you make it take on the other semantics?

7.2 Shell Scripting

While most programmers have never programmed in Haskell before, many have programmed in a lazy language: the language of most Unix shells. In this text we’ll use the language of bash (the Bourne Again Shell), though most of these programs work identically or have very close counterparts in other popular shell languages.

The classical shell model assumes that all programs can potentially generate an infinite stream of output.

The simplest such example is the program yes, which generates an infinite stream of y’s:

> yes
y
y
y

and so on, forever. (Don’t try this at home without your fingers poised over the interrupt key!) To make it easier to browse the output of (potentially infinite) stream generators, Unix provides helpful applications such as more to page through the stream. In Haskell, function composition makes the output of one function (the equivalent of a stream-generating application) the input to another. In a shell, the | operator does the same. That is,

> yes | more

generates the same stream, but lets us view finite prefixes of its content in segments followed by a prompt.

Quitting from more terminates yes.

What good is yes? Suppose you run the following command:

> rm -r Programs/Sources/Java

Say some of these files are write-protected. For each such file, the shell will generate the query

rm: remove write-protected file ‘Programs/Sources/Java/frob.java’?

If you know for sure you want to delete all the files in the directory, you probably don’t want to manually type y in response to each such question. How many can there be? Unfortunately, it’s impossible to predict how many write-protected files will be in the directory. This is exactly where yes comes in:

> yes | rm -r Programs/Sources/Java

generates as many y inputs as necessary, satisfying all the queries, thereby deleting all the files.
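To see this behave without touching real files, here is a minimal, self-contained sketch; the scratch directory and the filename frob.java are hypothetical, created only for the demonstration:

```shell
# Create a scratch directory containing one write-protected file.
dir=$(mktemp -d)
touch "$dir/frob.java"
chmod a-w "$dir/frob.java"

# yes supplies an endless stream of y's to satisfy any confirmation
# prompts rm may emit; rm's exit ends the pipeline.
yes | rm -r "$dir"

# The directory and its write-protected file are gone.
[ -e "$dir" ] || echo "deleted"
```

Because rm exits once the directory is removed, yes receives a broken-pipe signal and terminates too; the infinite producer never outlives its consumer.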

We’ve seen that more is a useful way of examining part of a stream. But more is not directly analogous to Haskell’s take. In fact, there is a Unix application that is: it’s called head. head prints the first n entries in a stream, where n is given as an argument (defaulting to 10):

> yes | head -5
y
y
y
y
y

These examples already demonstrate the value of thinking of Unix programs as generators and consumers of potentially infinite streams, composed with |. Here are some more examples. The application wc counts the number of characters (-c), words (-w) and lines (-l) in its input stream. Thus, for instance,

> yes | head -5 | wc -l
5
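The other wc flags compose the same way. A quick sketch on a two-line input (the sample text is made up):

```shell
# -l counts lines, -w counts whitespace-separated words.
printf 'two words\nthree more words\n' | wc -l   # 2 lines
printf 'two words\nthree more words\n' | wc -w   # 5 words
```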

(not surprisingly). We can similarly count the number of files with suffix .scm in a directory:

> ls *.scm | wc -l
2

We can compose these into longer chains. Say we have a file containing a list of grades, one on each line; say the grades (in any order in the file) are two 10s, one 15, one 17, one 21, three 5s, one 2, and ten 3s. Suppose we want to determine which grades occur most frequently (and how often), in descending order.

The first thing we might do is sort the grades, using sort. This arranges all the grades in order. While sorting is not strictly necessary to solve this problem, it does enable us to use a very useful Unix application called uniq. This application eliminates adjacent lines that are identical. Furthermore, if supplied the -c (“count”) flag, it prepends each line in the output with a count of how many adjacent lines there were. Thus,

> sort grades | uniq -c
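This stage can be sketched concretely on the grade data described above; the grades file is built inline here so the example is self-contained:

```shell
# Build the grades file from the counts given in the text:
# two 10s, one 15, one 17, one 21, three 5s, one 2, ten 3s.
printf '%s\n' 10 10 15 17 21 5 5 5 2 3 3 3 3 3 3 3 3 3 3 > grades

# Sorting groups equal grades into adjacent runs, which uniq -c counts.
# Note the sort is textual here, so "10" sorts before "2".
sort grades | uniq -c
```

Each of the seven distinct grades appears exactly once in the output, prefixed by its frequency, but the lines are ordered by grade, not by frequency.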

This almost accomplishes the task, except we don’t get the frequencies in order. We need to sort one more time. Simply sorting doesn’t do the right thing in two ways:

> sort grades | uniq -c | sort

We want sorting to be numeric, not textual, and we want the sorting done in reverse (decreasing) order.

Therefore:

> sort grades | uniq -c | sort -nr

where -n makes the comparison numeric and -r reverses it.
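Running the full pipeline, numeric and reversed, on the same data (again built inline) puts the most frequent grade first:

```shell
printf '%s\n' 10 10 15 17 21 5 5 5 2 3 3 3 3 3 3 3 3 3 3 > grades

# sort -n compares the leading counts numerically; -r puts the
# largest count first: the ten 3s lead, then the three 5s, and so on.
sort grades | uniq -c | sort -nr
```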

There is something fundamentally beautiful—and very powerful!—about the structure of the Unix shell.

Virtually all Unix commands respect the stream convention, and so do even some programming languages built atop it: for instance, by default, Awk processes its input one-line-at-a-time, so the Awk program {print $1} prints the first field of each line, continuing until the input runs out of lines (if ever), at which point the output stream terminates. This great uniformity makes composing programs easy, thereby encouraging programmers to do it.
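A quick sketch of that Awk behavior (the input lines are made up):

```shell
# Awk splits each input line into whitespace-separated fields;
# $1 names the first field, so this prints one name per line.
printf 'alice 90\nbob 85\n' | awk '{print $1}'
# prints:
# alice
# bob
```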

Alan Perlis recognized the wisdom of such a design in this epigram: “It is better to have 100 functions operate on one data structure than 10 functions on 10 data structures” (the data structure here being the stream). The greatest shortcoming of the Unix shell is that it is so lacking in data sub-structure, relying purely on strings, that every program has to repeatedly parse its input, often doing so incorrectly. For example, if a directory holds a filename containing a newline, that newline will appear in the output of ls; a program like wc will then count the two lines as two different files. Unix shell scripts are notoriously fragile in these regards. Perlis recognized this too: “The string is a stark data structure and everywhere it is passed there is much duplication of process.”
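The newline pitfall is easy to reproduce. Here is a sketch using a throwaway directory (the filenames are hypothetical; note that when ls writes to a pipe rather than a terminal, it emits the name literally, embedded newline and all):

```shell
dir=$(mktemp -d)
touch "$dir/plain"
touch "$dir/$(printf 'two\nlines')"   # one file whose name contains a newline

# Two files, but the embedded newline makes wc -l report three.
ls "$dir" | wc -l
```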

The heart of the problem is that the output of Unix shell commands has to do double duty: it must be readable by humans but also ready for processing by other programs. By choosing human readability as the default, the output is sub-optimal, even dangerous, for processing by programs: it’s as if the addition procedure in a normal programming language always returned strings because you might eventually want to print an answer, instead of returning numbers (which are necessary to perform further arithmetic) and leaving conversion of numbers to strings to the appropriate input/output routine.6

In short, Unix shell languages are both a zenith and a nadir of programming language design. Please study their design very carefully, but also be sure to learn the right lessons from them!

6We might fantasize the following way of making shell scripts more robust: all Unix utilities are forced to support a -xmlout flag that forces them to generate output in a standard XML language that did no more than wrap tags around each record (usually, but not always, a line) and each field, and a -xmlin flag that informs them to expect data in the same format. This would eliminate the ambiguity inherent in parsing text.

