
Here is a rare example with all five arguments:

> (format nil "~10,2,0,'*,' F" 26.21875)

"     26.22"

This is the original number rounded to 2 decimal places (with the decimal point shifted left 0 places), right-justified in a field of 10 characters, padded on the left by blanks. Notice that a character given as an argument is written as '*, not the usual #\*. Since the number fit in 10 characters, the fourth argument didn't have to be used.

All these arguments are optional. To use the default you can simply omit the corresponding argument. If all we want to do is print a number rounded to two decimal places, we can say:

> (format nil "~,2,,,F" 26.21875)

"26.22"

You can also omit a series of trailing commas, so the more usual way to write the preceding directive would be:

> (format nil "~,2F" 26.21875)

"26.22"

Warning: When format rounds, it does not guarantee to round up or to round down. That is, (format nil "~,1F" 1.25) could yield either "1.2" or "1.3". So if you are using format to display information that the user expects to see rounded in one particular way (e.g. dollar amounts), you should round the number explicitly before printing it.

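One way to round explicitly before printing is sketched below. The name format-half-up is hypothetical, and the sketch assumes a non-negative number and at least one decimal place. It converts the number to an exact rational, rounds halves upward with integer arithmetic, and only then hands the digits to format, so the result does not depend on how a given implementation's ~F rounds.

```lisp
;; Hypothetical helper, not part of the original text.
;; Assumes X is a non-negative real and PLACES is at least 1.
(defun format-half-up (x places)
  (let* ((scale (expt 10 places))
         ;; RATIONAL is exact, so 1.25 becomes 5/4; adding 1/2 and
         ;; taking FLOOR rounds halves upward.
         (n (floor (+ (* (rational x) scale) 1/2))))
    (multiple-value-bind (int frac) (floor n scale)
      ;; ~V,'0D prints FRAC in a field of PLACES digits, zero-padded.
      (format nil "~D.~V,'0D" int places frac))))

(format-half-up 1.25 1)      ; "1.3"
(format-half-up 26.21875 2)  ; "26.22"
```

The rounding is applied to the value the float actually holds, so for amounts such as money it is usually safer to store integers (cents) or rationals in the first place.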
7.4 Example: String Substitution

As an example of I/O, this section shows how to write a simple program to do string substitution in text files. We're going to write a function that can replace each instance of a string old in a file with some other string new.

The simplest way to do this is to look at each character in the input file and compare it to the first character of old. If they don't match, we can just print the input character straight to the output file. If they do match, we compare the next input character against the second character of old, and so on. If the characters are the same all the way to the end of old, we have a successful match, and we print new to the output file.

What happens, though, if we get part of the way through old and the match fails? For example, suppose we are looking for the pattern "abac", and the input file contains "ababac". The input will seem to match the pattern until we get to the fourth character, which is c in the pattern and b in the input. At this point we can write the initial a to the output file, because we know that no match begins there. But some of the characters that we have read from the input file we still need: for example, the third character, a, does begin a successful match. So before we can implement this algorithm, we need a place to store characters that we've read from the input file but might still need.

A queue for storing input temporarily is called a buffer. In this case, because we know we'll never need to store more than a predetermined number of characters, we can use a data structure called a ring buffer. A ring buffer is a vector underneath. What makes it a ring is the way it's used: we store incoming values in successive elements, and when we get to the end of the vector, we start over at the beginning. If we never need to store more than n values, and we have a vector of length n or greater, then we never have to overwrite a live value.
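The wraparound can be sketched in a few lines. The function ring-demo is a hypothetical name used only for illustration; it stores successive values at successive indices taken mod the length of the vector, so the fifth and sixth values land in the first and second slots.

```lisp
;; Hypothetical illustration of ring-style indexing; not part of
;; the buffer code used for string substitution.
(defun ring-demo (size values)
  "Store VALUES in a vector of SIZE slots, wrapping around at the end."
  (let ((vec (make-array size :initial-element nil)))
    (loop for v in values
          for i from 0
          do (setf (svref vec (mod i size)) v))
    vec))

(ring-demo 4 '(a b c d e f))  ; slots 0 and 1 now hold E and F
```

Here six values were stored in four slots, so A and B were overwritten. That is harmless only if they were no longer live, which is why the vector must be at least as long as the largest number of values we need to keep at once.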

The code in Figure 7.1 implements operations on ring buffers. The buf structure has five fields: a vector that will contain the objects stored in the buffer, and four other fields that will contain indices into the vector. Two of these indices, start and end, we would need for any use of ring buffers: start points to the first value in the buffer, and will be incremented when we pop a value; end points to the last value in the buffer, and is incremented when we insert a new one.

The other two indices, used and new, are something we need to add to the basic ring buffer for this application. They will range between start and end. In fact, it will always be true that

start ≤ used ≤ new ≤ end

You can think of used and new as being like start and end for the current match. When we start a match, used will be equal to start and new will be equal to end. We will increment used as we match successive characters from the buffer. When used reaches new, we have read all the characters that were in the buffer at the time the match started. We don't want to use more than the characters that were in the buffer when the match started, or we would end up using the same characters multiple times. Hence the distinct new index, which starts out equal to end, but is not incremented as new characters are inserted into the buffer during a match.

The function bref takes a buffer and an index, and returns the element stored at that index. By using the index mod the length of the vector, we can pretend that we have an arbitrarily long buffer. Calling (new-buf n) yields a new buffer able to hold up to n objects.

To insert new values into a buffer, we will use buf-insert. It simply increments the end and puts the new value at that location. The converse is buf-pop, which returns the first value in a buffer, then increments its start. These two functions would come with any ring buffer.


(defstruct buf
  vec (start -1) (used -1) (new -1) (end -1))

(defun bref (buf n)
  (svref (buf-vec buf)
         (mod n (length (buf-vec buf)))))

(defun (setf bref) (val buf n)
  (setf (svref (buf-vec buf)
               (mod n (length (buf-vec buf))))
        val))

(defun new-buf (len)
  (make-buf :vec (make-array len)))

(defun buf-insert (x b)
  (setf (bref b (incf (buf-end b))) x))

(defun buf-pop (b)
  (prog1
    (bref b (incf (buf-start b)))
    (setf (buf-used b) (buf-start b)
          (buf-new b) (buf-end b))))

(defun buf-next (b)
  (when (< (buf-used b) (buf-new b))
    (bref b (incf (buf-used b)))))

(defun buf-reset (b)
  (setf (buf-used b) (buf-start b)
        (buf-new b) (buf-end b)))

(defun buf-clear (b)
  (setf (buf-start b) -1 (buf-used b) -1
        (buf-new b) -1 (buf-end b) -1))

;; The live elements run from start+1 to end, so flushing begins
;; after start; beginning after used would drop characters that
;; were re-read from the buffer during an unfinished match.
(defun buf-flush (b str)
  (do ((i (1+ (buf-start b)) (1+ i)))
      ((> i (buf-end b)))
    (princ (bref b i) str)))

Figure 7.1: Operations on ring buffers.

The next two functions are ones that we need specifically for this application: buf-next reads a value from a buffer without popping it, and buf-reset resets the used and new indices to their initial values, start and end. If we have already read all the values up to new, buf-next returns nil. It won't be a problem distinguishing this from a real value because we're only going to store characters in the buffer.

Finally, buf-flush flushes a buffer by writing all the live elements to a stream given as the second argument, and buf-clear empties a buffer by resetting all the indices to -1.

The functions defined in Figure 7.1 are used in Figure 7.2, which contains the code for string substitution. The function file-subst takes four arguments: a string to look for, a string to replace it, an input file, and an output file. It creates streams representing each of the files, then calls stream-subst to do the real work.

The second function, stream-subst, uses the algorithm sketched at the beginning of this section. It reads from the input stream one character at a time. Until the input character matches the first element of the sought-for string, it is written immediately to the output stream (1). When a match begins, the characters involved are queued in the buffer buf (2).

The variable pos points to the position of the character we are trying to match in the sought-for string. When and if pos is equal to the length of this string, we have a complete match, and we write the replacement string to the output stream, also clearing the buffer (3). If the match fails before this point, we can pop the first character in the buffer and write it to the output stream, after which we reset the buffer and start over with pos equal to zero (4).
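The four cases can be put together as in the sketch below. The body of stream-subst here is an assumption: it is written to follow the description above and is not offered as the exact text of Figure 7.2. The ring-buffer operations are repeated so the sketch can be loaded by itself, and buf-flush starts after start rather than after used, so that characters re-read from the buffer during an unfinished match at the end of the input are still written. The helper string-subst is hypothetical; it runs the loop on string streams so the result is easy to inspect. The sketch assumes old is a non-empty string.

```lisp
;; Ring-buffer operations, repeated so this sketch loads on its own.
(defstruct buf
  vec (start -1) (used -1) (new -1) (end -1))

(defun bref (buf n)
  (svref (buf-vec buf) (mod n (length (buf-vec buf)))))

(defun (setf bref) (val buf n)
  (setf (svref (buf-vec buf) (mod n (length (buf-vec buf)))) val))

(defun new-buf (len) (make-buf :vec (make-array len)))

(defun buf-insert (x b) (setf (bref b (incf (buf-end b))) x))

(defun buf-pop (b)
  (prog1 (bref b (incf (buf-start b)))
    (setf (buf-used b) (buf-start b)
          (buf-new b) (buf-end b))))

(defun buf-next (b)
  (when (< (buf-used b) (buf-new b))
    (bref b (incf (buf-used b)))))

(defun buf-reset (b)
  (setf (buf-used b) (buf-start b)
        (buf-new b) (buf-end b)))

(defun buf-clear (b)
  (setf (buf-start b) -1 (buf-used b) -1
        (buf-new b) -1 (buf-end b) -1))

(defun buf-flush (b str)
  ;; Write every live element, i.e. indices start+1 through end.
  (do ((i (1+ (buf-start b)) (1+ i)))
      ((> i (buf-end b)))
    (princ (bref b i) str)))

;; Assumed body: one way to realize cases 1-4 described in the text.
(defun stream-subst (old new in out)
  (let* ((pos 0)
         (len (length old))
         (buf (new-buf len))
         (from-buf nil))
    (do ((c (read-char in nil :eof)
            ;; Prefer characters still waiting in the buffer.
            (or (setf from-buf (buf-next buf))
                (read-char in nil :eof))))
        ((eql c :eof))
      (cond ((char= c (char old pos))
             (incf pos)
             (cond ((= pos len)               ; 3: complete match
                    (princ new out)
                    (setf pos 0)
                    (buf-clear buf))
                   ((not from-buf)            ; 2: match in progress
                    (buf-insert c buf))))
            ((zerop pos)                      ; 1: no match under way
             (princ c out)
             (when from-buf
               (buf-pop buf)
               (buf-reset buf)))
            (t                                ; 4: match failed midway
             (unless from-buf
               (buf-insert c buf))
             (princ (buf-pop buf) out)
             (buf-reset buf)
             (setf pos 0))))
    (buf-flush buf out)))

;; Hypothetical helper: run the substitution on a string.
(defun string-subst (old new string)
  (with-input-from-string (in string)
    (with-output-to-string (out)
      (stream-subst old new in out))))

(string-subst "baro" "baric" "barbarous")  ; "barbaricus"
```

Trying it on the earlier pattern, (string-subst "abac" "X" "ababac") returns "abX": the first two characters are written out one at a time as failed starts, and the match that begins at the third character succeeds.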

The following table shows what happens when we substitute "baric" for "baro" in a file containing just the word barbarous:

[Table not recovered; only the column heading CHAR survives.]


(defun file-subst (old new file1 file2)
  (with-open-file (in file1 :direction :input)
    (with-open-file (out file2 :direction ...
                                :if-exists ...)
      (stream-subst old new in out))))

(defun stream-subst (old new in out)
  (let* ((pos 0)
         ...
            (or (setf from-buf (buf-next buf))
                (read-char in nil :eof))))
    ...

The first column is the current character, the value of c; the second shows whether it was read from the buffer or directly from the input stream; the third shows the character it has to match, the element of old at position pos; the fourth shows which case is evaluated as a result; the fifth shows what is thereby written to the output stream; and the last column shows the contents of the buffer afterwards. In the last column, the positions of used and new are shown by a period after the character they point to; when both point to the same position, it is indicated by a colon.

If the file "test1" contained the following text

The struggle between Liberty and Authority is the most conspicuous feature in the portions of history with which we are earliest familiar, particularly in that of Greece, Rome, and England.

then after evaluating (file-subst " th" " z" "test1" "test2"), the file "test2" would read:

The struggle between Liberty and Authority is ze most conspicuous feature in ze portions of history with which we are earliest familiar, particularly in zat of Greece, Rome, and England.

To keep this example as simple as possible, the code shown in Figure 7.2 just replaces one string with another. It would be easy to generalize it to search for a pattern instead of a literal string. All you would have to do is replace the call to char= with a call to whatever more general matching function you wanted to write.
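As a rough illustration of what such a matching function might look like, the sketch below compares one pattern element against one input character. The name pattern-elt-match-p and the pattern conventions are hypothetical: a pattern element is assumed to be a character (matched literally), the keyword :any (matches anything), or a function of one argument (matches when it returns true). With a pattern stored as a vector of such elements, the element at position pos would be fetched with aref rather than char.

```lisp
;; Hypothetical matcher; the pattern conventions are assumptions
;; made for this sketch, not something defined in the text.
(defun pattern-elt-match-p (pat c)
  (cond ((characterp pat) (char= pat c))   ; literal character
        ((eq pat :any) t)                  ; wildcard
        ((functionp pat) (funcall pat c))  ; arbitrary predicate
        (t nil)))

(pattern-elt-match-p #\a #\a)            ; true
(pattern-elt-match-p :any #\z)           ; true
(pattern-elt-match-p #'digit-char-p #\x) ; false
```

Because each pattern element still consumes exactly one input character, the bound on the buffer size (the length of the pattern) would continue to hold under this scheme.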

From the document ANSI Common Lisp (pages 142-147)