
8.8 Example: Random Text

If you're going to write programs that operate on words, it's often a good idea to use symbols instead of strings, because symbols are conceptually atomic.

Symbols can be compared in one step with eql, while strings have to be compared character-by-character with string-equal or string=. As an example, this section shows how to write a program to generate random text.
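The difference between the two kinds of comparison can be seen at the toplevel. This is only a minimal sketch, and the word used is arbitrary:

```lisp
;; Two references to a symbol with the same name (in the same package)
;; denote the same object, so one identity test is enough.
(eql 'paradise 'paradise)                ; true

;; Two separately created strings are distinct objects, so they have
;; to be compared element by element.
(let ((a (copy-seq "paradise"))
      (b (copy-seq "paradise")))
  (list (eql a b)                        ; NIL: different objects
        (string= a b)                    ; T: same characters
        (string-equal a "PARADISE")))    ; T: same characters, ignoring case
```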

The first part of the program will read a sample text (the larger the better), accumulating information about the likelihood of any given word following another. The second part will take random walks through the network of words built in the first, after each word making a weighted random choice among the words that followed it in the original sample.

The resulting text will always be locally plausible, because any two words that occur together will be two words that occurred together in the input text.

What's surprising is how often you can get entire sentences—sometimes entire paragraphs—that seem to make sense.

Figure 8.2 contains the first half of the program, the code for reading the sample text. The data derived from it will be stored in the hash table *words*. The keys in this hash table will be symbols representing words, and the values will be assoc-lists like the following:

((|sin| . 1) (|wide| . 2) (|sights| . 1))

This is the value associated with the key |discover| when Milton's Paradise Lost is used as the sample text. It indicates that "discover" was used four times in the poem, being twice followed by "wide" and once each by "sin" and "sights".
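The way such an entry is read can be sketched in a few lines. The variable name *discover-entry* is made up for this illustration and is not part of the program:

```lisp
;; The entry described in the text, bound to a hypothetical variable.
(defparameter *discover-entry*
  '((|sin| . 1) (|wide| . 2) (|sights| . 1)))

;; Total number of times the key word was followed by something:
;; the sum of the counts.
(reduce #'+ *discover-entry* :key #'cdr)     ; => 4

;; How often one particular word followed it.
(cdr (assoc '|wide| *discover-entry*))       ; => 2
```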

The function read-text accumulates this information. It takes a pathname, and builds an assoc-list like the one shown above for each word encountered in the file. It works by reading the file one character at a time, accumulating words in the string buffer. With maxword = 100, the program will be able to read words of up to 100 letters, which is sufficient for English.

As long as the next character is a letter (as determined by alpha-char-p) or an apostrophe, we keep accumulating characters. Any other character ends the word, whereupon the corresponding symbol is sent to see. Several kinds of punctuation are also recognized as if they were words; the function punc returns the pseudo-word corresponding to a punctuation character.

The function see registers each word seen. It needs to know the previous word as well as the one just recognized—hence the variable prev. Initially this variable is set to the period pseudo-word; after see has been called, it will always contain the last word sent to the function.

After read-text returns, *words* will contain an entry for each word in the input file. By calling hash-table-count you can see how many distinct words there were. Few English texts have over 10,000.
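The bookkeeping can be tried on a short list of words without reading a file. The following is a sketch under stated assumptions: follower-table is a hypothetical helper that keeps the previous word in a local variable and builds a fresh table, rather than updating *words* through see.

```lisp
;; Hypothetical helper: for each word in WORDS, record how often it
;; followed the word before it. Returns a new hash table whose keys
;; are the preceding words and whose values are assoc-lists of
;; (follower . count).
(defun follower-table (words)
  (let ((table (make-hash-table))
        (prev '|.|))                       ; start as if after a period
    (dolist (w words table)
      (let ((pair (assoc w (gethash prev table))))
        (if pair
            (incf (cdr pair))
            (push (cons w 1) (gethash prev table))))
      (setf prev w))))

(let ((table (follower-table '(|the| |cat| |saw| |the| |dog| |.|))))
  (list (hash-table-count table)           ; number of distinct keys
        (gethash '|the| table)))           ; words that followed |the|
;; => (5 ((|dog| . 1) (|cat| . 1)))
```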

Now comes the fun part. Figure 8.3 contains the code that generates text from the data accumulated by the code in Figure 8.2. The recursive function generate-text drives the process. It takes a number indicating the number of words to be generated, and an optional previous word. Using the default will make the generated text start at the beginning of a sentence.

(defparameter *words* (make-hash-table :size 10000))

(defconstant maxword 100)

(defun read-text (pathname)
  (with-open-file (s pathname :direction :input)
    (let ((buffer (make-string maxword))
          (pos 0))
      (do ((c (read-char s nil :eof)
              (read-char s nil :eof)))
          ((eql c :eof))
        (if (or (alpha-char-p c) (char= c #\'))
            (progn
              (setf (aref buffer pos) c)
              (incf pos))
            (progn
              (unless (zerop pos)
                (see (intern (string-downcase
                               (subseq buffer 0 pos))))
                (setf pos 0))
              (let ((p (punc c)))
                (if p (see p)))))))))

(defun punc (c)
  (case c
    (#\. '|.|) (#\, '|,|) (#\; '|;|)
    (#\! '|!|) (#\? '|?|)))

(let ((prev '|.|))
  (defun see (symb)
    (let ((pair (assoc symb (gethash prev *words*))))
      (if (null pair)
          (push (cons symb 1) (gethash prev *words*))
          (incf (cdr pair))))
    (setf prev symb)))

Figure 8.2: Reading sample text.

To get a new word, generate-text calls random-next with the previous word. This function makes a random choice among the words that followed prev in the input text, weighted according to the frequency of each.
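A weighted choice of this kind can be sketched as follows. The name weighted-choice is hypothetical, and this is one way of doing it rather than a transcription of Figure 8.3. It assumes a non-empty assoc-list of (word . count) pairs with positive integer counts.

```lisp
;; Hypothetical helper: choose a word from CHOICES, an assoc-list of
;; (word . count), with probability proportional to its count.
;; Assumes CHOICES is non-empty and every count is a positive integer.
(defun weighted-choice (choices)
  ;; Pick an integer below the total count, then walk the list,
  ;; subtracting each count until the running value goes negative.
  (let ((i (random (reduce #'+ choices :key #'cdr))))
    (dolist (pair choices)
      (if (minusp (decf i (cdr pair)))
          (return (car pair))))))

;; With the entry shown earlier, |wide| should come up about half the
;; time, and |sin| and |sights| about a quarter of the time each.
(weighted-choice '((|sin| . 1) (|wide| . 2) (|sights| . 1)))
```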

At this point it would be time to give the program a test run. But in fact you have already seen an example of what it produces: the stanza at the beginning of this book, which was generated by using Milton's Paradise Lost as the input text.

Summary

1. Any string can be the name of a symbol, but symbols created by read are transformed into uppercase by default.

2. Symbols have associated property lists, which behave like assoc-lists, though they don't have the same form.

3. Symbols are substantial objects, more like structures than mere names.

4. Packages map strings to symbols. To create an entry for a symbol in a package is to intern it. Symbols do not have to be interned.

5. Packages enforce modularity by restricting which names you can refer to. By default your programs will be in the user package, but larger programs are often divided into several packages defined for that purpose.

6. Symbols can be made accessible in other packages. Keywords are self-evaluating and accessible in any package.

7. When a program operates on words, it's convenient to represent the words as symbols.

Exercises

1. Is it possible for two symbols to have the same name but not be eql?

2. Estimate the difference between the amount of memory used to represent the string "FOO" and the amount used to represent the symbol foo.

3. The call to defpackage on page 137 used only strings as arguments. We could have used symbols instead. Why might this have been dangerous?

4. Add the code necessary to make the code in Figure 7.1 be in a package named "RING", and that in Figure 7.2 be in a package named "FILE". The existing code should remain unchanged.

5. Write a program that can verify whether or not a quote was produced by Henley (Section 8.8).

6. Write a version of Henley that can take a word and generate a sentence with that word in the middle of it.

9 Numbers

Number-crunching is one of Common Lisp's strengths. It has a rich set of numeric types, and its features for manipulating numbers compare favorably with any language.

9.1 Types

Common Lisp provides four distinct types of numbers: integers, floating-point numbers, ratios, and complex numbers. Most of the functions described in this chapter work on numbers of any type. A few, explicitly noted, accept all but complex numbers.

An integer is written as a string of digits: 2001. A floating-point number can be written as a string of digits containing a decimal point, 253.72, or in scientific notation, 2.5372e2. A ratio is written as a fraction of integers: 2/3. And the complex number a+bi is written as #c(a b), where a and b are any two real numbers of the same type.

The predicates integerp, floatp, and complexp return true for numbers of the corresponding types. Figure 9.1 shows the hierarchy of numeric types.
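As an illustrative check, the predicates can be applied to the literals mentioned above. Standard Common Lisp has no predicate dedicated to ratios alongside these three, so this sketch uses typep with the type name ratio instead:

```lisp
(list (integerp 2001)        ; a string of digits reads as an integer
      (floatp 253.72)        ; decimal point
      (floatp 2.5372e2)      ; scientific notation
      (typep 2/3 'ratio)     ; a fraction of integers
      (complexp #c(1 2))     ; the complex number 1+2i
      (integerp 2.0))        ; a float, even though its value is whole
;; => (T T T T T NIL)
```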

Here are some general rules of thumb for determining what kind of number a computation will return:

1. If a numeric function receives one or more floating-point numbers as arguments, the return value will be a floating-point number (or a complex number with floating-point components). So (+ 1.0 2) evaluates to 3.0, and (+ #c(0 1.0) 2) evaluates to #c(2.0 1.0).

[Figure 9.1: Numeric types. A tree rooted at number, with branches for complex, float (short-float, single-float, double-float, long-float), ratio, and integer (bignum, and fixnum with subtype bit).]

2. Ratios that divide evenly will be converted into integers. So (/ 10 2) will return 5.

3. Complex numbers whose imaginary part would be zero will be converted into reals. So (+ #c(1 -1) #c(2 1)) evaluates to 3.

Rules 2 and 3 apply to arguments as soon as they are read, so:

> (list (ratiop 2/2) (complexp #c(1 0)))
(NIL NIL)
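The three rules can be checked together at the toplevel. This sketch uses only standard functions, and adds one case, (/ 10 4), that is not in the text, to show a ratio that does not divide evenly:

```lisp
(list (+ 1.0 2)               ; rule 1: a float argument gives a float
      (+ #c(0 1.0) 2)         ; rule 1, with a complex argument
      (/ 10 2)                ; rule 2: divides evenly, so an integer
      (/ 10 4)                ; does not divide evenly, so a ratio
      (+ #c(1 -1) #c(2 1))    ; rule 3: imaginary part is zero
      (integerp 2/2))         ; 2/2 is already an integer when read
;; => (3.0 #C(2.0 1.0) 5 5/2 3 T)
```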
