Strings and Characters - Specialized Data Structures

Specialized Data Structures

4.3 Strings and Characters

COMMENTING CONVENTIONS

In Common Lisp code, anything following a semicolon is treated as a comment. Some Lisp programmers use multiple semicolons to indicate the level of the comment: four semicolons in a heading, three in a description of a function or macro, two to explain the line below, and one when a comment is on the same line as the code it applies to. Using this convention, Figure 4.1 might begin:

;;;; Utilities for operations on sorted vectors.

;;; Finds an element in a sorted vector.

(defun bin-search (obj vec) (let ((len (length vec)))

;; if a real vector, send it to finder

(and (not (zerop len)) ; returns nil if empty (finder obj vec 0 (- len 1)))))

For extensive comments, it may be preferable to use the # I . . . I # read-macro. Everything between a # I and I # is ignored by read.⁰

If we insert the following line at the beginning of f i n d e r , (format t "~A~°/8" (subseq vec s t a r t (+ end 1 ) ) )

then we can watch as the number of elements left to be searched is halved in each step:

> ( b i n - s e a r c h 3 #(0 1 2 3 4 5 6 7 8 9))

# ( 0 1 2 3 4 5 6 7 8 9 )

#(0 1 2 3)

#(3) 3

4.3 Strings and Characters

Strings are vectors of characters. We denote a constant string as a series of characters surrounded by double-quotes, and an individual character c as #\c.

Each character has an associated integer—usually, but not necessarily, the ASCH number. In most implementations, the function char-code returns

the number associated with a character, and code-char returns the character associated with a number.⁰

The functions char< (less than), char<= (less than or equal), char=

(equal), char>= (greater than or equal), char> (greater than), and char/=

(different) compare characters. They work like the numeric comparison operators described on page 146.

> ( s o r t "elbow" # ' c h a r < )

"below"

Because strings are vectors, both sequence functions and array functions work on them. You could use aref to retrieve elements, for example,

> (aref "abc" 1)

#\b

but with a string you can use the faster char:

> (char "abc" 1)

#\b

You can use s e t f with char (or aref) to replace elements:

> ( l e t ( ( s t r (copy-seq " M e r l i n " ) ) ) ( s e t f (char s t r 3) #\k)

s t r )

"Merkin"

If you want to compare two strings, you can use the general equal, but there is also a function s t r i n g - e q u a l that ignores case:

> (equal "fred" "fred") T

> (equal "fred" "Fred") NIL

> (string-equal "fred" "Fred") T

Common Lisp provides a large number of functions for comparing and ma-nipulating strings. They are listed in Appendix D, starting on page 364.

There are several ways of building strings. The most general is to use format. Calling format with n i l as the first argument makes it return as a string what it would have printed:

> (format nil "~A or ~A" "truth" "dare")

"truth or dare"

4.4 ^SEQUENCES 63

But if you just want to join several strings together, you can use concatenate, which takes a symbol indicating the type of the result, plus one or more sequences:

> (concatenate ' s t r i n g "not " "to worry")

"not t o worry"

4.4 Sequences

In Common Lisp the type sequence includes both lists and vectors (and therefore strings). Some of the functions that we have been using on lists are actually sequence functions, including remove, length, subseq, reverse, sort, every, and some. So the function that we wrote on page 46 would also work with other kinds of sequences:

> (mirror? "abba") T

We've already seen four functions for retrieving elements of sequences:

nth for lists, aref and svref for vectors, and char for strings. Common Lisp also provides a function e l t that works for sequences of any kind:

> ( e l t ' (a b c) 1) B

For sequences of specific types, the access functions we've already seen should be faster, so there is no point in using e l t except in code that is

^supposed to work for sequences generally.

Using e l t , we could write a version of mirror? that would be more efficient for vectors:

(defun mirror? (s)

(let ((len (length s))) (and (evenp len)

(do ((forward 0 (+ forward 1)) (back (- len 1) (- back 1))) ((or (> forward back)

(not (eql (elt s forward) (elt s back)))) (> forward back))))))

This version would work with lists too, but its implementation is better suited to vectors. The frequent calls to e l t would be expensive with lists, because

lists only allow sequential access. In vectors, which allow random access, it is as cheap to reach one element as any other.

Many sequence functions take one or more keyword arguments from the standard set listed in this table:

PARAMETER

a function to apply to each element the test function for comparison if true, work backwards position at which to start position, if any, at which to stop

DEFAULT illustrate the roles of the keyword arguments.

> (position #\a "fantasia") 1

> (position #\a "fantasia" :start 3 :end 5) 4

The second example asks for the position of the first a between the fourth and sixth characters. The : s t a r t argument is the position of the first element to be considered, and defaults to the first element of the sequence. The : end argument is the position of the first element, if any, not to be considered.

If we give the : from-end argument,

> (position #\a "fantasia" :from-end t) 7

we get the position of the a closest to the end. But the position is calculated in the usual way; it does not represent the distance from the end.

The : key argument is a function that is applied to each element of a sequence before it is considered. If we ask something like this,

> ( p o s i t i o n ' a ' ( ( c d) (a b ) ) :key # ' c a r ) 1

then what we are asking for is the position of the first element whose car is the symbol a.

The : t e s t argument is a function of two arguments, and defines what it takes for a successful match. It always defaults to eql. If you're trying to match a list, you might want to use equal instead:

4A ^SEQUENCES 65

> ( p o s i t i o n ' ( a b) ' ( ( a b) (c d ) ) ) NIL

> ( p o s i t i o n ' ( a b) ' ( ( a b) (c d)) : t e s t # ' e q u a l ) 0

The .-test argument can be any function of two arguments. For example, by giving <, we can ask for the position of the first element such that the first argument is less than it:

> ( p o s i t i o n 3 ' ( 1 0 7 5 ) : t e s t # ' < ) 2

Using subseq and p o s i t i o n , we can write functions that take sequences apart. For example, this function

(defun second-word (str)

(let ((pi (+ (position #\ str) 1)))

(subseq str pi (position #\ str :start pi)))) returns the second word in a string of words separated by spaces:

> (second-word "Form follows function.")

"follows"

To find an element satisfying a predicate of one argument, we use p o s i t i o n - i f . It takes a function and a sequence, and returns the posi-tion of the first element satisfying the funcposi-tion:

> (position-if #'oddp ' ( 2 3 4 5)) 1

It takes all the keyword arguments except : t e s t .

There are functions similar to member and member-if for sequences.

They are, respectively, f i n d (which takes all the keyword arguments) and f i n d - i f (which takes all except : t e s t ) :

> (find #\a "cat")

#\a

> ( f i n d - i f # ' c h a r a c t e r p "ham")

#\h

Unlike member and member-if, they return only the object they were looking for.

Often a call to f i n d - i f will be clearer if it is translated into a f i n d with a : key argument. For example, the expression

( f i n d - i f #'(lambda (x)

( e q l (car x) ' c o m p l e t e ) ) 1 s t )

would be better rendered as

(find 'complete 1st :key #'car)

The functions remove (page 22) and remove-if both work on sequences generally. They bear the same relation to one another as f i n d and f i n d - i f . A related function is r e m o v e - d u p l i c a t e s , which preserves only the last of each occurrence of any element of a sequence:

> ( r e m o v e - d u p l i c a t e s "abracadabra")

"cdbra"

This function takes all the keyword arguments listed in the preceding table.

The function reduce is for boiling down a sequence into a single value.

It takes at least two arguments, a function and a sequence. The function must be a function of two arguments. In the simplest case, it will be called initially with the first two elements, and thereafter with successive elements as the second argument, and the value it returned last time as the first. The value returned by the last call is returned as the value of the reduce. Which means that an expression like

(reduce # ' f n ' ( a b c d)) is equivalent to

(fn (fn (fn ' a ' b ) ' c ) >d)

We can use reduce to extend functions that only take two arguments. For ex-ample, to get the intersection of three or more lists, we could write something like

> (reduce # ' i n t e r s e c t i o n ' ( ( b r a d ' s ) ( b a d ) ( c a t ) ) ) (A)

Dans le document ANSI Common Lisp (Page 78-83)