Example: Parsing Dates - Specialized Data Structures

Specialized Data Structures

4.5 Example: Parsing Dates

As an example of operations on sequences, this section shows how\to write a program to parse dates. We will write a program that can take a string \ like "16 Aug 1980 " and return a list of integers representing the day, month, and year.

4.5 EXAMPLE: PARSING DATES 67

Figure 4.2 contains some general-purpose parsing functions that we'll need in this application. The first, tokens, is for extracting the tokens from a string. Given a string and a test function, it returns a list of the substrings whose characters satisfy the function. For example, if the test function is a l p h a - c h a r - p , which returns true of alphabetic characters, we get:

> (tokens^Mabl2 3cde.f" tt'alpha-char-p 0) ("abⁿ "cde" "f")

All characters that do not satisfy the function are treated as whitespace—they separate tokens but are never part of them.

The function c o n s t i t u e n t is defined for use as an argument to tokens.

In Common Lisp, graphic characters are all the characters we can see, plus the space character. So if we use c o n s t i t u e n t as the test function,

> (tokens "abl2 3cde.f

gh" #^;c o n s t i t u e n t 0) (^Mabl2" "3cde.f" "gh")

then tokens will have the conventional notion of whitespace.

Figure 4.3 contains functions specifically for parsing dates. The function p a r s e - d a t e takes a date in the specified form and returns a list of integers representing its components:

(defun parse-date (str)

(let ((toks (tokens str #'constituent 0))) (list (parse-integer (first toks))

(parse-month (second toks)) (parse-integer (third toks))))) (defconstant month-names

#("jan" "feb" "mar" "apr" "may" "jun"

"Jul" "aug" "sep" "oct" "nov" "dec")) (defun parse-month (str)

(let ((p (position str month-names

:test #'string-equal))) (if p

(+ p 1) nil)))

Figure 4.3: Functions for parsing dates.

> ( p a r s e - d a t e "16 Aug 1980") (16 8 1980)

It uses tokens to break up a date string, and then calls parse-month and p a r s e - i n t e g e r to interpret the elements. To find the month, it calls parse-month, which is not case-sensitive because it uses s t r i n g - e q u a l to match the name of the month. To find the day and year, it calls the built-in p a r s e - i n t e g e r , which takes a string and returns the corresponding integer.

If we had to write code to parse integers, we might say something like:

(defun r e a d - i n t e g e r ( s t r )

( i f (every # ' d i g i t - c h a r - p s t r ) ( l e t ((accum 0))

(dotimes (pos ( l e n g t h s t r ) ) ( s e t f accum (+ (* accum 10)

( d i g i t - c h a r - p (char s t r p o s ) ) ) ) ) accum)

n i l ) )

This definition illustrates how to get from a character to a number in Common Lisp—the function d i g i t - c h a r - p not only tests whether a character is a digit, but returns the corresponding integer.

4.6 STRUCTURES 69

4.6 Structures

A structure can be considered as a deluxe kind of vector. Suppose you had to write a program that kept track of a number of rectangular solids. You might consider representing them as vectors of three elements: height, width, and depth. Your program would be easier to read if, instead of using raw svref s, you defined functions like

(defun b l o c k - h e i g h t (b) (svref b 0))

and so on. You can think of a structure as a vector in which all these kinds of functions get defined for you.

To define a structure, we use def s t r u c t . In the simplest case we just give the name of the structure and the names of the fields:

(defstruct point x

This defines a p o i n t to be a structure with two fields, x and y. It also implicitly defines the functions make-point, p o i n t - p , copy-point, p o i n t - x , and p o i n t - y .

Section 2.3 mentioned that Lisp programs could write Lisp programs.

This is one of the most conspicuous examples we have seen so far. When you call def s t r u c t , it automatically writes code defining several other functions.

With macros you will be able to do the same thing yourself. (You could even write d e f s t r u c t if you had to.)

Each call to make-point will return a new p o i n t . We can specify the values of individual fields by giving the corresponding keyword arguments:

> ( s e t f p (make-point :x 0 :y 0))

#S(POINT X 0 Y 0)

The access functions for p o i n t fields are defined not only to retrieve values, but to work with s e t f .

> ( p o i n t - x p) 0

> ( s e t f ( p o i n t - y p) 2) 2

> P

#S(POINT X 0 Y 2)

Defining a structure also defines a type of that name. Each point will be of type p o i n t , then s t r u c t u r e , then atom, then t . So as well as using p o i n t - p to test whether something is a point,

> ( p o i n t - p p) T

> (typep p ' p o i n t ) T

we can also use general-purpose functions like typep.

We can specify default values for structure fields by enclosing the field name and a default expression in a list in the original definition.

(defstruct polemic (type (progn

(format t "What kind of polemic was it? ") (read)))

(effect nil))

If a call to make-polemic specifies no initial values for these fields, they will be set to the values of the corresponding expressions:

> (make-polemic)

What kind of polemic was it? scathing

#S(POLEMIC TYPE SCATHING EFFECT NIL)

We can also control things like the way a structure is displayed, and the prefix used in the names of the access functions it creates. Here is a more elaborate definition for p o i n t that does both:

( d e f s t r u c t ( p o i n t (:cone-name p)

( . • p r i n t - f u n c t i o n p r i n t - p o i n t ) ) (x 0)

(y o))

(defun print-point (p stream depth)

(format stream "#<~A,~A>^M (px p) (py p)))

The : cone-name argument specifies what should be concatenated to the front of the field names to make access functions for them. By default it was p o i n t - ; now it will be simply p. Not using the default makes your code a little less readable, so you would only want to do this kind of thing if you're going to be using the access functions constantly.

The : p r i n t - f unct ion is the name of the function that should be used to print a point when it has to be displayed—e.g. by the toplevel. This function must take three arguments: the structure to be printed, the place where it is to be printed, and a third argument that can usually be ignored.² We will

2 In ANSI Common Lisp, you can give instead a : p r i n t - o b j e c t argument, which only takes the first two arguments. There is also a macro p r i n t - u n r e a d a b l e - o b j e c t , which should be used, when available, to display objects in # < . . . > syntax.

Dans le document ANSI Common Lisp (Page 83-88)