Some Other Useful Data Types - An Introduction to Scheme and its Implementation

[ Parts of this should probably be moved into the previous chapter, and new examples put in this section. ]

Scheme has several important kinds of data objects that are useful in programming in general, and particularly for writing an interpreter, as we'll do in the next chapter. These include character strings, symbols, and lists.

Scheme has two data types that represent sequences of characters, called strings and symbols. Strings are pretty much like character strings in most programming languages--they represent a sequence of text characters. Symbols are sort of like strings, but have a very special property--there's only one symbol object with any particular sequence of characters.

Symbols have a special role in the implementation of Scheme, because they're part of the normal representation of source code; symbols are used to represent names of variables, procedures, special forms, and macros. They're really just a kind of data object, though--you can use them in your programs, whether or not you want to represent code.

Lists are used in interpreters and compilers to represent compound expressions in the source code; nested expressions are generally represented by nested lists.
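
As a rough illustration of the idea, the quoted nested list below is plain data that mirrors the shape of a small procedure definition. The name square-code and the expression inside it are invented for this sketch; a real interpreter might store its code differently.

```scheme
;; A quoted list is just data, even though it looks like code.
;; (square-code is a hypothetical name chosen for this sketch.)
(define square-code '(define (square x) (* x x)))

(car square-code)     ; the symbol define
(cadr square-code)    ; the nested list (square x)
(caddr square-code)   ; the nested list (* x x)
(list? square-code)   ; #t
```

Each nested expression, such as (* x x), shows up as a nested list inside the outer one.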

More generally, there's a category of Scheme data structures called s-expressions, which consist of basic types including symbols, strings, numbers, booleans, and characters, and lists of those simple types, or lists of such lists.

"S-expression" is short for "symbolic expression," but it's something of a misnomer. An expression is really a piece of a program. An "s-expression" is just a data structure, which may or may not represent an expression in a programming language, although interpreters and compilers often happen to use them that way.

● Strings: Character Strings

● Symbols: Symbols are like Strings, but Unique

● Identifiers: A Note on Identifiers

● Lists: Lists


http://www.federated.com/~jim/schintro-v14/schintro_101.html11/3/2006 9:06:55 PM


Strings

Character strings in Scheme are written between double quotes. For example, suppose we want an object that represents the text "Hello, world!" We can just write that in a program, in between double quotes:

"Hello, world!".

You can use a string as an expression--the value of a string is the string itself, just as the value of an integer is the integer itself. Like numeric literals and booleans, strings are "self-evaluating," which just means that if you have an expression in your program that consists of just a string, Scheme assumes you mean the value to be literally that string. There's nothing deep about this--it just turns out to be handy, because it makes it easy to use strings as literals.

Try typing the string "Hello, world!" at the Scheme prompt.

Scheme>"Hello, world!"

"Hello, world!"

What happened here is that Scheme recognized the sequence of characters between double quotes as a string literal. The value of a literal string expression (in double quotes) is (a pointer to) a string object. A string object is a normal first-class object like a pair or a number, conceptually like an array that can only hold characters.

This value is what Scheme printed out. The standard printed representation of a string object is the sequence of characters, with double quotes around it.

So what happened here is that Scheme read the sequence of characters in double quotes, constructed an array-like object of type string, then printed out the printed representation of that object.

If you want to print out a string, but without the double quotes, you can use the standard procedure display. If you pass display a string, it just prints out the characters in the string, without any double quotes.

display is useful in programs that print information out for normal users. Another useful procedure is newline, which prints a newline character, ending a line and starting a new one.

Try typing (display "Hello, world!") (newline) at the Scheme prompt. What you get may look like this:


Scheme>(display "Hello, world!") (newline)

Hello, world!

#void

You might see something slightly different on your screen, depending on the return value of newline, which is unspecified in the Scheme standard.

If you type in an expression using a string literal like "foo" at the Scheme prompt, Scheme may construct a new string object with that character sequence each time.

Try this:

Scheme>(define foo1 "foo")

#void

Scheme>foo1

"foo"

Scheme>(define foo2 "foo")

#void

Scheme>foo2

"foo"

Scheme>(eq? foo1 foo2)

#f

For each of the define forms, Scheme has constructed a string with the character sequence f-o-o, and saved it in a new variable binding. When we ask the value of each variable, Scheme prints out the usual text representation of the string. The printed representations are the same, since each string has the same structure, but they're two different objects--when we ask if they're eq?, i.e., the very same object, the answer is no (#f).

It's possible that in your system the eq? comparison will return #t, because Scheme implementations are allowed to use pointers to the same string if you type in two strings with the same character sequence. For that reason, you should be careful not to depend on whether Scheme strings are eq?; you should only distinguish whether they're equal?. You can also use the standard predicate string=? if you know the arguments are supposed to be strings. This has the advantage of signaling an error if the arguments are of unexpected type.
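
The difference between identity and structural equality can be sketched as follows. The variable names are made up for the example, and string-copy is used because it is specified to return a newly allocated string, so the eq? result here does not depend on how a particular implementation treats literals.

```scheme
;; original and foo-copy are hypothetical names for this sketch.
(define original "foo")
(define foo-copy (string-copy original))  ; a newly allocated string

(eq? original foo-copy)       ; #f: two distinct string objects
(equal? original foo-copy)    ; #t: same character sequence
(string=? original foo-copy)  ; #t: string-specific comparison
(eq? original original)       ; #t: an object is always eq? to itself
```

With string literals typed in directly, the first comparison could go either way, which is why equal? or string=? is the safer choice.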

Strings can be used as one-dimensional arrays (vectors) of characters. There are procedures for accessing their elements by an integer index, extracting substrings given two indices, and so on.
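
A few of the standard procedures in question are string-length, string-ref, substring, and string-append. The short sketch below applies them to a made-up variable named greeting; indices start at 0, and the end index given to substring is exclusive.

```scheme
;; greeting is a hypothetical variable for this sketch.
(define greeting "Hello, world!")

(string-length greeting)               ; 13
(string-ref greeting 0)                ; #\H
(string-ref greeting 7)                ; #\w
(substring greeting 7 12)              ; "world"
(string-append "Hello" ", " "world!")  ; "Hello, world!"
```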


Symbols

Symbols are like strings, in that they have a character sequence. Symbols are different, however, in that only one symbol object can have any given character sequence. The character sequence is called the symbol's print name. A print name is not the same thing as a variable name, however--it's just the character sequence that identifies a particular unique symbol. It's called the print name because that's what's printed out when you display the object (or write it).

Unlike strings, booleans, and numbers, symbols are not self-evaluating. To refer to a literal symbol, you have to quote it. Since print names of symbols look just like variable names, you have to tell Scheme which you mean.

If we type in the character sequence f o o without double quotes around it, Scheme assumes we mean to refer to a variable named foo, not the unique symbol whose print name is foo.

In interpreters and compilers, symbol objects are often used as variable names, and Scheme treats them specially. If we just type in a character string that's a symbol print name, and hit return, Scheme assumes that we are asking for the value of the binding of the variable with that name--if there is one.

Scheme>(define foo 10)

#void

Scheme>foo

10

If we quote the symbol name with the single quote character, Scheme interprets that as meaning we want the symbol object foo.

Scheme>'foo

foo

Since we've already defined (and bound) the variables foo1 and foo2, we can ask Scheme to look up their values.

Scheme>foo1

"foo"

Scheme>foo2

"foo"


Here we've typed in the names that we gave to variables earlier, and Scheme looked up the values of the variables.

As we've seen before, this doesn't work if there isn't a bound variable by that name. Symbols can be used as variable names, if you define the variable, but by default a symbol is just an object with a particular print name that identifies it.

If we want to refer to the symbol object foo, rather than using foo as a variable name, we can quote it, using the special quote character '. This tells Scheme not to evaluate the following expression, but to treat it as literal data.

Scheme>'foo

foo

When you type 'foo, you're telling Scheme you want a pointer to the symbol whose print name is foo.

It doesn't matter whether there's a variable named foo or what its current value is: 'foo means a pointer to the unique symbol object whose print name is foo, which has nothing to do with any variable foo.

The first time you type in a symbol name, Scheme constructs a symbol object with that character sequence, and puts it in a special table. If you later type in a symbol name with the same character sequence, Scheme notices that it's the same sequence. Instead of constructing a new object, as it would for a string, it just finds the old one in the table, and uses that--it gives you a pointer to the same object, instead of a pointer to a new one.

Scheme>(define bar1 'bar)

#void

Scheme>(define bar2 'bar)

#void

Scheme>(eq? bar1 bar2)

#t

Here, when we typed in the first definition, Scheme created a symbol object with the character sequence b a r, and added it to its table of existing symbols, as well as putting a pointer to it in the new variable binding bar1. When we typed in the second definition, Scheme noticed that there was already a symbol object named bar, and put a pointer to that same object in bar2 as well.

When we asked Scheme if the values of bar1 and bar2 referred to the same object, the answer was yes (#t)--they both referred to the unique symbol bar; there is only one symbol by that name.


The big advantage of symbols over strings is that comparing them is very fast. If you want to know if two strings have the same character sequence, you can use equal?, which will compare their characters until it either finds a mismatch or reaches the ends of both strings.

With symbols, you can use equal?, but you can get the same results using eq?, which is faster. Recall that eq? just compares the pointers to two objects, to see if they're actually the same object. For symbols, this works to compare the print names, too, because two symbols can have the same name only if they're the same object. You don't have to worry about symbols being equal? but not eq?.

This makes symbols good for use as keys in data structures. For example, you can zip through a list looking for a symbol, using eq?, and all it has to do is compare pointers, not character sequences.
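
One common way to do this is with an association list and the standard procedures assq and memq, both of which compare keys with eq?. The table below and its contents are invented for illustration.

```scheme
;; fruit-colors is a hypothetical association list keyed by symbols.
(define fruit-colors '((apple . red) (banana . yellow) (lime . green)))

(assq 'banana fruit-colors)        ; (banana . yellow)
(cdr (assq 'banana fruit-colors))  ; yellow
(assq 'cherry fruit-colors)        ; #f: no entry with that key
(memq 'c '(a b c d))               ; (c d): the sublist starting at c
```

Because every key is a symbol, each step of the search is a single pointer comparison.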

Another advantage of symbols is that only one copy of a symbol's character sequence is actually stored, and all occurrences of the same symbol are represented as pointers to the same object. Each additional occurrence of a symbol thus only costs storage for a pointer.

If you're doing text processing in Scheme, e.g., writing a word processor, you probably want to use strings, not symbols. Strings support more operations that make it convenient to concatenate them, modify them, etc.

Symbols are mainly used as key values in data structures; they happen to have a convenient human-readable printed representation.

If you need to convert between strings and symbols, you can use string->symbol and symbol->string. string->symbol takes a string and returns the unique symbol with that print name, if there is one. (If there's not, and the string is a legal symbol print name, it creates one and returns it.) symbol->string takes a symbol and returns a string representing its print name. (There is no guarantee as to whether it always returns the same string object for a given symbol, or a copy with the same sequence of characters.)
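
A minimal sketch of the two conversions follows. The names sym and name are made up for the example, and the results shown in the comments assume a Scheme that is case-sensitive or that folds symbol names to lower case.

```scheme
;; sym and name are hypothetical names for this sketch.
(define sym (string->symbol "foo"))
(define name (symbol->string 'foo))

sym                              ; foo
(eq? sym 'foo)                   ; #t: string->symbol found the one symbol foo
name                             ; "foo"
(equal? name "foo")              ; #t
(eq? (string->symbol name) sym)  ; #t: the round trip gives back the same symbol
```

Note that the strings are compared with equal?, not eq?, since symbol->string may return a fresh string each time.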


A Note on Identifiers

When you type in a string, e.g., "This here is a string, you know.", you can type in pretty much whatever you want, as long as it's between double quotes and doesn't have double quotes or nonprinting characters in the middle. (You can have strings with double quotes in them, but you have to use a special escape sequence trick.)

When you type in a symbol, on the other hand, you have to be a little more careful--some character sequences count as symbol names, but others don't. For example, the character sequence 1 2 3 doesn't count as a symbol 123, because it's a number. Character sequences with spaces, parentheses, and single quotes in them are also a no-no, because those characters have special meaning when reading and writing the printed representations of Scheme data structures.

A symbol name has to start with an "extended alphabetic" character--that is, a letter or any of a fairly large set of printing characters--followed by a string of other extended alphabetic characters or digits. (The extended alphabetic characters are a-z, A-Z, and these: + - . * / < = > ! ? : $ % _ & ~ ^.)

For example, the following are all symbols:

● x

● thursdays-total*3

● am_is_are_was_were_be_being_been

● able-was-I-ere-I-saw-elba

● floppy_drive-3.5

● fourscore-and-7-years-ago

● x-15+three-times-thirty-seven

● =1

● lhs=>rhs

● x+/-3%

There is a slight restriction that you can't use a symbol name that starts with a character that could begin a literal number. This includes not only digits, but +, -, ., and #. A special exception to this is that + and -, by themselves, are symbols, and so is ... (the ellipsis identifier used in macros).
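
The standard predicates symbol? and number? give a quick way to check how Scheme reads a particular character sequence. The sketch below tries a few of the names from the list, and assumes the reader follows the usual identifier syntax described in this section.

```scheme
;; Quoting each datum lets us ask what kind of object the reader built.
(symbol? 'thursdays-total*3)  ; #t
(symbol? 'lhs=>rhs)           ; #t
(symbol? '=1)                 ; #t
(symbol? '+)                  ; #t: + by itself is a symbol
(symbol? '123)                ; #f: the reader produced a number
(number? '123)                ; #t
(symbol? "x")                 ; #f: a string, not a symbol
```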

Scheme identifiers (variable names and special form names and keywords) have almost the same restrictions as Scheme symbol object character sequences, and it's no coincidence. Most implementations of Scheme happen to be written in Scheme, and symbol objects are used in the interpreter or compiler to represent variable names.


Don't read too much into this, however: it's easy to write a Scheme interpreter or compiler in Scheme, and that is why the rules for symbol names are the same as the rules for variable names, but symbols and variables are very, very different things. A symbol is just a data object, like a string, that has the special property of being unique. You can use symbols like any other data object, as part of any data structure.

It just happens that interpreters and compilers generally use symbol objects to represent the names of variables and whatnot, so it's convenient that the rules for symbol object names are the same as the rules for identifiers in the language--but there is no other connection.

Symbols are not necessarily variable names; they're just a kind of data object (like strings) that happens to get used that way by some programs (interpreters and compilers). Your programs can use them any way you choose. (Sorry to be repetitive on this point, but confusing symbols and variables is one of the most common and avoidable problems in learning Scheme. It's worse in Lisp, where symbols and variables do have a deep connection, but not an obvious one.)
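
To make the distinction concrete, here is a small sketch in which the same character sequence is used both as a variable name and as a symbol. The names foo, quux, and tags are chosen only for illustration.

```scheme
(define foo 10)          ; foo is now a variable bound to 10

foo                      ; 10: the value of the variable
'foo                     ; foo: the symbol, unrelated to the variable
(symbol? 'foo)           ; #t
(symbol? foo)            ; #f: the variable's value is a number
(list 'foo foo)          ; (foo 10)

;; Symbols can be stored in data structures whether or not a variable
;; by that name exists (no variable named quux is defined here).
(define tags (list 'quux 'foo))
(eq? (cadr tags) 'foo)   ; #t
```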
