Forms of Input (Signals Through Nonnumeric Information)


ROBERT M. HAYES

University of California (Los Angeles)

INTRODUCTION


Traditionally, information systems have been characterized in terms of their dynamic properties, their internal decision processes, their information structure. Here, however, I am concerned with a somewhat different aspect: the form of the source of the basic data. We are all generally familiar with how diverse these sources can be: photographs, electroencephalographs, radar signals, audio and video recordings, telemetry, printed characters, punched media. My aim is to present these various sources within the framework of an integrated picture, based on two characteristic aspects of input, the one of dimension and the other of formalization.

The content of this talk can thus be summarized rather quickly: fundamentally, natural phenomena are multifaceted, both physically and intellectually. As a result, they are to some extent more complex than the processing equipment in an information system is capable of handling. To provide an acceptable input to the information system, some method must be used to reduce the natural complexity to the level of mechanical processes. We do this in a physical sense by reducing the dimension of the source; and we do it in an intellectual sense by increasing the degree of formalization in the source.

Before discussing these two aspects in detail, however, I ought also to comment on some other factors which, to a large extent, I am ignoring. Specifically, although the physical form of the input medium and the technology for recording on it are clearly most significant considerations in system design, they are not ones which really represent any intellectual problems. Thus, whether the input is from digital magnetic tape or punched cards may well determine how rapidly information can be processed or exactly what type of equipment will be used,1 but it will not really affect what can be done with the information once it has been input, or what processing difficulties will be encountered in doing it.

Similarly, there are many technical problems related to the form of input which are involved in the actual handling of the information during the input process itself: problems in buffering, in code conversion (IBM twelve-bit code to internal six-bit code, for example), in format conversion (parallel to serial, for example), in timing and control.2,3 Again, these are extremely significant in the actual design of the hardware system, and even, to an extent, of the programming,4 but they also do not represent limitations on what can be done with the data once it has been input, or what processing difficulties will be encountered in doing it.

On the other hand, the two aspects I am concerned with today are fundamental in determining what can be done and how difficult it will be.

Reduction in complexity is achieved by eliminating information content and by breaking up relationships implicit in the original data, which cannot be encompassed in the simplified data. The one prevents the information system from deriving results which depend upon the lost information; the other forces the information system to reconstruct the lost relationships.

CHARACTERIZATION OF INPUT BY DIMENSION

This aspect of the form of input views information in terms of its dimensions, of value and of space. For example, a photograph provides one or more dimensions of value (one dimension with a gray scale, several with a full color scale including hue, intensity, and brightness) as functions on a two-dimensional space; an audio recording provides a single dimension of value as a function on a one-dimensional space, etc.

A digital computer can handle only zero-dimensional data, sets of single numbers, and can therefore represent more dimensions only by the sequencing of those numbers. Present-day analogue computers are able to accept a single dimension of value (at least, on a single channel) on one dimension of space, by substitution of time for it. Recently, several "hybrid" machines have been developed which combine the continuous-function processing of the analogue computer with the control and logical capabilities of the stored-program digital computer.5,6 This immensely extends digital computer capabilities, but still, more dimensions of space can be represented only by sequencing of the functions.

One can in principle visualize a type of processor capable of accepting information in two-space; perhaps the photographic "dodger" is a primitive version of such a device.7 But lacking such a capability, for the present multidimensional phenomena such as photographs must be processed by an input which provides some mechanism for reducing the dimensions to zero, or one. The process for doing so is conceptually clear: the data must be sampled at intervals in one dimension and scanned through the other dimensions. The result is a representation of a function on two dimensions, for example, by a sequence of functions on one dimension, where each function in the sequence represents a slice through the original function. By a succession of such samplings and scannings, in each of the original dimensions, the data is ultimately reduced to simply a succession of numbers.
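By way of illustration, the following is a minimal sketch of this sample-and-scan reduction in a present-day programming language; the array, the step sizes, and the function name are hypothetical, chosen only to make the sequencing concrete.

```python
# A sketch of the sample-and-scan reduction described above: a function
# on two dimensions (an image) is reduced to a sequence of functions on
# one dimension (scan lines), each of which is in turn sampled into a
# plain succession of numbers.
import numpy as np

def scan_and_sample(image: np.ndarray, row_step: int, col_step: int) -> list:
    """Reduce a 2-D array to a list of sampled 1-D slices."""
    # Each slice through the original function is one scan line.
    scan_lines = [image[r, :] for r in range(0, image.shape[0], row_step)]
    # Each line is itself sampled at intervals, leaving only numbers.
    return [line[::col_step].tolist() for line in scan_lines]

# A synthetic 8x8 "photograph" with a single gray-scale dimension of value.
photo = np.arange(64).reshape(8, 8)
numbers = scan_and_sample(photo, row_step=2, col_step=2)
# 'numbers' is now zero-dimensional data: single values whose original
# spatial relationships are carried only by their sequencing.
```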

THE HARDWARE FOR SAMPLING AND INPUT

Obviously the simplest level of input, at least in the framework of our present discussion, is that which concerns the entry of discrete, essentially digitized data: alphabetic, numeric, binary. The variety of the corresponding input devices is almost too familiar,8 but for the sake of completeness let me briefly review them: punched tape and the corresponding tape punches and readers;9,10 punched cards and the corresponding card punches and readers;11 digital magnetic storage, with a few types of recorders and many handlers and readers;12,13 and photographic binary recording and a few readers of it.14-17 Summaries of the characteristics of most of the available commercial devices are listed in Tables 12, 13, and 14 of Becker and Hayes.18

Since these devices virtually all require manual entry at some point, much effort has gone into the development of mechanical devices to convert essentially digital information from nondigital form (such as printed images or PCM magnetic recording) into digital form.19 But clearly, at this point, we are dealing with precisely the kind of multidimensional problem I have defined.

At the next level of complexity, the source is one-dimensional (in value, that is) and the input process requires conversion of analogue information into digital form. The variety of devices here, while perhaps not as familiar as the strictly digital equipment, is certainly not revolutionary.20-25 The precise form which any one of them takes is in large part a function of the nature of the source material: electronic "ramps," pulse counters, digitizing disks,26 etc. In each case, the result can be considered as a "sampling" of the analogue signal at quantizing intervals. Traditionally, this has been viewed in terms of "round-off" error, and its effects have at best been treated statistically.27
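The statistical treatment of round-off can be made concrete with a short sketch; the quantizing interval and the stand-in signal below are illustrative assumptions, and the closing comparison uses the classical result that uniform round-off error has variance of about q²/12.

```python
# A sketch of digitizing an analogue value: the signal is sampled at
# quantizing intervals and the resulting round-off error is treated
# statistically, as the text describes.
import numpy as np

rng = np.random.default_rng(0)
analogue = rng.uniform(-1.0, 1.0, size=10_000)   # stand-in analogue samples

q = 0.05                                          # quantizing interval
digitized = np.round(analogue / q) * q            # nearest quantum level
error = digitized - analogue

# Classical round-off model: error roughly uniform on [-q/2, q/2],
# so its variance is about q**2 / 12.
print(error.var(), q**2 / 12)
```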

It is when we come to the next level of complexity, the continuous function of a single variable (usually time), that the applications become most interesting. In fact, virtually all of modern communication theory and control system theory is oriented toward this type of situation.28-33 The equipment for sampling continuous signals is usually integrally associated with the digitizing equipment mentioned above.34,35 However, in principle, one can visualize hybrid (analog-digital) computers which would function on samples from an original continuous signal source. For example, a computer memory of analogue form, supplementing the digital data and program memory, could store samples of varying size, which might later be further sampled and digitized under program control.36

The most general problem that seems within the present state of the art is that of handling images. For example, character-reading equipment of the kind I have previously mentioned now exists, and several methods for analyzing the data resulting from it have been developed.37,38 Probably the most significant applications at this level of complexity are just now beginning to appear.39-55 The use of flying-spot scanners, previously applied to dodging and other methods of image enhancement, offers a powerful tool for digitizing images.56

The generalization of this concept of sampling to the case of three spatial dimensions is probably not a feasible concept as such. However, if we are content to accept some type of stereoscopic effect, there is existing electronic equipment which looks at two stereo photos with something like depth perception, follows terrain contour lines automatically, and traces out contour-line drawings.57 The resulting electrical signals represent the images at cuts through the three-dimensional surface. Since the data about the terrain is in electronic form, as output from a cathode-ray tube, it could be fed directly into a computer and used for terrain analysis without manual intervention.

In summary, the variety of input forms extends from simple keypunched data, to digitized samples of analogue signals, to samples of continuous functions, to scanning of photographs and other images, and perhaps eventually to even more dimensions.

THE MATHEMATICS OF SAMPLING

Now there is nothing startling in this view of the forms of input. It is something which we all recognize intuitively and, in fact, have almost come to take for granted. On the other hand, the consequences of this view are by no means obvious. In the case of digitization, these consequences would presumably be derived from an adequate theory of round-off error. In the case of sampling of functions on one dimension, the development of a theory has had profound importance to information, communication, and control systems. The development of a comparable theory for image sampling will, I think, have similarly profound importance to our understanding of information processes. It therefore seems worthwhile to review the theory of the measurement of power spectra, particularly for the insight it may give into the problems which arise when we consider sampling of functions on more than one dimension.

This theory is based upon the concept that, while information may be conveyed by a particular signal (or function of time), this is solely because of the statistical properties characterizing it and the class of possible signals from which it comes. (Such an approach is, of course, consistent with the concepts of "communication theory," although it departs greatly from our intuitive concepts of information in its response-producing role.) The statistical properties we will review are not the only relevant ones, but they are usually the most useful ones. In particular, in almost every signal-analysis problem, the autocovariance function, or its Fourier transform, the power spectrum, will be of prime importance.
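As a concrete illustration in a modern language, one can estimate the autocovariance function of a signal and obtain the power spectrum as its Fourier transform; the synthetic signal, its component frequency, and the record length below are assumptions of mine, chosen only to exhibit the two statistics.

```python
# A minimal sketch of the two statistics named above: the autocovariance
# function of a signal and its Fourier transform, the power spectrum.
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(1024)
signal = np.sin(2 * np.pi * 0.05 * t) + 0.5 * rng.standard_normal(t.size)

x = signal - signal.mean()
# Autocovariance at lags 0..N-1 (biased estimate, divided by N).
acov = np.correlate(x, x, mode="full")[x.size - 1:] / x.size
# Power spectrum as the Fourier transform of the autocovariance
# (historically, periodogram analysis, as the text notes below).
spectrum = np.abs(np.fft.rfft(acov))
peak = np.fft.rfftfreq(acov.size)[spectrum.argmax()]
print(peak)   # near the 0.05 cycles-per-sample component
```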

Fundamentally, the power spectrum is based on the representation of the signal as a Fourier series; in this context it provides a picture of the relative contribution of each periodic component to the signal of interest (in fact, historically, power-spectrum analysis was called periodogram analysis).58 From our standpoint, the significance of spectrum analysis lies in the insight it provides into the effects of sampling. Specifically, those effects are twofold: First, sampling limits the frequency which can be recovered to less than 1/(2Δ), where Δ is the sampling interval.59 And second, not only is it impossible to determine the contribution due to higher frequencies; in addition, the effects of these higher frequencies, through "aliasing" or "folding," alter the values of those frequencies which are within the limits. The significance of these effects has been well summarized by Blackman and Tukey:60,61

We may logically and usefully separate the analysis of an equally spaced record into four stages, each stage characterized by a question:

(a) Can the available data provide a meaningful estimated spectrum?

(b) Can the desires for resolution and precision be harmonized with what the data can furnish?

(c) What modifications of the data are desirable or required before routine processing?

(d) How should modifications and routine processing be carried out?

The answer to the first question depends upon the spectrum of the source data; the response of the measuring (or sampling) instruments; the nature of the errors; and, as we have mentioned, the sampling interval.

In particular, they will determine whether the effect of aliasing or of noise is so great as to make the data almost wholly useless.
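The folding effect itself is easy to exhibit with a small sketch: a component above the 1/(2Δ) limit, sampled at interval Δ, reappears inside the recoverable band. The frequencies and interval here are illustrative, not drawn from any particular system.

```python
# A sketch of aliasing ("folding"): a component above the 1/(2Δ) limit
# reappears at a lower frequency once the signal is sampled.
import numpy as np

delta = 0.1                      # sampling interval Δ -> limit 1/(2Δ) = 5 Hz
f_true = 8.0                     # a component above the recoverable limit
n = np.arange(200)
samples = np.sin(2 * np.pi * f_true * n * delta)

spectrum = np.abs(np.fft.rfft(samples))
freqs = np.fft.rfftfreq(n.size, d=delta)
print(freqs[spectrum.argmax()])  # ~2 Hz: 8 Hz folded about the 5 Hz limit
```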

The answer to the second question depends upon the resolution and accuracy desired, compared with the amount of data available and the number of separate pieces into which it falls. The answer to the third question depends upon the range of frequencies over which the spectrum is desired and estimates of their probable distribution, particularly with respect to the effects of folding. The answer to the fourth question involves the details of the technical processes of analyzing data of this kind and can be found in the Blackman and Tukey reference.62
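The first two questions admit a back-of-the-envelope form: given a sampling interval and record length, the recoverable band and the approximate spectral resolution follow directly. The numbers below are purely illustrative.

```python
# A rough feasibility check for spectrum estimation from an equally
# spaced record: what band and what resolution can the data furnish?
delta = 0.01          # sampling interval, seconds (an assumed value)
n_samples = 5000      # length of the record (an assumed value)

nyquist = 1.0 / (2.0 * delta)            # highest recoverable frequency
resolution = 1.0 / (n_samples * delta)   # approximate spectral resolution

print(f"recoverable band: 0 to {nyquist} Hz")
print(f"frequency resolution: about {resolution} Hz")
# Components above 'nyquist' are not merely lost: they fold back and
# bias the estimates within the band, per the aliasing discussion above.
```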

It would be nice if the theory for sampling of functions on one dimension could be easily extended to two or more dimensions. For example, in traditional communication theory, the source is normally taken as a sequence of signals. This may be an appropriate view for an audio recording, for example, but not for a photograph.63,64 To extend this traditional theory requires definition of basis functions comparable to the trigonometric, say, on two-dimensional regions, followed by the two-dimensional integral transforms comparable to the Fourier transform.65 Unfortunately, two factors serve to complicate the situation: First, functions of two variables are just inherently more complicated than functions of one variable, both as individual functions and, more significantly, as limits of sequences of functions.66,67 And second, while the process of sampling a function on one dimension does not necessarily alter existing relations among values, the same process applied to a function on two dimensions must do so. The first factor can certainly be handled by appropriate extension of information theory and Fourier analysis to functions of several variables, but the second factor is fundamentally different.
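To suggest what the two-dimensional extension looks like in practice, here is a minimal sketch of the two-dimensional discrete Fourier transform applied to a synthetic image; the image and its component frequencies are assumptions chosen for illustration only.

```python
# A sketch of the two-dimensional extension named above: basis functions
# on a 2-D region and the corresponding integral transform, here the
# discrete 2-D Fourier transform of a small synthetic image.
import numpy as np

yy, xx = np.mgrid[0:64, 0:64]
# An "image" with one periodic component in each spatial dimension.
image = np.sin(2 * np.pi * 4 * xx / 64) + np.sin(2 * np.pi * 9 * yy / 64)

spectrum = np.abs(np.fft.fft2(image))
# Energy concentrates at the (row, col) frequency pairs (0, 4) and (9, 0),
# the two-dimensional analogue of one-dimensional periodic components.
quad = spectrum[:32, :32]
ky, kx = np.unravel_index(quad.argmax(), quad.shape)
print(ky, kx)
```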

In a very real sense, it is the second factor with which we will be concerned in discussing formalization, since it is formalization which provides the mechanism by which to define, and easily to reconstruct, relations existing in the original data. If we are to handle Gestalt with a digital computer, it must be through the formalization of the relationships implied by it.
