On the need for computer modeling: The case of language processing

CONTENT, Alain, FRAUENFELDER, Ulrich Hans

Abstract

Computational modeling constitutes a fundamental extension to the psychological scientific toolkit. The present contribution aims to clarify the pros and cons of modeling techniques, using examples from language processing. We present some strategies that may help avoid potential pitfalls of the computational modeling approach. The traditional relationship between theory and experimentation in psycholinguistic research is also considered, as well as some limitations associated with the standard experimental approach. Finally, we insist upon the complementarity between theory, experimentation and modeling.

CONTENT, Alain, FRAUENFELDER, Ulrich Hans. On the need for computer modeling: The case of language processing. Psychologica Belgica, 1996, vol. 36, no. 1/2, p. 113-144

Available at:

http://archive-ouverte.unige.ch/unige:83829

Disclaimer: layout of this document may differ from the published version.

Alain Content and Uli H. Frauenfelder

Laboratoire de Psychologie Expérimentale, Université libre de Bruxelles; Laboratoire de Psycholinguistique Expérimentale, Université de Genève

1. Introduction

A few years ago, during a lunch discussion, Paul Bertelson came up, in his usual perceptive and provocative manner, with something like "What is all this fuss about computational psychology? Isn't that what we have been doing all the time since cognitive psychology was launched? Then what is the difference?" This paper is an attempt to answer Bertelson's questions. We argue that the defining feature of the computational approach is the use of computer simulation techniques to develop models of processing systems, and we suggest that, within an information-processing framework, such tools constitute a natural complement to experimental procedures.

Correspondence should be addressed to Alain Content. The writing of this paper has benefited from support from the Swiss National Fund for Scientific Research (grant 11-39553.93) and the Belgian National Fund for Scientific Research (grant 9.4565.92). We thank Daniel Holender, Guy Lories, and Dominic Massaro for their comments on a previous version, and Axel Cleeremans for stimulating discussions and encouragement.

Most psychologists interested in the study of human perception and cognition share the fundamental assumption that mental activity can be described and analyzed as the functioning of a particular kind of physical machine. Most authors would also argue that descriptions of information processing mechanisms provide explanations of perception, cognition and human behavior. On the other hand, the defining feature of the computational approach is the recourse to computer techniques as tools to model or simulate information processing systems through numerical or symbolic computation. Thus, in that context, we believe that Bertelson was right in pointing out the natural filiation between the basic assumptions of cognitive psychology and the recourse to computer modeling. Insofar as cognitive psychology's agenda is to explain mental life through descriptions in terms of information processing mechanisms, there may not be any principled disagreement between that programme and the computational approach. In fact, as often noted, the information-processing framework is largely derived from the application of the computer metaphor to mental activity. So, within the information processing approach to psychology, whether one appeals to computational modeling or not may be more a matter of research strategy than a matter of principle.

Still, not all current metatheories in psychology seem to stick to the principle of mechanistic explanation, and several well-known scholars have expressed concerns about the potential and limitations of the information-processing approach as an answer to psychological inquiry (e.g., Neisser, 1976; Norman, 1980). For instance, in a recent paper entitled "Has psychology a Future?", Eleanor Gibson (1994, p. 70) claims:

When someone asks me (as they quite often do), "But what is the mechanism?" my answer is that I am not a mechanist and I do not believe in separation of mental processes and action.

We believe that such disagreements may be more apparent than real and may result from an unduly restricted notion of mechanism. In our view, the question "what is the mechanism?" entails little more than a prompt to provide explanations framed as answers to "How?" questions. While we agree in principle that some psychologically relevant explanations may not need to take the form of responses to "How?" questions, we assume that mechanistic descriptions of information processing provide valuable explanations for several aspects of human cognition. Massaro and Cowan (1993) similarly argue that ecological realism, the physical symbol system hypothesis, connectionism, and the modularity hypothesis constitute four variants of a more general information processing framework, and we tend to agree with this analysis. We would add the dynamic system modeling approach (Port & van Gelder, 1995b) as one further variant. As a consequence, "computational" is used in this paper as a cover term including all frameworks using automatic computation devices to simulate mental activity. Thus, our usage diverges from that of others, who use the term to refer more specifically to the metaphor of the Von Neumann computer, excluding other approaches based on connectionist or dynamical systems.

Even within the circle of psychologists who adhere to the information processing framework, there is no clear agreement on the role of the computational modeling enterprise in cognitive psychology. Some (Loftus, 1993) warn of the dangers of superpowerful tools leading to supercomplex models. Others seem to remain skeptical or agnostic (MacKay, 1993); others still put great hopes in the contribution of computational modeling or even consider that computational models are a requisite of psychological theories (Broadbent, 1987; Estes, 1993; Johnson-Laird, 1983; Johnson-Laird, 1988; Parisi & Burani, 1988). Our own feeling is that the relevance and the contribution of the computational approach in current cognitive research is often misunderstood. We believe that the modeling endeavor is a fundamental improvement to the psychological scientific toolkit. So, one aim of the present contribution is to clarify the pros and cons of modeling techniques, with a particular focus on aspects of language processing. Another aim, without any intention of being prescriptive, is to suggest some strategies that may help avoid potential pitfalls of the computational modeling approach.

In the first section of the paper, we offer some general definitions of the nature and function of theories and models within psychological science. In the second section, we reconsider the traditional relationship between theory and experimentation in psycholinguistic research, and we discuss limitations associated with the standard experimental approach. The final section examines the contribution of computational modeling to psychological inquiry, and underscores the complementarities between theory, experimentation and modeling.

2. Theories and Models as Information Processing Explanations

Psychology, like any other science, aims at producing theories. To articulate in some detail the nature of the modeling endeavor and its relation to psychological theorizing, it is necessary to clarify the relationship between theories and models.

2.1. Theories

A theory is a structured set of mental constructs (concepts, propositions, definitions) that provides an explanation for some set of phenomena. According to the Encyclopædia Britannica, a scientific theory is a

systematic ideational structure of broad scope, conceived by the human imagination, that encompasses a family of empirical (experiential) laws regarding regularities existing in objects and events, both observed and posited. A scientific theory is a structure suggested by these laws and is devised to explain them in a scientifically rational manner.

Theories must explicitly and accurately describe the phenomena under consideration, but descriptive adequacy is, of course, not sufficient: Theories are expected to explain things. The notion of "explanation," however, is not easy to define, and it is unclear what exactly scientists and philosophers mean by terms such as "explain" or "understand." Intuitively, what counts as an appropriate explanation depends on a collection of criteria. To explain is "to make plain." In general, the process of explanation can be viewed as presenting some phenomenon that we do not understand in terms of other concepts that we believe we understand, or which are deemed to be simpler, or maybe which are generally accepted.

Some examples may help clarify what constitutes an appropriate scientific explanation. Newton's law of universal attraction is a prototypical instance of a useful theory in physics. It is (at least at a macroscopic level) descriptively correct, its formulation is extremely synthetic, and it is universal since it applies to an infinite number of situations. The law of universal attraction provides an explanation for a number of phenomena. Note, however, that Newton's laws are not the final word: As such, they do not offer any explanation of why bodies are attracted at a distance. In other words, we do not yet fully understand the causal mechanism that determines the attraction between bodies.

Another example, from a field of psychology with which we are familiar, concerns the nature of the determinants of reading ability. Current theories of reading acquisition state that there is a causal relation between children's ability to understand spoken words as a sequence of phonemic segments and their later success in reading. This claim is supported by several empirical observations that have been replicated often by different research groups in various languages (see, e.g., Morais, Alegria & Content, 1987; Rohl & Pratt, 1995; Wagner & Torgesen, 1987, for reviews). We do not know, at this time, how children come to understand speech as a discrete string of segments, nor why some children fail to develop such segmental representations. Yet, the notion of a causal relation between phonological awareness and reading acquisition has important implications, most notably in the domain of reading instruction as well as for the prevention of learning difficulties and the correction of reading disabilities.

Some authors apparently believe that the difference between descriptive and explanatory adequacy is in terms of predictive power: A good theory should not only describe accurately what is already known, but should also be able to predict new phenomena. Pylyshyn (1984), for instance, argues that cognitive explanations must be stated in terms of cognitive vocabulary because this is the level at which useful generalizations and predictions can be made. Other researchers point out that predictive power in itself is not a sufficient criterion and argue that explanatory adequacy depends on conformity to universal, independently motivated principles or constraints (e.g., Chomsky, 1965; Johnson-Laird, 1983; Seidenberg, 1993).

Both of the above examples, different as they may be, constitute statements of reliable and established regularities that may be empirically observed and verified. Any phenomenon that appears as a logical consequence of such regularities would naturally be considered explained by them, although the regularities themselves are only descriptive. Ideally then, a complete theory should reduce phenomena to a limited set of general or even universal principles. However, the available explanations are often more limited in scope and, apparently, the "grand unified theory" of psychology is nowhere close to emerging. Whether such a theory is at all possible in psychology is even debatable. Thus, what seems crucial in accepting an explanation for some phenomena is that the phenomena appear as logical consequences of independently motivated explanatory statements. We cannot afford to ignore this type of partial knowledge, which happens to be the rule rather than the exception in psychology. So the critical question regards how to pursue simultaneously the quest for the most general principles and the discovery of such local generalizations and explanations.

2.2. Models

What is the difference between a theory and a model? According to the Merriam-Webster Collegiate Dictionary, a model can be

a miniature representation of something ... an example for imitation or emulation ... a description or analogy used to help visualize something (as an atom) that cannot be directly observed ... a system of postulates, data, and inferences presented as a mathematical description of an entity or state of affairs.

These various definitions all capture some of the uses of the notion of model in scientific circles. Our own idea is that a model is a particular type of theoretical elaboration, which is most often expressed as a metaphorical device: Proposing a model of a system amounts to devising another system in such a way that its behavior will mimic relevant aspects of the behavior of the target system. This makes it possible to explain these properties of the target system by referring them to the relevant characteristics of the model.

Of course, models can be formulated in different guises, and need not be made of real stuff, and one can think of various types of models: concrete ones (such as plastic models of complex molecules), symbolic or iconic ones (such as, within cognitive psychology, those expressed in the box-and-arrow flow-diagram type of symbolism), mathematical models, or computer models. One could even extend the notion to verbal models, which would then be descriptions of imaginary devices (such as Morton's, 1969, notion of logogens).

We suggest that the essential difference between theories and models is in terms of the conceptual media employed to express the ideas, but that this difference in media also entails differences in scope. Theories are generally expressed in the form of abstract principles of general applicability. When facing complex phenomena, in which multiple factors interact in intricate ways and vary dynamically over time, it may be difficult or even impossible to specify or imagine the behavior of the system from the general principles alone. Models serve the function of making the interplay of the general abstract principles more concrete and more accessible to our understanding, within a delimited domain. Thus, most psychologists would use the term "theory" to refer to the notion of spreading activation and would refer to Quillian's "model" of semantic memory. The notion of spreading activation refers to a general principle of how information flows, whereas Quillian's model applies that principle in one more circumscribed domain. Similarly, one can read about information "theory," and of its application to human attention in Broadbent's filter "model." In other domains, likewise, one can distinguish between general laws and their applications. Meteorology, for instance, illustrates the gap between general physical laws and physical processes (such as heat radiation, convection or conduction) and their intervention in models of global climatic change, local weather, hurricane dynamics or the greenhouse effect. In the latter example, despite the universal acceptance of the general laws, it appears that most phenomena have not received a detailed and satisfactory model yet.

In sum, models can be seen as simplified representations of some parcel of reality. They are instantiations of general theoretical hypotheses, in a form that lends itself to more detailed investigation. By virtue of their analogical structure, they provide intuitive understanding of their object. Indeed, several psychologists studying human reasoning and inference have argued that much of our understanding in everyday life settings is based on the elaboration of mental models of the situation or problem domain. To quote Johnson-Laird (1983, p. 2):

the psychological core of understanding consists in your having a "working model" of the phenomenon in your mind. If you understand inflation, a mathematical proof, the way a computer works, DNA or a divorce, then you have a mental representation that serves as a model of an entity in much the same way as, say, a clock functions as a model of the earth's rotation.

How do theories and models come into play in psychological research? One basic tenet of modern cognitive psychology is the belief that interesting explanations are to be found in the understanding of the mechanisms at work. Palmer and Kimchi (1986) have tried to identify the fundamental assumptions of the information processing framework from a psychological viewpoint. They consider five assumptions, of which two seem more directly relevant in the present context. These are the assumptions they call informational description and recursive decomposition. The principle of informational description states that mental phenomena can be described as informational events, consisting of three parts: the input information, the operation performed on the input, and the output information. The principle of recursive decomposition states that

any complex (i.e., nonprimitive) informational event at one level of description can be specified more fully at a lower level by decomposing it into (1) several components, each of which is itself an informational event, and (2) the temporal ordering relations among them that specify how the information "flows" through the system of components (p. 39).

Flow diagrams have been used abundantly in the form of box and arrow representations. They provide a compact description and a clear decomposition of a process into a sequence of stages. This strategy of process decomposition has received much attention in some domains (especially when strong interactions with neuropsychology were possible). In fact, in some areas, the strategy of process decomposition has been so influential that the issue of componential architecture became for some time a major focus of the research effort, culminating in the modularity hypothesis.

The endeavor is nicely illustrated by the following quotation from the physicist Lord Kelvin (cited by Johnson-Laird, 1993):

I never satisfy myself until I can make a mechanical model of a thing. If I can make a mechanical model I can understand it. As long as I cannot make a mechanical model all the way through I cannot understand ...

Lord Kelvin's saying could be taken as a motto for cognitive psychology. Indeed, one underlying driving force to current research must be the belief that we will reach some understanding of mental life and behavior by analyzing perception, recall, language and reasoning as information-processing mechanistic systems. For instance, MacKay (1988, 1993) contrasts two research strategies in psychology, which he labels the empirical and the theoretical epistemology respectively. The mission assigned to science by an empirical epistemology is to gather a body of reliable facts and regularities, whereas for a theoretical epistemology it is to develop theories that explain available facts. MacKay attributes the unsatisfactory state of advancement of knowledge in psychological research to over-reliance on the empirical, result-centered strategy. He thus calls for a more theoretically oriented research strategy in psychological science, and claims (1993, p. 237) that

the sine qua non of theories within the theoretical epistemology is mechanistic explanation: Theories are not just descriptive, but explain phenomena in terms of underlying mechanisms.

Therefore, it comes as no surprise that most current theorizing is about models. In a sense, the program of modern cognitive psychology could be seen, or even defined, as the project of modeling mental activity. So, why not use the best tools available?

3. Verbal models and data

One widely accepted strategy in science is empirical falsification. Thus, in principle, one should start with a theoretical hypothesis (induced from preliminary observations or inferred from established results), and generate an empirical prediction, as a relation between one or several dependent variables and one or several independent variables. Then, one would design an experiment manipulating the independent variables and monitoring the effect on dependent variables. According to the falsification principle, the interesting case is when the data do not fit with the theory, since this should trigger its revision or its rejection. In short, science would make progress through negative feedback.

Within cognitive psychology, many authors have pointed out the limitations of the falsification strategy (MacKay, 1993; Newell, 1990). Moreover, attempts to conform to the falsification precepts have generally resulted in disappointment and disillusion. This state of affairs may be attributed to three different factors: the complexity of the phenomena under scrutiny, the lack of specification of verbal models, and the general issue of model identifiability.

3.1. Complexity of phenomena

The effects upon modeling of the complexity faced in characterizing language processing can be illustrated by examining the history of models of spoken word recognition. The original COHORT model (Marslen-Wilson & Welsh, 1978) represents the first attempt to provide a systematic description of spoken word recognition. The model assumes two successive stages of processing. During the first, all words that exactly match the onset (i.e., the initial one or two segments) of the target word are activated, thus creating a set of competitors which constitute the initial cohort of the target. This initial activation phase is followed by a stage of deactivation during which the cohort members that do not match later sensory input are eliminated from the cohort. The number of cohort members decreases as more stimulus information becomes available. This model makes precise predictions about the moment at which any word can be recognized in a given lexicon from an analysis of its cohort members. The recognition point is assumed to correspond to the word's uniqueness point, or the moment that the word becomes unique with respect to all other words in the lexicon. A given target word spoken in isolation is assumed to be recognized when it is the only word remaining in the cohort. For example, a word like "elephant" heard in isolation is predicted to be identified at the sound /f/, since there are no other words in the lexicon sharing the initial sequence /elef/.
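The uniqueness-point computation described above can be illustrated with a minimal Python sketch. It is not part of the original model description: the function names, the toy lexicon, and the convention of using one character per segment in a rough transcription are all assumptions made for illustration.

```python
def cohort_trace(word, lexicon):
    """Return the cohort (entries still matching the input) after each segment."""
    return [
        sorted(entry for entry in lexicon if entry.startswith(word[:i]))
        for i in range(1, len(word) + 1)
    ]


def uniqueness_point(word, lexicon):
    """Return the 1-based position of the segment at which `word` no longer
    shares its onset with any other entry, or None if that never happens
    (for instance when the word is the beginning of a longer entry)."""
    others = [entry for entry in lexicon if entry != word]
    for i in range(1, len(word) + 1):
        if not any(entry.startswith(word[:i]) for entry in others):
            return i
    return None


# Hypothetical toy lexicon: one character per segment, rough transcriptions.
TOY_LEXICON = ["elefant", "elegant", "element", "eleveit", "elf"]

if __name__ == "__main__":
    up = uniqueness_point("elefant", TOY_LEXICON)
    print("uniqueness point:", up, "segment:", "elefant"[up - 1])
    for step, cohort in enumerate(cohort_trace("elefant", TOY_LEXICON), start=1):
        print(step, cohort)
```

In this toy lexicon the cohort of "elefant" shrinks to a single member at the fourth segment, /f/, which mirrors the "elephant" example in the text. A word that is the onset of a longer entry never becomes unique before its offset, and the sketch returns None in that case.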

By its clarity and simplicity, COHORT I generates precise predictions about the time-course of recognition: Recognition time should be a linear function of the position of the uniqueness point. The fact that these predictions can be tested and falsified makes the model attractive. However, some critics were quick to point out various ways in which this simple description fails to account for the robustness of human language perception (e.g., Norris, 1990).

To incorporate some psychologically more realistic assumptions, Marslen-Wilson (1987) proposed a new version of his model, COHORT II. This model appeals to the notion of level of activation to express the varying degree of match possible between the input and different lexical competitors. Cohort members vary in activation as a function of their fit with the input but also as a function of their frequency. While the status of words in the original model is binary (either in or out of the cohort), in the new formulation of the model cohort membership is a matter of degree. Still, the model does not specify how the frequency of words and their degree of match with the input determine activation. These factors and their relative contribution to lexical activation cannot be quantified in a verbal model, so no precise definition of the competitor set is yet available in COHORT II.
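To make concrete what the verbal formulation leaves open, the following Python sketch implements one of many possible quantifications of graded cohort membership. It is purely hypothetical and is not COHORT II: the additive rule, the `freq_weight` and `threshold` parameters, the function names, and the invented frequency values are all assumptions introduced for illustration.

```python
import math


def match_score(heard, candidate):
    """Proportion of heard segments that the candidate matches, position by position."""
    if not heard:
        return 0.0
    hits = sum(1 for h, c in zip(heard, candidate) if h == c)
    return hits / len(heard)


def activation(heard, candidate, freq_per_million, freq_weight):
    """Hypothetical rule: degree of match plus a weighted log-frequency term."""
    return match_score(heard, candidate) + freq_weight * math.log10(1 + freq_per_million)


def competitor_set(heard, lexicon, freq_weight, threshold=1.0):
    """Entries whose activation reaches the (arbitrary) threshold."""
    return sorted(
        word
        for word, freq in lexicon.items()
        if activation(heard, word, freq, freq_weight) >= threshold
    )


# Invented frequencies (per million), for illustration only.
TOY_FREQ_LEXICON = {"elefant": 10, "elegant": 20, "element": 50, "elf": 2}

if __name__ == "__main__":
    for weight in (0.1, 0.2):
        print(weight, competitor_set("elef", TOY_FREQ_LEXICON, freq_weight=weight))
```

With these made-up numbers, doubling the weight given to frequency changes which words count as competitors after /elef/. Any implementation has to commit to such a value, which is exactly the information the verbal model does not supply.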

The preceding discussion of the two versions of the COHORT model illustrates a general dilemma confronting efforts to model lexical processing. COHORT I makes clear and testable predictions, but at the price of several simplifying assumptions. In contrast, COHORT II is a more complex verbal model and presumably fits better with what we know about lexical processing. However, it does not provide direct answers to the questions concerning the competitor set and therefore cannot predict the time-course of word recognition.

We take the lesson to be the following. Simple, verbal models are helpful in shaping and formalizing questions and issues: As far as they capture the major dimensions of the problem, they provide a good account of it. However, psychological phenomena are generally affected by a large number of variables, and language performance is no exception. Some factors that must be dealt with in word recognition research include the form properties of words (quality of sensory input, length, phonological structure, etc.), the grammatical and abstract properties of words (syntactic form class, semantic category, word frequency and morphological structure) and the properties of the lexicon (number of competitors, form properties of competitors, grammatical and abstract properties of competitors). Most of these factors cannot be adequately taken into account using dichotomous categories, and psycholinguistics is often faced with intricate interactions and covariations of multiple factors.

Because verbal models are intrinsically limited in their ability to describe the influence of multiple factors and their interactions, they lead to inappropriate simplification for the sake of prediction. Simplification takes two forms: limiting the number of factors taken into account, and treating the factors as dichotomous rather than multi-valued.

One important consequence of these characteristics is the introduction of a bias toward an analytical methodology. Typical experimental designs manipulate only a small number of independent variables (often defined in a binary fashion) and attempt to control or neutralize other potential factors. While this analytical approach may be appropriate at a first stage in experimental research, it clearly fails to deal with the full complexity of cognitive phenomena. As we will argue below, the addition of computer simulation helps overcome this hurdle.

Another unfortunate consequence of the approach is that it leads to local theorizing, and (in MacKay's, 1993, terms) to empirical theorization, which is data-driven and domain-specific. There are several risks to local theorizing. One is the lack of integration of the research. This is abundantly illustrated in the often blamed paradigm-driven research strategy. As pointed out by MacKay and others, miniature models designed to account for a small number of results have proliferated rather than merged into a single general theory (MacKay, 1988; Newell, 1973; Norman, 1980). Moreover, because local theorizing proceeds in isolation, a further danger is that it rarely refers to general principles and thus remains inherently descriptive or, at best, weakly explanatory. Accounts of phenomena such as lexical decision performance provide a good example of this. Lexical decision has mainly been studied as a specific language task, rather than as an example of a binary decision task applied to the domain of language, and thus without consideration of what is known about the mechanisms of binary decision tasks in general.

3.2. Underspecification

A related difficulty is that verbal models and information flow diagrams most often leave many details unspecified. The focus on the global architecture induced by the functional decomposition strategy has generally resulted in insufficient explicitness in the description of both the nature of representations at each stage, and the processing mechanisms operating from one stage to the next. Many models of visual word recognition proposed in the last twenty years could be taken as illustrations of that limitation: Word recognition is decomposed in a sequence of transcoding operations, which are only specified in terms of their input-output relations. Little is known about the transcoding processes themselves. This feature is epitomized in Neisser's (1976) caricature of an information processing model of perception, in which the three successive boxes are labeled "processing," "more processing," and "still more processing," respectively. Even when the nature of the representations or the transcoding operations are more clearly specified, one important dimension that is not explicitly handled is the dynamic characterization of the processing, particularly for chronometric data (see Parisi & Burani, 1988, for further discussion). For instance, despite the large impact of the dual route model of visual word recognition, the nature and the time course of grapheme-phoneme conversion have never been explicitly included in the formulation of the verbal models, and this has prevented attempts to disentangle the dual-route model from competing lexical analogy accounts. Thus, the lack of attention to the detailed picture often makes it hard to generate predictions that can be put to empirical test.

3.3. System Identifiability

Even if fully specified verbal models were available, however, they would not be immune to a third problem, that of model or system identifiability. The notion of system identifiability is discussed in some detail by Massaro and Cowan (1993). It was introduced by Moore (1956) in the context of formal automata theory, and refers to the problem of describing the inner workings of a machine when only its input and outputs are available. Moore demonstrated that any input-output mapping can be reproduced by many different automata, so that it would in general be impossible to uniquely identify the processing mechanism underlying some set of input-output pairs. Applied to psychological research, this claim seems to strongly undermine the information processing enterprise. However, as Massaro and Cowan aptly point out, there is only a partial similarity between the problems addressed by psychological inquiry and formal automata theory. One difference is that psychological investigation does not need to restrict its observations to the inputs and outputs of a processing component. It can extend the database by considering other measures of performance, such as chronometric data, neuropsychological or developmental observations.
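The identifiability problem can be made tangible with a toy Python sketch. It is an illustration only, not Moore's proof: the `MooreMachine` class, the two example machines, and the bounded equivalence check are all assumptions introduced here. The two automata differ in their internal states yet cannot be told apart from their input-output behavior.

```python
from itertools import product


class MooreMachine:
    """Finite automaton whose output depends only on the current state."""

    def __init__(self, start, transitions, outputs):
        self.start = start
        self.transitions = transitions  # (state, symbol) -> next state
        self.outputs = outputs          # state -> output symbol

    def run(self, inputs):
        """Return the output sequence, including the start state's output."""
        state = self.start
        observed = [self.outputs[state]]
        for symbol in inputs:
            state = self.transitions[(state, symbol)]
            observed.append(self.outputs[state])
        return observed


# Machine A: two states that track the parity of the 1s seen so far.
MACHINE_A = MooreMachine(
    start=0,
    transitions={(p, s): p ^ s for p in (0, 1) for s in (0, 1)},
    outputs={0: 0, 1: 1},
)

# Machine B: four states that track parity AND the last symbol, although
# the last symbol never influences the output.
B_STATES = [(p, last) for p in (0, 1) for last in (0, 1)]
MACHINE_B = MooreMachine(
    start=(0, 0),
    transitions={((p, last), s): (p ^ s, s) for (p, last) in B_STATES for s in (0, 1)},
    outputs={(p, last): p for (p, last) in B_STATES},
)


def same_behavior(m1, m2, alphabet=(0, 1), max_len=8):
    """Compare outputs on every input sequence up to length max_len."""
    return all(
        m1.run(seq) == m2.run(seq)
        for n in range(max_len + 1)
        for seq in product(alphabet, repeat=n)
    )


if __name__ == "__main__":
    print(same_behavior(MACHINE_A, MACHINE_B))
```

An observer limited to inputs and outputs has no grounds for preferring the two-state description over the four-state one. Additional constraints, such as a parsimony criterion or measures that tap the internal states, are needed to choose between them.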

Furthermore, one can add other constraints on the space of potential models by taking into account formal conditions such as simplicity and parsimony (see Jacobs & Grainger, 1994, for an extended discussion), external sources of evidence, such as neural limitations or neuroanatomical characteristics, or general principles of processing. Whether such external constraints will ever be sufficient to solve the issue of model identifiability is perhaps a matter of faith. However, one important consequence to which we will return later is that external constraints are crucially needed to restrict the set of admissible models.

4. The computational approach

4.1. A definition

Appealing again to the Encyclopædia Britannica, a computer simulation refers to the use of a computer to represent the dynamic responses of one system by the behavior of another system modeled after it. A simulation uses a mathematical description, or model, of a real system in the form of a computer program. This model is composed of equations that duplicate the functional relationships within the real system. When the program is run, the resulting mathematical dynamics form an analog of the behavior of the real system, with the results presented as data.

We will restrict ourselves to a general discussion of the advantages, potential drawbacks and limitations of the modeling approach. We will not elaborate on the issue of model evaluation, which has been discussed recently by others in psycholinguistic research (see, e.g., Dijkstra & de Smedt, 1996; Jacobs & Grainger, 1994). Nor will we enter here into the debate about which particular modeling framework (e.g., symbolic, connectionist, distributed) is preferable or optimal.

Computational modeling refers to the use of computer programs to simulate some set of phenomena. Two benefits to the use of computer modeling in psychology are often mentioned. One is the requirement of full specification of the process under consideration, and the other is the model's ability to deal with empirical complexity. In view of the limitations of verbal accounts that we have described in the previous section, these advantages are important, and deserve further discussion.

These benefits are particularly relevant within a perspective on model construction in which the designer starts with a verbal model and aims at implementing it as a computer program. However, we think that there is more to the modeling enterprise and that this restrictive vision of modeling-as-theory-implementation severely limits the benefits we can expect from the modeling endeavor. Borrowing partly from an analogy previously proposed by McCloskey (1991), Jacobs and Grainger (1994) describe two strategies for model construction that appeal to two different professions: the architect and the gardener. The architect starts from an explicit (verbal) theory of the target function, and implements it in a computational system. The gardener's strategy consists of

growing a model or network that mimics in some respect a human cognitive function, without necessarily having an explicit theory of that function (p. 1327).

In recent years, the gardener's strategy has become more feasible and promising, thanks to the availability of powerful automated learning algorithms in various fields of computer science, such as artificial neural networks, symbolic manipulation systems (see, e.g., Ling & Marinov, 1993), or probabilistic systems such as hidden Markov models.

Yet, it would probably be misleading to associate the gardener's approach too closely with the use of artificial neural networks, or even with the deployment of automatic adaptive procedures. The architect and the gardener are two extremes on a continuum ranging from a strict implementation strategy to a mere data fitting strategy. It seems likely that every architect is endowed with a bit of the gardener's art, and that every gardener secretly entertains a sketch of the intended accomplishment. In other words, interesting modeling work involves elements from an intentional and theoretically-based design, but also unexpected features that emerge from the interplay of the assembled mechanisms. There are many examples in the history of Artificial Intelligence illustrating how unforeseen consequences arise from computational implementations.

In the remaining part of this section, we first discuss the three issues identified earlier from the architect's perspective, namely, detail specification, complexity, and system identifiability, and then continue by developing the specific issues that may arise from adopting the gardener's point of view.

4.2. Detail specification

Designing a running model of a given set of phenomena obviously forces the modeler to fill in the details missing in the verbal theory, and this immediately pays off by permitting detailed, quantitative tests of predictions derived from the model's actual behavior. However, fixing the details to transform an abstract scheme into a working system is not easy. As any architect would know, the final appearance of the work may depend on the wallpaper choice as much as on the initial blueprints. Similarly, in creating a computer model, designers will encounter many unsettled issues and their decisions, even totally arbitrary ones, may have a crucial influence on the performance of the system.

In this regard, it is instructive to examine the evolution from the verbal formulations of COHORT I to a related computational realization, the TRACE model. There is a direct filiation between these two models, as the following quotation testifies:

Although [the COHORT] model is vague and fails to address many important issues, it is attractive enough so that we have used it as the basis for our initial attempt to build an interactive model of speech perception (Elman & McClelland, 1984, p. 349).

It is thus interesting to examine how the models diverge from each other, and to establish what the constraints are that play a role in the implementation process.

TRACE is an interactive activation model made up of distinctive features, phonemes, and word units that represent hypotheses about the sensory input. These three types of units are organized hierarchically (see Figure 1). There are bottom-up and top-down facilitatory connections between units on adjacent levels (feature-phoneme, phoneme-word, and word-phoneme) and inhibitory connections between units within levels (feature-feature, phoneme-phoneme, and word-word). Incoming sensory input provides bottom-up excitation of distinctive feature units which in turn excite phoneme units. Phoneme units are activated as a function of their match with the activated distinctive features so that several alternative phonemic units are activated for a given input. As the phonemes become excited, they increase the level of activation of words that contain them. As words receive some activation, they begin to inhibit each other. In addition, as words become activated, they also excite the phonemes that they contain in a top-down fashion.

Figure 1. A sketch of the TRACE model of spoken word recognition.

TRACE diverges from the COHORT model in its assumptions concerning information or activation flow. These assumptions are derived from the principles of interactive activation models. Unlike COHORT, TRACE includes both lateral inhibition between word units and top-down activation from the word to the phoneme level. By the lateral inhibition mechanism, the target word inhibits its competitors, but is also inhibited by them. The degree to which one word inhibits another depends on the former's activation level: The more activated a word is, the more it can inhibit its competitors. The dynamics of interactive activation and, in particular, this lateral inhibition of competitors allows TRACE to keep the actual activated competitor set small and to converge on a single lexical entry despite the mass of lexical candidates that contend for recognition. According to the top-down activation mechanism, activated words provide top-down excitatory feedback to the phoneme units they contain by increasing the latter's level of activation. These phoneme units can in turn excite the connected word units.
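The competition dynamics described above can be illustrated with a toy sketch. It is not the TRACE implementation: it contains a single layer of word units, no phoneme level and no top-down feedback, and the function name `simulate`, the update rule (loosely patterned on interactive activation models), and all parameter values are assumptions chosen for illustration. Each unit receives a constant bottom-up input and is inhibited in proportion to the positive activation of its competitors.

```python
def simulate(inputs, inhibition=0.3, decay=0.1, steps=50,
             a_min=-0.2, a_max=1.0, rest=0.0):
    """Toy lateral-inhibition dynamics over a single layer of word units.

    inputs: constant bottom-up support for each word unit (assumed values).
    Returns the activation of each unit after `steps` synchronous updates.
    """
    acts = [rest] * len(inputs)
    for _ in range(steps):
        # Only positively activated units send inhibition to competitors.
        total = sum(max(a, 0.0) for a in acts)
        new_acts = []
        for a, ext in zip(acts, inputs):
            net = ext - inhibition * (total - max(a, 0.0))
            if net > 0:
                delta = net * (a_max - a)   # excitation scaled by headroom
            else:
                delta = net * (a - a_min)   # inhibition scaled by distance to floor
            a = a + delta - decay * (a - rest)
            new_acts.append(min(a_max, max(a_min, a)))
        acts = new_acts
    return acts


# Two candidate words with slightly different bottom-up support.
with_inhibition = simulate([0.3, 0.2], inhibition=0.3)
without_inhibition = simulate([0.3, 0.2], inhibition=0.0)
```

Under these assumed settings, a small difference in bottom-up support is amplified when lateral inhibition is switched on: the better supported unit suppresses its competitor, whereas without inhibition both units remain substantially active.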

The sequential and continuous properties of speech create a major challenge for computational models like TRACE. Indeed, since words can, in principle, begin at any point in the signal, TRACE must be able to represent every lexical candidate for each incoming input segment and to assign these candidates a position in the signal. TRACE proposes that time is represented spatially. For each time-slice, it constructs a complete network in which all the units at every level are represented. Thus, to recognize an input made up of four phonemes, TRACE constructs at least four (in fact, 4 × 6, since each phoneme extends over 6 time-slices) complete lexical networks and retains the time cycle at which each lexical unit begins. This solution of spatial reduplication is neither psychologically realistic nor efficient, as was pointed out by Norris (1990), who suggested that an alternative solution to the problem of representing time is provided by recurrent networks (see also Content & Sternon, 1994).

We can thus distinguish three essential sources in the elaboration of the TRACE model: the pre-existing COHORT model, the general assumptions derived from the interactive activation framework (graded activation, cascade processing, lateral inhibition, top-down excitation), and implementation constraints (i.e., the particular way the whole network is reduplicated to account for the time dimension).

Some implementation decisions on which TRACE is based have directly influenced the course of empirical research and have contributed to launching new issues or to reshaping existing ones (see Frauenfelder, 1996, for a discussion). For instance, the role of lateral inhibition has led researchers to explore the nature and influence of lexical neighbors on auditory word recognition. Similarly, the reduplication of the network in time makes it possible to investigate the processing of continuous sequences of words, and has attracted attention to the issue of lexical segmentation and to the processing of words embedded in longer words (see Frauenfelder & Peeters, 1990).

Other examples abound. There is a similar filiation between the interactive activation model of visual word perception (McClelland & Rumelhart, 1981) and Morton's logogen model (1969):

Our model also draws on earlier work in the area of word perception. There is, of course, a strong similarity between this model and the logogen model of Morton (1969). What we have implemented might be called a hierarchical, nonlinear, logogen model with feedback between levels and inhibitory interactions among logogens at the same level. We have also added dynamic assumptions that are lacking from the logogen model (McClelland and Rumelhart, 1981, p. 388).

Yet, the two models have largely diverged in their influence on subsequent research. The logogen model has essentially inspired discussions in the neuropsychological literature about the componential architecture of the lexical function, leading to a multiplication of specific subsystems (Ellis & Young, 1988; Morton, 1980). The interactive activation model, besides its adoption in various areas of language and cognitive processing, has promoted renewed interest in more microscopic issues about lexical processing, such as the influence of lexical neighbors in the recognition process.

Design decisions can be motivated by different concerns ranging from general theoretical postulates, empirical findings, epistemological considerations (such as Occam's razor principle), to pragmatic constraints such as expediency and efficiency. As we have argued previously, it is always the case that neither the preexisting verbal theory nor the empirical database fully determines the model. This poses a problem in that pragmatic constraints may lead to decisions that are theoretically unmotivated, arbitrary, or ad hoc. One common problem involves representational choices in connectionist modeling. As noted by Dijkstra and de Smedt (1996), present empirical techniques provide scanty information regarding the format of mental representations. Thus, model designers are forced to refer to other constraints. For instance, the use of "wickelgraph" and "wickelfeature" representations in Seidenberg and McClelland's (1989) distributed model of visual word recognition and Rumelhart and McClelland's (1986) model of past-tense acquisition was partly guided by design considerations. In both cases, the authors acknowledged that their choices were meant to facilitate generalization, given other known characteristics of the connectionist framework adopted.

Critics and skeptics have been quick to question the role of such implementation choices in shaping the models' behavior. If, as some asserted (Bever, 1992; Lachter & Bever, 1988), these TRICS ("The Representations It Crucially Supposes") are primarily responsible for the models' successes, the interest of the demonstration is strongly undermined. More recent simulation work by Plaut, McClelland, Seidenberg and Patterson (1996) indeed suggests that the nature of the orthographic and phonological representations has a direct influence on the model's ability to generalize.

Two potential strategies may help clarify the extent to which the behavior of models depends on theoretically irrelevant implementation details. One is to test the robustness of the behavior across variations of implementation details. An example of this approach is provided by Plaut and Shallice's (1993) simulations of deep dyslexia, in which the authors carefully showed that the main behavioral characteristics resisted variations in network topology, sites of lesion, and training algorithms. A complementary approach is to abstract away the general design principles that are operating and which account for the functional characteristics of the realized model (Stone & Van Orden, 1994; Van Orden & Goldinger, 1994; Van Orden, Pennington & Stone, 1990).

One question that may arise from the previous discussion is whether the modeling endeavor is worthwhile, given the apparent insufficiencies of the empirical database. Should we not wait until we know enough? Our answer is to turn the claim the other way around: We believe that the modeling enterprise provides an important side benefit, besides the immediate outcome of having a running computer model. By facing the constraints of implementation directly, modelers are forced to identify theoretical issues that might otherwise be overlooked. If we do not face these implementation constraints, we may remain ignorant of our own ignorance.

4.3. Complexity

Human behavior unfolds in time and is subtly sensitive to a huge number of factors. It thus seems natural to resort to dynamic systems to describe, formalize and simulate the complex interactions that determine the observed phenomena. Indeed, other sciences which share some of the same characteristics, such as economics or meteorology, gradually moved to computer modeling when hardware and software of sufficient power became available.

Within psychology, a similar move is occurring and modeling techniques appear more and more as the appropriate interface between theoretical formulations and empirical observations. For instance, in a recent introductory paper on mathematical models in psychology, Estes (1993) notes: "Models are essential to set the stage for tests of hypotheses about theoretical concepts." Furthermore, he adds (pp. 9-10),

We are dealing with complex systems in which processes or mechanisms do not exist alone. [...] Models are also essential to the analysis of complex situations. In psychological research, we are always dealing with complex systems in which any observed behavior can be the resultant of many different, and often interacting, causal factors. Thus the outcomes of experiments can only be interpreted by comparing what is observed with what was expected from some simplified view of the situation, that is, a model.

What appears as one major achievement of computer models is (or should be) the generation of precise and detailed predictions encompassing rich ensembles of factors from a simple and limited set of assumptions. Besides the obvious precision gain (which may not be in itself the most interesting feature, given the limitations of empirical techniques), we see two more important improvements that depend upon the availability of more realistic simulation models. In short, we argue that simulation models may provide a partial solution to the limiting influence of the analytical bias in empirical research and to the ubiquitous problem of observational fragility.

4.3.1. Avoiding analytic bias

The power of current computing technology makes it possible to develop models which apply to relatively large bodies of stimuli. In recent years, many published simulation studies have incorporated realistic stimulus sets. When models compare adequately in scale, one immediate consequence is the possibility of comparing simulation and empirical results at the most detailed, fine-grained level.

The availability of real scale models makes it possible to obtain estimates of simulated performance for large sets of words, and thus to transcend some limitations associated with the standard factorial design in experimental research. Indeed, in recent years, an increasing number of research teams have begun to augment the standard experimental methodologies with studies using much larger stimulus samples and multivariate statistical analysis techniques (Seidenberg, Plaut, Petersen, McClelland & Patterson, 1994; Treiman, Mullennix, Bijeljac-Babic & Richmond-Welty, 1995).

These methods nicely complement the more traditional approach. First, they provide a welcome relief to those enduring the torturing task of searching for appropriate language stimuli varying along many selected dimensions and controlled for even more other dimensions (Cutler, 1980). Second, they go beyond factorial manipulations in handling the combination and interaction of factors that are characteristic of the real world. Moreover, when combined with appropriate simulations, they provide extremely powerful tools to assess the fine-grained adequacy of the model.
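One simple way to assess fine-grained adequacy is an item-level correlation between simulated and observed performance. The sketch below is only an illustration of that idea: the helper `pearson_r` is written out by hand, and the two lists contain invented placeholder numbers (hypothetical recognition cycles and hypothetical mean latencies for five items), not data from any study cited here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented placeholder values, one entry per stimulus item (assumption):
simulated_cycles = [42, 55, 61, 48, 70]     # model cycles to recognition
observed_rt_ms = [510, 560, 600, 540, 640]  # mean human latency per item

item_fit = pearson_r(simulated_cycles, observed_rt_ms)
```

With realistic stimulus sets the same computation would run over hundreds or thousands of items, and a regression including other item properties would usually be preferable to a single correlation.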

4.3.2. Observational fragility

Broadbent (1987) argued that small-scale computational models may offer a response to what he calls "the problem of observational fragility," that is, the fact that a minimal variation in task demands or experimental conditions can drastically modify the outcome of the experiment, leading researchers to question the generality of their accounts. Broadbent further suggested that this state of affairs is primarily due to the use of theoretical terms that are too imprecise and that do not capture the details of the experimental conditions or do not allow direct and explicit comparisons between predictions and observations. He illustrated the point by showing how a simple random walk model could account for the four typical result patterns observed in visual and memory search experiments, through limited variations of the model's parameters. McClelland (1988) reported another illustrative example showing how the recourse to simulation with the interactive activation model helped reconcile findings that previously appeared contradictory.
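The general idea that limited parameter variations in a random walk can change the predicted pattern of results can be sketched as follows. This is a generic two-boundary random walk, not a reconstruction of Broadbent's (1987) model: the function names, the step probability `p_up`, the `threshold` values, and the number of trials are all assumptions made for illustration.

```python
import random


def random_walk_trial(p_up, threshold, rng, max_steps=10_000):
    """Take unit steps until +threshold or -threshold is reached.

    p_up: probability of a step toward the correct (upper) boundary.
    Returns (correct, number_of_steps).
    """
    position, steps = 0, 0
    while abs(position) < threshold and steps < max_steps:
        position += 1 if rng.random() < p_up else -1
        steps += 1
    return position >= threshold, steps


def summarize(p_up, threshold, n_trials=2000, seed=0):
    """Return (proportion correct, mean number of steps) over simulated trials."""
    rng = random.Random(seed)
    results = [random_walk_trial(p_up, threshold, rng) for _ in range(n_trials)]
    accuracy = sum(correct for correct, _ in results) / n_trials
    mean_steps = sum(steps for _, steps in results) / n_trials
    return accuracy, mean_steps


# Same evidence quality, two response criteria (assumed values).
lenient = summarize(p_up=0.6, threshold=3)
strict = summarize(p_up=0.6, threshold=6)
```

Changing only the threshold trades speed against accuracy, so two experiments that differ in how cautious participants are could yield different latency and error patterns from one and the same mechanism.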

Interestingly, observational fragility or variability may be a problem for computer systems as much as it is a problem for experimentation. We recently experienced such difficulties in simulating the experimental results of a set of studies devoted to examining how the presence of an initial minimal mismatch in an auditory word (e.g., "shigarette," "focabulary") affected recognition. Our findings (Frauenfelder, Content & Scholten, 1995) suggested that such a minimal deviation did not prevent the activation of the target word. We then set up simulations to assess whether TRACE could account for the observations. Unfortunately, the implementation characteristics of the model (only a subset of the phoneme inventory of English is available) prevented us from using exactly the same stimuli as in the experiment. However, it was possible to run a "simulation experiment" that was close to the human situation. One intriguing result of the simulation was that the ability of the model to recover the intended word despite a minimal deviation was far from clear and varied to a large extent as a function of several factors.

With the original lexicon (which contained only about 200 words) and the parameter set provided by the authors, simulations confirmed that TRACE could recognize a fair proportion (75%) of stimuli with one feature onset mismatch. However, this finding was not replicated for a larger lexicon (approximately 1000 words) for which recognition performance on minimal onset deviations plummeted to below 25%.

This result is quite unexpected since part of the original justification for the model (McClelland & Elman, 1986) was its supposed ability to activate words despite minor initial mismatches (as in the "shigarette" example). In addition, when the parameter controlling the top-down feedback from word to phoneme was turned off, the recognition rate for the original but especially for the mismatch stimuli improved considerably with the larger lexicon. Nonetheless, the words were still recognized relatively poorly with mismatching inputs (about 50% for minimal mismatches). The results suggest that, contrary to what is generally believed, TRACE does not reliably recognize words with minimal mismatches. Limited recognition of such stimuli can only be achieved at the expense of the key mechanism of top-down feedback required to account for lexical effects at the phoneme level.

This example drawn from our current research illustrates several issues. One is the problem of scaling. Because the behavior of the system depends in complex ways on its database, there is little guarantee that properties observed with a limited lexicon will generalize to a larger, more realistic one. Note however that the only way to assess the influence of corpus size is to explore it directly through simulations, and this, obviously, is only possible when a computer model is made available.

Second, it is extremely interesting that TRACE displays variability in its ability to recover from minimal mismatches. One could, of course, wonder whether this pattern corresponds to variability observed in human subjects. One way to answer that question would be to directly compare the performance of the computer system with the human data across stimuli, and to examine the fit on a point-to-point basis. Unfortunately this cannot be done with the current version of TRACE. Another approach is to consider the computer system as an object of study in itself, and to use experimental and statistical techniques to identify the factors that explain the observed variability in its behavior.

Frauenfelder and Peeters (1990) appealed to quantitative lexical analyses to understand the behavior of TRACE. Their objective was to find the members of the activated lexical competitor set and their influence on the time-course of word recognition in TRACE. They tried to determine how the simulated recognition durations for a set of words could be predicted by different definitions of the competitors of these words (for example, candidates matching the input exactly or those with a small mismatch in their onset like those in the experiments just described). The results show that competitors that match and are aligned with the target input, the cohort competitors, play the dominant role in determining the time-course of word recognition. Words with mismatching onsets did not affect the recognition time-course.

This approach of relating the simulation results to quantitative analyses of the properties of the lexicon gives the researcher some leverage to pry open the black box and to understand the model's behavior. Indeed, although TRACE can generate activation curves and word recognition latencies for each word in its lexicon, it is still difficult to understand how it produces these results and to predict the outcome for a new input. As we have seen, the model often shows unexpected patterns of behavior. Part of the difficulty lies in understanding the complex interaction between the processing mechanisms (bottom-up activation, lateral inhibition and top-down activation) postulated by interactive activation theory. In this context, simulation models lead us well beyond the exercise of formalizing and implementing a verbal theory. Computer models are also of great heuristic value. As proposed by McCloskey (1991), they give us the equivalent of concrete animal models which allow further exploration, permit identification of neglected factors, and lead to new research questions which deepen our understanding of the target cognitive system.

4.3.3. Locus of complexity

Loftus (1993) expressed the concern that the availability of extremely powerful automatic computation resources would deter researchers from the quest for general and simple principles. There is unanimous agreement that theories must be simple and general. However, the notion of simplicity is itself far from transparent, and there is no accepted scale to evaluate the simplicity of a theory or a model (but see Jacobs and Grainger, 1994, for some suggestions). Furthermore, simplicity, as a feature of the description of the system (human or artificial) should not be confused with simplicity as a characteristic of the system's behavior. Anybody who has ever approached dynamic system theories, chaos or fractals is aware of the paradoxical complexity associated with extremely simple mathematical functions.
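A standard textbook illustration of this paradox, not discussed in the sources cited here, is the logistic map x(n+1) = r · x(n) · (1 − x(n)). The sketch below iterates it for two values of the single parameter r; the function name and the chosen starting values are assumptions for illustration.

```python
def logistic_orbit(r, x0, n):
    """Iterate the logistic map x -> r * x * (1 - x) for n steps from x0."""
    xs = [x0]
    for _ in range(n):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


# r = 2.5: the orbit settles on the fixed point 1 - 1/r = 0.6.
settled = logistic_orbit(2.5, 0.2, 100)

# r = 4.0: two orbits that start almost identically soon drift apart.
orbit_a = logistic_orbit(4.0, 0.2, 100)
orbit_b = logistic_orbit(4.0, 0.2 + 1e-9, 100)
largest_gap = max(abs(a - b) for a, b in zip(orbit_a, orbit_b))
```

The description of the system is a single line of arithmetic in both cases, yet for r = 4 its behavior is irregular and highly sensitive to the starting value. Simplicity of description and simplicity of behavior are therefore distinct properties.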

The complexity of the phenomena that we are studying is a feature that we can enjoy or deplore, but we can do nothing to change it. As noted by Seidenberg (1993), the issue is far from new in psychology. To quote from a classical source (Miller, Galanter & Pribram, 1960): "No benign and parsimonious deity has issued us an insurance policy against complexity" (p. 182). By contrast, the use of simulation tools that embody simple mechanisms while producing complex behavior, the familiarity with their functioning and the analytic understanding of their properties is most likely to generate insights leading to simplified accounts.

4.4. System Identifiability

We have mentioned previously the general problem of system identifiability: Any input-output mapping is compatible with an infinite equivalence class of algorithms. This raises the possibility that the whole enterprise of developing process models (be they verbal or computational) of cognitive abilities is futile and doomed to undecidability, unless cognitive science can provide further constraints that reduce the search space. How do we choose between models that appear equivalent as to descriptive adequacy?

A partial response is that theoretical models should be preferred not only based on their descriptive adequacy, but also in view of other characteristics, such as their simplicity, scope, generality, heuristic value, and conformity to general principles. Another element of response to this difficulty is the observation that theories may be confronted with a rich empirical database, including other measures than input-output pairings. In most research areas in cognitive psychology, chronometric data are available, and could be used to assess the validity of theoretical models. As many authors have noted, most verbal models cannot predict latency patterns directly.

Parisi and Burani (1988) argue that most verbal models are static, because they rarely specify the fine-grained operations of the hypothesized components. At the very best, they make predictions on nominal (the regularity effect in visual word naming) or ordinal scales (the frequency effect, see Jacobs and Grainger, 1994), although the dependent variable used is based on a ratio scale. In contrast, certain computational models¹ can predict mean latencies at the level of an interval or ratio scale and, if they involve some stochastic component, they might even be used to account for variations in distributions (see Grainger & Jacobs, in press).

Models of a wider generality are now appearing, that handle not only the final state of a particular cognitive ability but also its development, its inter-individual fluctuations, and its pathological deterioration. As discussed above, thanks to their specification of implementational details, process models can be confronted with a much richer set of observations and be submitted to more stringent empirical tests.

Finally, one additional source of constraints that may help limit the search space is the appeal to a limited set of computational principles that define a meta-theoretical framework (or a scientific paradigm) for information processing theories. An example of such a set of general principles is delineated by McClelland (1993; see also Plaut et al., 1996) under the acronym of "GRAIN" (Graded Random Adaptive Interactive Nonlinear) networks. Other principles central to this approach involve the notions of distributed representations and distributed knowledge. Obviously, neither this particular set of statements nor any other is currently universally adopted or even accepted by the scientific community. Shouldn't we then first focus on the abstract general principles, such as the componential structure of the system, the characteristics of information flow, or the nature of computational primitives and representations rather than building detailed models and thereby incurring the risk of getting lost in a forest of implementation details?

The trouble is that it may well be impossible to evaluate the validity of principles in isolation. In discussing the psychological motivations of each principle, McClelland (1993) insists on their interdependence, and a similar argument was made by Newell (1973), in his twenty-question paper. Besides, such general abstract computational principles cannot be subjected to the empirical test of the falsification strategy. Rather, as argued by MacKay (1993), among others, the fundamental assumptions that define a theoretical framework emerge gradually and gather support through their repeated successes in generating simple, elegant and appropriate accounts of specific cognitive and linguistic processes; they are eliminated only when an alternative set of principles becomes available.

A useful illustration of this process comes from the debate between supporters of the connectionist framework and partisans of the symbolic approach concerning the acquisition of morphology. Critics of the initial simulation study have pushed the conclusion that the connectionist approach was in principle unable to account for the facts of language acquisition. Yet further research (MacWhinney & Leinbach, 1991; Plunkett & Marchman, 1989; Plunkett & Marchman, 1990) has shown that none of the criticisms was beyond the reach of connectionist techniques. Although it is still unclear which of the current approaches has the best chances of providing the most accurate and parsimonious account of morphological acquisition, rejecting the whole framework because of the inadequacy of a particular instantiation is logically unsound.

¹ Port and van Gelder (1995a) argue that computational models based on the symbol manipulation paradigm are intrinsically incapable of predicting the temporal course of processing, because "they leave time out of the picture, replacing it only with ersatz 'time': a bare sequence of symbolic states." Latency predictions are usually obtained by some transformation of response probabilities. By contrast, dynamical models describing how the state of the system evolves in time appear most appropriate to account for reaction time data.

Another example can be found in an ongoing controversy about the effect of word context upon phoneme processing. Massaro (1988, 1989b) observed that the interactive activation model incorrectly predicted an interaction between phoneme and context information, because of the feedback connections from word to phoneme units. He thus concluded that the interactivity assumption was inappropriate. Yet, McClelland (1991) later showed that the inclusion of a stochastic component in the network changed the system's behavior, in a way that was more compatible with empirical observations. Thus here also, two assumptions (stochasticity and interactivity) may have interdependent consequences.

In sum, given the interdependency of various assumptions, modeling projects provide the most appropriate testing ground for the general principles that they instantiate. Yet designers should pay attention not only to the descriptive adequacy of their models but also to the relation between their models and general principles.

4.5. The gardener's problem: from simulation to theory

One conceptual difficulty that sometimes afflicts discussions of the role of modeling techniques is the conflation of the computer program with the theory. Some authors have gone as far as claiming that "Theories can be stated as computer programs" (Simon, 1992, p. 152). In contrast, we consider it crucial to insist on the distinction and complementarity between the simulation system and the accompanying theoretical gloss. Computer simulations complement rather than replace verbal descriptions. A clear statement of this complementarity appeared in Palmer & Kimchi (1986), who argue against the notion that the computer program as such constitutes a psychological theory, and insist on the importance of the accompanying description:

a running simulation is only an IP [information processing] theory by virtue of the fact that it too can be described by a flow diagram plus mini-mapping theories of its components (p. 57).

Their major argument is that a computer program can be described at various levels of specification, and that it may be difficult, without a verbal account, to decide which levels of description are psychologically relevant. This is the problem of mapping hypothetical constructs in the model onto their psychological counterparts. There is also, however, a related but distinct difficulty, which we call the redescription problem. Modelers must specify the properties and characteristics underlying the model's functioning at a level of abstractness that permits useful and appropriate generalizations.

4.5.1. The mapping problem

The first point may seem obvious. A model is a metaphor, and a metaphor is illuminating only insofar as one clarifies the relevant features that the metaphorical object shares with the target system, or better, the relevant level(s) of analysis at which a correspondence may be established between the two systems. Yet, in practice, making explicit and understanding the relationship between a simulation model and the corresponding human process is far from trivial. A major cause of this difficulty is that both human cognitive processes and computer programs are complex objects that allow for a multiplicity of levels of description.

A well-known reference on the issue of description levels is the proposal by David Marr (1982), which identifies three levels of analysis of information processing tasks. The three levels correspond to the computational description of the system (the input-output mapping that the system realizes), its algorithmic description (the algorithm used to perform the mapping) and its hardware implementation. Marr's discussion makes it clear that all three levels may contribute to the understanding of the observed phenomena: some being explicable through hardware properties (afterimages), others (the Necker cube) requiring consideration of both hardware properties and algorithmic description. Furthermore, the notion of algorithmic description masks the fact (known to everyone who has engaged in any sort of computer programming project) that an algorithm can be described at various grains, independently of the hardware specifications (cf. Palmer and Kimchi's notion of recursive decomposition).

Given the multiplicity of potential algorithmic descriptions, a simulation model at the algorithmic level could in principle be constructed to match the real function at many different levels, from the most abstract level of the input-output mapping (as happens, for instance, if a regression technique is used to derive a mathematical function), to the finest-grained level of elementary processes, with all intermediate possibilities. Massaro's (1989a) Fuzzy Logical Model of Perception, for instance, assumes three stages of perceptual processing (evaluation of perceptual features, integration, and decision) but restricts the simulation to an abstract mathematical description of the integration and decision operations. Concerning evaluation, it seems obvious that a (hypothetical) simulation model in which the correspondence goes down to the most elementary level is better, in scope and power, than a model restricted to the most abstract level of mapping. Nevertheless, this does not mean that starting at the most detailed level is the best research strategy.

As Marr suggested, it may be easier to start from a broad abstract characterization of the function, and gradually focus the microscope.
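To illustrate what a simulation pitched at an abstract algorithmic level can look like, the integration and decision operations of the Fuzzy Logical Model of Perception may be sketched as follows. This is only a minimal sketch, assuming the usual formulation in which the fuzzy truth values supporting an alternative are multiplied and the decision follows a relative goodness rule. The function names are hypothetical and the feature values are invented for illustration, not taken from any data set. Note that the evaluation stage is not simulated at all: its outputs are simply supplied as numbers.

```python
def flmp_integrate(support_values):
    """Integration: multiply the fuzzy truth values (each in [0, 1])
    that the evaluated features lend to one response alternative."""
    product = 1.0
    for value in support_values:
        product *= value
    return product


def flmp_decide(support_by_alternative):
    """Decision: relative goodness rule, i.e. the integrated support for
    each alternative divided by the summed support for all alternatives."""
    goodness = {alt: flmp_integrate(vals)
                for alt, vals in support_by_alternative.items()}
    total = sum(goodness.values())
    if total == 0:
        raise ValueError("no alternative receives any support")
    return {alt: g / total for alt, g in goodness.items()}


# Hypothetical outputs of the evaluation stage for a two-alternative task:
# the degree to which each information source supports the first alternative.
stimulus_support = 0.7
context_support = 0.8

probabilities = flmp_decide({
    "alternative_1": [stimulus_support, context_support],
    "alternative_2": [1 - stimulus_support, 1 - context_support],
})
```

With these invented values the first alternative receives a response probability of about 0.90. The sketch states the input-output relation of two stages without committing to any finer-grained account of how they are carried out.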

These issues pertain not only to symbolic approaches to modeling, but also to the artificial neural networks framework. Willshaw (1995) describes a formal technique through which sets of symbolic and subsymbolic algorithms may be organized hierarchically in terms of their level of abstraction and implementation, and concludes that "symbolic and subsymbolic algorithms are not neatly divided into two distinct classes, with the one being at a 'higher' level than the other" (p. 16).

4.5.2. The description problem

The problem of redescription, that is, extracting an appropriate description of the model's functioning from simulation results and knowledge of its design so as to allow useful generalizations, may appear more acute if one adopts the gardener's approach, though in no way would we argue that it is specific to that strategy. As we have repeatedly stated, any reasonably complex model may at some point produce unexpected behavior. Indeed, our recent results with TRACE illustrate one case in which the behavior of the system did not correspond to the description given by its designers. It is the job of the designers (or, for that matter, of any serious user of the model) to explore the details of the system's performance, the way it changes with variations of the stimulus set or parameter values, and to provide principled and accurate accounts of how and why the system behaves the way it does.

The gardener's approach may, with much know-how and perhaps a bit of luck, lead to an outcome that matches the empirical observations. Still, that is only the beginning of the hard work. Simulations are not explanations. If we do not understand the simulation process any more than we understand the real one, having a running simulation of a given function is of little help. To borrow from a judicious analogy introduced by Forster (1994), this would be no more helpful than having a next-door neighbor capable of predicting, without explaining how, the outcome of any experiment that we might design and run. To some extent, the problem is similar to the use of statistical data-fitting techniques: A mathematical equation may provide a descriptively and predictively adequate account of some regularity, but not an explicit description of the process that produces the regularity itself, and this strongly restricts possible generalizations.
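The point about data fitting can be made concrete with a small sketch. Everything in it is an assumption made for illustration: the three-stage process, its parameter values, and the function names are hypothetical and are not drawn from any model discussed here. A straight line fitted by ordinary least squares reproduces the simulated latencies and predicts a new case, yet its two coefficients say nothing about how many stages the generating process has.

```python
def simulate_latency(n_items, encode_ms=150.0, compare_ms=30.0, respond_ms=200.0):
    """Hypothetical generating process: encode the stimulus, compare it
    against each of n_items, then respond. Returns a latency in ms."""
    return encode_ms + compare_ms * n_items + respond_ms


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return intercept, slope


set_sizes = [1, 2, 3, 4, 5, 6]
latencies = [simulate_latency(n) for n in set_sizes]
intercept, slope = fit_line(set_sizes, latencies)

# The fitted equation predicts an unseen condition correctly...
predicted = intercept + slope * 8
# ...but the intercept lumps encoding and response time into one number,
# so the equation alone cannot tell us how the process is organized.
```

Any other split of the intercept between encoding and responding, or any process with a different number of stages but the same total, would yield exactly the same fitted line.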

This issue has arisen in recent years in the context of the assessment of the distributed artificial neural networks framework, and the discussion has centered on Seidenberg and McClelland's (1989) model of visual word recognition and naming, and its more recent derivatives (Plaut & McClelland, 1993; Plaut et al., 1996).

Note that the issue is not whether any of these models is empirically adequate, but rather whether they provide or even lead to adequate theories of cognitive functions.

McCloskey (1991) argued that the theoretical claims formulated by Seidenberg and McClelland are vague and too general, and that the theoretical elaboration fails to describe how the network accomplishes its task, because of our limited understanding of complex connectionist networks. Yet, such a description of processing is certainly no less appropriate or informative than any other type of model currently available. As noted by Seidenberg (1993), "there is a rich theory here: it has only to be acknowledged" (p. 233). Granted, the description leaves many details unspecified, it may be incomplete, the mechanics of the model are based on new and unfamiliar notions, it is implausible in some respects, and many aspects of its performance could be further explored. However, similar remarks could be made about any other modeling effort.
