

2.2 Modularity

2.2.1 Informational Encapsulation of the Visuo-Motor System

2.3 Visual Properties

2.3.1 Properties Represented in Early Vision

2.4 Hypothesis Adopted

Perceptual systems (e.g., visual and auditory) are typically regarded as composed of central and peripheral subsystems, and as having early and late stages of processing.

Cognitive penetrability of perception, broadly understood, refers to the influence that the cognitive system has on perceptual systems. Penetrating states could be knowledge, beliefs, intentions, expectations, but also desires, moods, feelings, and so on.1

In order to spell out the cognitive penetration of perception, it is necessary to explain how we should understand perceptual processing, so as to determine which subsystem is cognitively influenced and at which level of processing, to wit, early or late vision. In this thesis, I will focus principally on the visual system (understood as the conjunction of early and late vision, and visuo-motor mechanisms). There are several reasons why I consider mainly the visual system. First, vision is a very well-studied field which provides rich empirical data essential for the debate on cognitive penetration. Second, the arguments in this thesis upholding cognitive penetration will be directly based on recent neuroscientific evidence and, as far as I know, empirical studies on other sense modalities are not as detailed as those on vision. The amount and variety of studies on vision, together with their highly detailed data, represent an essential instrument for this thesis. Finally, in this chapter I will meticulously scrutinize vision; a similar analysis of another sense modality would have increased the length of this thesis unnecessarily. Nonetheless, although this dissertation is mainly focused on cognitive penetration of the visual system, I will refer as much as possible throughout the following chapters to empirical studies capable of showing cognitive penetration of other senses. Furthermore, I am convinced that the consequences of my research, to wit, that the entire visual system is cognitively penetrated, can be extended to other sense modalities (e.g., auditory, tactile, but also

1This definition of cognitive penetrability is broadly construed: it refers not only to cognitive states but also to affective states such as emotions, moods, and feelings (see, e.g., Lyons 2011, 290; Siegel 2012, 201; Siegel 2013c, 698, 719-720; Siegel and Silins 2014, 30; Macpherson 2012, 27; Stokes 2012).

Three main aspects need to be scrutinized about the perceptual system in order to understand the debate on cognitive penetration (see, for instance, Macpherson 2012, 29-33):

1. How should the visual system be characterized? Is visual perception an early or a late level of visual processing? Is visual processing distinguishable from cognition?

2. How does the visual system process information? Is the visual processing of information isolated from other systems, and thus informationally encapsulated? Or does it require the intervention of other systems (perceptual, motor or cognitive) in order to process visual information?

3. Which properties are represented by the visual system? Does perception represent only low-level properties such as colour, shape, size, brightness, and the like, or does it also represent high-level properties such as pine trees, tables, faces, and so on?

In order to answer the first group of questions, in section 2.1 I will analyse the nature of the visual and the cognitive systems, their levels of processing, and their interactions. I will characterise the visual system as a visuo-motor system composed of three subsystems: two central (early and late vision) and one peripheral (motor). I will also analyse cognition (2.1.3).

Next, I will turn to modularity and argue that at some level of processing the visual system is informationally encapsulated (section 2.2). This will address the second group of questions. Finally, I will scrutinize which kinds of properties are processed and represented in visual perception (section 2.3). At the end of the chapter (section 2.4), I will spell out the working hypothesis that I will adopt throughout this thesis. Let’s begin with the analysis of the visual system and its stages of processing.

2Having clarified this, hereafter, when I discuss cognitive penetration I refer to cognitive penetration of visual perception.

2.1 Visual and Cognitive Systems

Vision, or visual perception, is the process which begins with the computation of a physical stimulus on the retina and finishes with the generation of a perceptual experience. This process can essentially be separated into two levels of processing: early vision and late vision (Cavanagh 2011; Fodor 1983, 2000; Hayward and Tarr 2005; Hildreth 1987; Hildreth and Ullman 1989; Marr 1982; Pylyshyn 1984, 1999a, 2003; Raftopoulos 2001c, a, 2009a; Rensink 2000a, b; Ullman 1996; Wagemans et al. 2005).

Hildreth and Ullman (1989) distinguish between three levels of visual processing: low-, intermediate-, and high-level.3 The low-level stage of processing is characterized by a bottom-up mode of information treatment.4 At this stage, visual information is processed in parallel (i.e., visual operations happen across the entire visual field), and the properties processed concern, e.g., edge detection, binocular stereopsis, depth, shade, colour, texture, and movement (Hildreth and Ullman 1989, 583, 610-611).

The intermediate visual level computes shape identification (geons)5, object manipulation, locomotion, and navigation through the environment, and includes processes extracting shape properties and spatial relations.

These processes cannot be entirely stimulus-dependent and require the interaction of lower and higher visual areas. An object can be described in different ways, and some of these descriptions can be more useful than others. The same visual scene will be processed differently if in one case the task concerns the objects’ size (shape properties) and in another the distance between objects (spatial relations). The process depends on usefulness rather than on validity (Hildreth and Ullman 1989, 610). Hence, the relevant representation of the visual scene at intermediate-level vision cannot be entirely stimulus-driven, but is instead task-dependent or top-down visually dependent (Hildreth and Ullman 1989, 610-615).6

3For a similar and contemporary explanation of visual processing see Cavanagh (2011).

4Bottom-up in this particular case refers to the sort of processing which depends only on the visual stimulus, and is independent of the task to be performed.

5Geon representations are simple 2- or 3-dimensional forms such as edges, lines, circles, rectangles, and triangles, or cylinders, cubes, wedges, and cones, corresponding to the simple parts of an object.

High-level vision identifies and categorizes objects in the visual world.

The process requires top-down cognitive influences to match the representation generated by the visual system with internal representations of objects stored in long-term memory (Hildreth and Ullman 1989, 610-620).

The contrast between low- and intermediate-level vision, on the one hand, and high-level vision, on the other, is that, while the former are not cognitively influenced (e.g., by memory), the latter intimately relies on such influences: “[t]he process of object recognition is therefore different from processes of intermediate and low-level vision in that it is more intimately related to the problems of memory organization, retrieval, expectations, and reasoning” (Hildreth and Ullman 1989, 610, 616-617). Thus, high-level vision involves cognitive influences.

Hildreth and Ullman’s (1989) low- and intermediate-level vision correspond approximately to the definition of the visual system given by Marr (1982), which he calls early vision. Marr decomposes early vision into three stages of processing with different degrees of abstraction that, combined, reconstruct the visual scene (Marr 1982, 136-138; see also Engel 1996, 225-226, for a presentation): a primal sketch, which represents visual information about the two-dimensional image, mainly changes in intensity and geometrical organization; a 2½-D sketch, which describes the orientation and depth of surfaces, and detects contours and discontinuities from the viewer-centred viewpoint; and a 3-D sketch, which depicts the spatial organization of volumetric characteristics from the object-centred perspective. In Marr’s characterization, Hildreth and Ullman’s (1989) high-level vision corresponds to a form of late vision.

To sum up, we can distinguish between two levels of visual processing. The first is early vision, which strictly depends on the interaction between bottom-up incoming visual information and processes happening within the visual system. As a result, this process delivers geon representations. The second is late vision, a stage in which the visual process governing object recognition and categorization depends on top-down cognitive influences.

6Top-down influences at this level come from the visual system itself, i.e., they are visual top-down influences or, as some authors define them, ‘lateral or horizontal influences’ (Cavanagh 2011, 1538; Pylyshyn 2003, 68; Raftopoulos 2009a, 51, 274).

In what follows I will spell out early and late vision.

2.1.1 Early Vision

The visual system is defined by some scientists (Pylyshyn 1980, 1984, 1999a, 2003) and philosophers (Fodor 1983, 1988, 2000; Raftopoulos 2001c, a, 2009a, 2011, 2014a) as early vision. Accordingly, even though visual perception refers to the processing of information going from the retina up to the generation of a perceptual experience, what they call the ‘visual system’ is restricted to early vision in the terms of Marr’s (1982) definition, i.e., low- and intermediate-level vision in Hildreth and Ullman (1989). Pylyshyn (2003, 50) claims that “within the broad category of what we call ‘vision’ is a highly complex information processing system, which some have called ‘early vision,’ that functions independently of what we believe” (see also Pylyshyn 1999a, 342). Raftopoulos (2009a, 51) labels it simply as “perception”.7

The early visual system is considered to be isolated from cognition: the computation of the visual input necessary to produce an output is independent of mnemonic influences (Fodor 1983, 64, 70-71; Hildreth and Ullman 1989, 610, 616-617; Pylyshyn 1999a, 361; 2003, 134-136). Pylyshyn (2003, 136) argues that “the early-vision system could encode any property whose identification does not require accessing general memory”. (I analyse this claim in section 2.2.)

The subsequent process is cognition, which is composed of late vision and cognition properly speaking.8 Higher visual and non-visual processes which require access to memory belong to “cognition” (Pylyshyn 1999a, 344) or “observation” (Fodor 1984; Raftopoulos 2001c, 188; Raftopoulos 2009a, 51). Processes which depend on the subject’s cognitive background are, by definition, cognitive (Pylyshyn 1984, 134-135; Raftopoulos 2001a, 427; 2009a, 77, 80).

7Hereafter the terms ‘early vision’ and ‘early visual system’ will be used as synonyms.

8Late vision is the stage of visual processing which interacts with memory, an interaction necessary to recognize and categorize objects. Cognition properly speaking refers to the stage in which the cognitive system can perform all its operations independently of perceptual processes. This is a purely cognitive stage (it includes beliefs, desires, feelings, and so on). See section 2.1.2.

Early vision is functionally defined. Pylyshyn (1999a, 342 and fn. 2) characterizes the visual system by the sort of functional (psychophysical) properties it computes rather than by its neurophysiology. According to Pylyshyn (1999a, 344), a neuroanatomical definition of early vision is difficult to offer because “[t]he neuroanatomical locus of early vision [...] is not known with any precision”. On the one hand, not every stimulation on the retina is processed by the visual system; some of this information is encoded for other systems.9 On the other hand, visual inputs cannot be restricted to sensory stimulation on the retina; the visual system also treats information from other modalities.10 (Pylyshyn 1984, 172; 1999a, 361; 2003, 125-133)

Although Pylyshyn tries to avoid a neuroanatomical definition of the visual system, the need to differentiate the visual system from cognitive factors (such as long-term memory) compels the psychologist to provide a minimal neuroanatomical description of the visual system. He claims that the visual system itself is “roughly identified with the visual cortex, as mapped out, say, by Felleman and Van Essen 1991” (Pylyshyn 1999a, 347; 2003, 67-68).

9For example, a few retinal projections culminate in the superior colliculus (SC), responsible for head and eye movements; others project to the pretectum, which regulates the pupillary light reflex; whereas others end up in the optic tract controlling circadian rhythms. See Goodale and Milner (2005, 312); Purves et al. (2004, 263); Pylyshyn (1984, 172) and Tovée (2008, 74).

10Interactions with other systems include influences from the vestibular system, which seems to affect the perception of orientation; from proprioceptive signals from the head and neck muscles, which influence visual location in cross-modal visual processing (Pylyshyn 1999a, 361; Pylyshyn 2003, 125); from audition, e.g., the McGurk-MacDonald effect (McGurk and MacDonald 1976; see section 2.2.1 for an explanation) (Pylyshyn 2003, 125-127); from many other senses, as in synesthesia (Pylyshyn 2003, 127-130); and from the motor system (Pylyshyn 2003, 130-133). Pylyshyn denies that cross-modal and motor influences represent a form of cognitive intervention in early visual operations; he claims instead that the systems only interact (Pylyshyn 2003, 127).

Felleman and Van Essen (1991) and Felleman et al. (1997) give what nowadays can be considered a ‘standard’ neurophysiological definition of the visuo-motor system. The definition includes the visual cortex, central and posterior parietal areas, and the inferior temporal cortex. This description basically equates to Milner and Goodale’s (1995; 2006) dorsal and ventral streams, which process information for action and for object recognition, respectively. While both start in the primary visual cortex, the former projects to posterior parietal areas and the latter culminates in the inferotemporal cortex. In addition, Felleman and Van Essen’s (1991) description of visual processing is given in terms of visuo-motor mechanisms, which means that the visual system is responsible not only for the processing of visual stimuli on the retina, but also for the allocation of attention, eye movements, and other motor functions. (See Fodor (1983, 66-67) and Wu (2013, 19-20) for such a claim.) Therefore, frontal cortical areas responsible for eye guidance and movement (i.e., the frontal eye field (FEF) and the dorsolateral prefrontal cortex (DLPFC)) and subcortical structures responsible for the allocation of visual attention and eye movements (i.e., part of the superior colliculus (SC)) belong to the visuo-motor system as well. Ultimately, Pylyshyn seems to provide a neuroanatomico-functional definition of the visuo-motor system.

To sum up, the early visual system can be neuroanatomico-functionally characterized. It includes the dorsal and ventral streams (respectively intended for action and object recognition) as well as subcortical brain areas governing visuo-motor functions. Thus, the visual system is a visuo-motor system.11

Early vision, as defined by Fodor (1983, 2000), Pylyshyn (1984, 1999a, 2003), and Raftopoulos (2001c, a, 2005a, 2009a, 2011), is an early stage of processing isolated from other systems. Although the computation of visual properties begins in early vision, perceptual processing continues and rapidly becomes affected by cognition. Signals coming from higher cognitive cortical areas (such as the prefrontal cortex (PFC), responsible for executive functions) and mnemonic cortical regions (e.g., non-visual temporal areas) influence the visual cortex shortly after the stimulus presentation.

11Hereafter the terms ‘visual system’ and ‘visuo-motor system’ will be used as synonyms.

According to Raftopoulos (2001c; 2009a, ch. 2), early visual perception is a process which lasts about 100 or 120 milliseconds post-stimulus onset.

During this time window reentrant pathways do not affect early vision: the system seems to be isolated from other perceptual systems and cognition.

Typically, the primary visual cortex (the earliest cortical area responsible for visual processing, also called area V1 or striate cortex) becomes activated 40 ms after stimulus presentation. Then, following Raftopoulos’ interpretation, during the next 60 ms of visual processing the visual system is unaffected by external (perceptual, motor or cognitive) influences.12 After this interval, cognition penetrates vision. The visual process recruits non-visual areas and signals from higher brain centres pervade the visual cortex. The beginning of the interaction between the visual and the cognitive systems marks the beginning of late visual processing.

To summarize, early vision is the stage of the visual processing which goes up to 100-120 ms after stimulus onset. During this time period, the early processing is considered as encapsulated and impenetrable by other systems and mainly by cognition. After this interval, cognition penetrates vision and signals from higher cortical areas pervade the system influencing the visual processing. This is the beginning of late vision.

Early vision processes, in a bottom-up manner, information retrieved from the retinal image. The system computes properties such as colour, shape, position, orientation, size, texture, luminance, and motion (Raftopoulos 2009a, 51),13 but also three-dimensional objects (Pylyshyn 1999a, 343; 2003, 51, 95-106, 143, 146-147).14 (See section 2.3 for further discussion.)

12There are cases in which cognitive influences modulate the visual system in the time window of early vision (Pylyshyn 1999a, 359-360; Raftopoulos 2001a, 443-444; 2005a, 76; 2009a, 303-304), but according to Raftopoulos (2009a, 79-88) this influence only facilitates the visual processing without modifying it. See chapter 6 for a discussion and criticism of his argument.

13This corresponds to the 2½-D model in Marr (1982).

14This involves the 3-dimensional model of objects in Marr (1982). For a detailed explanation of the visual processing, properties encoded, and brain areas involved see Stirling (2000, 94-103); Purves et al. (2004, ch. 11); Valberg (2005, 383-401); Tovée (2008, ch. 4 and 5) and Bear et al. (2007, ch. 10).

Following Rensink’s (2000a; 2000b) triadic architecture, Raftopoulos (2009a, 21-28) claims that visual perception depends on three stages of visual processing: a non-attentive, low-level process which delivers primitive visual outputs called proto-objects, and two higher visual stages, namely an attentional level responsible for picking out some proto-objects to form coherent and individuated objects (i.e., perceptual experiences), and a non-attentional stage which detects the scene layout and gist and guides the allocation of attention. The first corresponds to early vision; the two higher stages correspond to late vision. (See Rensink (2000a, 35; 2000b, 1476) for a detailed explanation of the triadic architecture.) Raftopoulos argues:

“Attention has access only to proto-objects, the output of the [early] processing stage, and does not modulate processing in the first [stage]. Thus, proto-objects are both the lowest-level operands upon which selective attention can act and the highest-level outputs of low-level vision (the preattentive parallel bottom-up stage of visual processing). Focused attention provides structures that are coherent over an extended region of space and time, and thus it is inextricably involved in object perception, that is, in the perception of objects as they are experienced through our senses.” (Raftopoulos 2009a, 22)

According to this view, early visual outputs are primitive representations of objects expressible in the vocabulary of geometry (Pylyshyn 2003, 133-134), or proto-objects: complex and volatile structures (lasting for less than half a second) which are constantly regenerated and are replaced by any stimulus appearing at the same retinal location (Raftopoulos 2009a, 21-22, 76-79; Rensink 2000a, 20-24; 2000b, 1473-1476).15

15About the concept of proto-object, Raftopoulos (2009a, 28) claims that “Pylyshyn (2001) thinks of proto-objects as viewer-centered structural descriptions of objects”. In addition, Pylyshyn explains: “[t]he concept of a ‘proto-object’ is a general one that has been used by a number of writers (sometimes using the same term, Di Lollo et al. 2000; Rensink 2000a, and sometimes using some other term, such as ‘preattentive object’, Wolfe and Bennett 1997) in reference to clusters of proximal features that serve as precursors in the detection of real physical objects. What these uses have in common is that they refer to something more than a localized property or ‘feature’ and less than a recognized 3D

Although early vision is capable of delivering geon representations, it does not (normally) process more sophisticated properties such as cats or chairs (see section 2.3). The representations delivered by early vision are sufficient for cognition to recognize and categorise the object (Pylyshyn 1999a, 361).

The visual scene presents more information than the subject needs and than her system is able to process. How an object is interpreted and represented depends on the subject’s background knowledge and her immediate purposes (Pylyshyn 2003, 157; Raftopoulos 2009a, 77; Rensink 2000a, 28). At the late visual level, and following the subject’s purposes, attentional selective mechanisms pick out some proto-objects to form the relevant stable visual objects, which match categories stored in memory (chairs, tigers, plants). The representations composed of proto-objects are the visual states we are aware of, i.e., perceptual experiences.16 (Raftopoulos 2009a, 21-28; Rensink 2000a, b)17

Nor is early vision capable of identifying particular individual tokens such as ‘my sister’, ‘his car’, and the like (Pylyshyn 1999b, 408). The identification and recognition of these object tokens depend on further beliefs and knowledge stored in memory. For someone to identify a person as ‘her sister’, her visual system needs to compute the visual representation of the person together with information about this person (Cavanagh 2011, 1546-1548; Hildreth and Ullman 1989, 616-620; Pylyshyn 2003, 134-135).

The outputs of early vision are not necessarily perceptual experiences.

Perceptual experiences are, by definition, conscious mental states with a phenomenal character. The phenomenal character of an experience is what
