

2.1.1 Early Vision

Some scientists (Pylyshyn 1980, 1984, 1999a, 2003) and philosophers (Fodor 1983, 1988, 2000; Raftopoulos 2001c, a, 2009a, 2011, 2014a) define the visual system as early vision. Accordingly, even though visual perception refers to the processing of information going from the retina up to the generation of a perceptual experience, what they call ‘visual system’ is restricted to early vision in the terms of Marr’s (1982) definition, i.e., low- and intermediate-level vision in Hildreth and Ullman (1989).

Pylyshyn (2003, 50) claims that “within the broad category of what we call ‘vision’ is a highly complex information processing system, which some have called ‘early vision,’ that functions independently of what we believe” (see also Pylyshyn 1999a, 342). Raftopoulos (2009a, 51) labels it simply as “perception”.7

The early visual system is considered to be isolated from cognition: the computation of the visual input necessary to produce an output is independent of mnemonic influences (Fodor 1983, 64, 70-71; Hildreth and Ullman 1989, 610, 616-617; Pylyshyn 1999a, 361; 2003, 134-136). Pylyshyn (2003, 136) argues that “the early-vision system could encode any property whose identification does not require accessing general memory”. (I analyse this claim in section 2.2.)

The subsequent process is cognition, which is composed of late vision and cognition properly speaking.8 Higher visual and non-visual processes that require access to memory belong to “cognition” (Pylyshyn 1999a, 344) or “observation” (Fodor 1984; Raftopoulos 2001c, 188; Raftopoulos 2009a, 51). Processes which depend on the subject’s cognitive background are, by definition, cognitive (Pylyshyn 1984, 134-135; Raftopoulos 2001a, 427; 2009a, 77, 80).

7 Hereafter the terms ‘early vision’ and ‘early visual system’ will be used as synonyms.

8 Late vision is the stage of visual processing which interacts with memory, a process necessary to recognize and categorize objects. Cognition properly speaking refers to the stage in which the cognitive system can perform all its operations independently of perceptual processes. This is a purely cognitive stage (it includes beliefs, desires, feelings, and so on). See section 2.1.2.

Early vision is functionally defined. Pylyshyn (1999a, 342 and fn. 2) characterizes the visual system by the sort of functional (psychophysical) properties it computes rather than by its neurophysiology. According to Pylyshyn (1999a, 344), a neuroanatomical definition of early vision is difficult to offer because “[t]he neuroanatomical locus of early vision [...] is not known with any precision”. On the one hand, not every stimulation on the retina is processed by the visual system; some of this information is encoded for other systems.9 On the other hand, visual inputs cannot be restricted to sensory stimulation on the retina; the visual system also treats information from other modalities.10 (Pylyshyn 1984, 172; 1999a, 361; 2003, 125-133)

Although Pylyshyn tries to avoid a neuroanatomical definition of the visual system, the need to differentiate the visual system from cognitive factors (such as long-term memory) compels the psychologist to provide a minimal neuroanatomical description of the visual system. He claims that the visual system itself is “roughly identified with the visual cortex, as mapped out, say, by Felleman and Van Essen 1991” (Pylyshyn 1999a, 347; 2003, 67-68).

9 For example, a few retinal projections culminate in the superior colliculus (SC), responsible for head and eye movements; others project to the pretectum, which regulates the pupillary light reflex; whereas others end up in the optic tract controlling circadian rhythms. See Goodale and Milner (2005, 312); Purves et al. (2004, 263); Pylyshyn (1984, 172) and Tovée (2008, 74).

10 Interaction with other systems includes input from the vestibular system, which seems to affect perception of orientation; cross-modal visual processing of proprioceptive signals from the head and neck muscles, which influence visual location (Pylyshyn 1999a, 361; Pylyshyn 2003, 125); input from audition, e.g., the McGurk-MacDonald effect (McGurk and MacDonald 1976; see section 2.2.1 for an explanation) (Pylyshyn 2003, 125-127); input from many other senses, as in synesthesia (Pylyshyn 2003, 127-130); and input from the motor system (Pylyshyn 2003, 130-133). Pylyshyn denies that cross-modal and motor influences represent a form of cognitive intervention in early visual operations; he claims instead that the systems only interact (Pylyshyn 2003, 127).

Felleman and Van Essen (1991) and Felleman et al. (1997) give what can nowadays be considered a ‘standard’ neurophysiological definition of the visuo-motor system. The definition includes the visual cortex, central and posterior parietal areas, and the inferior temporal cortex. This description basically equates to Milner and Goodale’s (1995; 2006) dorsal and ventral streams, which process information for action and for object recognition respectively. While both start in the primary visual cortex, the former projects to posterior-parietal areas and the latter culminates in the inferotemporal cortex. In addition, Felleman and Van Essen’s (1991) description of visual processing is given in terms of visuo-motor mechanisms, which means that the visual system is responsible not only for the processing of visual stimuli on the retina, but also for the allocation of attention, eye movements, and other motor functions. (See Fodor (1983, 66-67) and Wu (2013, 19-20) for such a claim.) Therefore, frontal cortical areas responsible for eye guidance and movement (i.e., the frontal eye field (FEF) and the dorsolateral prefrontal cortex (DLPFC)) and subcortical structures responsible for the allocation of visual attention and eye movements (i.e., part of the superior colliculus (SC)) belong to the visuo-motor system as well. Ultimately, Pylyshyn seems to provide a neuroanatomico-functional definition of the visuo-motor system.

To sum up, the early visual system can be neuroanatomico-functionally characterized. It includes the dorsal and ventral streams (respectively intended for action and object recognition) as well as subcortical brain areas governing visuo-motor functions. Thus, the visual system is a visuo-motor system.11

Early vision, as defined by Fodor (1983, 2000), Pylyshyn (1984, 1999a, 2003) and Raftopoulos (2001c, a, 2005a, 2009a, 2011), is an early stage of processing isolated from other systems. Although the computation of visual properties begins in early vision, perceptual processing goes on and rapidly becomes affected by cognition. Signals coming from higher cognitive cortical areas (such as the prefrontal cortex (PFC), responsible for executive functions) and mnemonic cortical regions (e.g., non-visual temporal areas) influence the visual cortex shortly after the stimulus presentation.

11 Hereafter the terms ‘visual system’ and ‘visuo-motor system’ will be used as synonyms.

According to Raftopoulos (2001c; 2009a, ch. 2), early visual perception is a process which lasts about 100 to 120 milliseconds after stimulus onset. During this time window reentrant pathways do not affect early vision: the system seems to be isolated from other perceptual systems and cognition.

Typically, the primary visual cortex (the earliest cortical area responsible for visual processing, also called area V1 or striate cortex) becomes activated 40 ms after stimulus presentation. Then, following Raftopoulos’ interpretation, during the next 60 ms of visual processing the visual system is unaffected by external (perceptual, motor or cognitive) influences.12 After this interval, cognition penetrates vision. The visual process recruits non-visual areas and signals from higher brain centres pervade the visual cortex. The beginning of the interaction between the visual and the cognitive systems marks the beginning of late visual processing.

To summarize, early vision is the stage of the visual processing which goes up to 100-120 ms after stimulus onset. During this time period, the early processing is considered as encapsulated and impenetrable by other systems and mainly by cognition. After this interval, cognition penetrates vision and signals from higher cortical areas pervade the system influencing the visual processing. This is the beginning of late vision.

Early vision processes, in a bottom-up fashion, information retrieved from the retinal image. The system computes properties such as colour, shape, position, orientation, size, texture, luminance, and motion (Raftopoulos 2009a, 51)13 but also three-dimensional objects (Pylyshyn 1999a, 343; 2003, 51, 95-106, 143, 146-147).14 (See section 2.3 for a further discussion.)

12 There are cases in which cognitive influences modulate the visual system in the time window of early vision (Pylyshyn 1999a, 359-360; Raftopoulos 2001a, 443-444; 2005a, 76; 2009a, 303-304), but according to Raftopoulos (2009a, 79-88) this influence only facilitates the visual processing without modifying it. See chapter 6 for a discussion and criticism of his argument.

13 This corresponds to the 2½-D model in Marr (1982).

14 This involves the 3-dimensional model of objects in Marr (1982). For a detailed explanation of the visual processing, properties encoded, and brain areas involved see Stirling (2000, 94-103); Purves et al. (2004, ch. 11); Valberg (2005, 383-401); Tovée (2008, ch. 4 and 5) and Bear et al. (2007, ch. 10).

Following Rensink’s (2000a; 2000b) triadic architecture, Raftopoulos (2009a, 21-28) claims that visual perception depends on three stages of visual processing: a non-attentive, low-level process which delivers primitive visual outputs called proto-objects, and two higher visual stages, namely an attentional level responsible for picking out some proto-objects to form coherent and individuated objects (i.e., perceptual experiences), and a non-attentional stage which detects the scene layout and gist and guides the allocation of attention. The low-level process corresponds to early vision; the two higher stages correspond to late vision. (See Rensink (2000a, 35; 2000b, 1476) for a detailed explanation of the triadic architecture.) Raftopoulos argues:

Attention has access only to proto-objects, the output of the [early] processing stage, and does not modulate processing in the first [stage]. Thus, proto-objects are both the lowest-level operands upon which selective attention can act and the highest-level outputs of low-level vision (the preattentive parallel bottom-up stage of visual processing). Focused attention provides structures that are coherent over an extended region of space and time, and thus it is inextricably involved in object perception, that is, in the perception of objects as they are experienced through our senses. (Raftopoulos 2009a, 22)

According to this view, early visual outputs are primitive representations of objects expressible in the vocabulary of geometry (Pylyshyn 2003, 133-134), or proto-objects: complex and volatile structures (lasting for less than half a second) which are constantly regenerated and replaced by any stimuli appearing at the same retinal location (Raftopoulos 2009a, 21-22, 76-79; Rensink 2000a, 20-24; 2000b, 1473-1476).15

Although early vision is capable of delivering geon representations, it does not (normally) process more sophisticated properties such as being a cat or a chair (see section 2.3). The representations delivered by early vision are sufficient for cognition to recognize and categorise the object (Pylyshyn 1999a, 361).

The visual scene presents more information than the subject needs or her system is able to process. How an object is interpreted and represented depends on the subject’s background knowledge and her immediate purposes (Pylyshyn 2003, 157; Raftopoulos 2009a, 77; Rensink 2000a, 28). At the late visual level, and following the subject’s purposes, attentional selective mechanisms pick out some proto-objects to form the relevant stable visual objects which match categories stored in memory (chairs, tigers, plants). The representations composed of proto-objects are the visual states we are aware of, i.e., perceptual experiences.16 (Raftopoulos 2009a, 21-28; Rensink 2000a, b)17

Nor is early vision capable of identifying particular individual tokens such as ‘my sister’, ‘his car’, and the like (Pylyshyn 1999b, 408). The identification and recognition of these object tokens depend on further beliefs and knowledge stored in memory. For someone to identify a person as ‘her sister’, her visual system needs to compute the visual representation of the person together with information about this person (Cavanagh 2011, 1546-1548; Hildreth and Ullman 1989, 616-620; Pylyshyn 2003, 134-135).

15 About the concept of proto-object, Raftopoulos (2009a, 28) claims that “Pylyshyn (2001) thinks of proto-objects as viewer-centered structural descriptions of objects”. In addition, Pylyshyn explains: “[t]he concept of a ‘proto-object’ is a general one that has been used by a number of writers (sometimes using the same term, Di Lollo et al. 2000; Rensink 2000a, and sometimes using some other term, such as ‘preattentive object’, Wolfe and Bennett 1997) in reference to clusters of proximal features that serve as precursors in the detection of real physical objects. What these uses have in common is that they refer to something more than a localized property or ‘feature’ and less than a recognized 3D distal object. Beyond that, the exact nature of a proto-object depends on the theory in question.” (Pylyshyn 2001, 144, fn. 5)

16 Rensink (2000a) calls perceptual experiences (stable and detailed representations which change coherently according to the subject’s allocation of attention) “virtual representations”. See also Raftopoulos 2009a, 25.

17 To some extent, Pylyshyn seems to share the view that perceptual experiences are a compound of proto-objects. He writes: “[a]fter early vision, cognition is able to select the most likely interpretation from those provided by early vision, and is able to further interpret the information that vision provides in order to establish beliefs about what has been seen” (Pylyshyn 2003, 157; my italics). According to Rensink, Pylyshyn’s visual objects are clusters of proto-objects (in Rensink’s sense).

The outputs of early vision are not necessarily perceptual experiences. Perceptual experiences are, by definition, conscious mental states with a phenomenal character. The phenomenal character of an experience is what it is like for the subject to have this experience. For instance, there is a difference between what it is like to see an apple and the phenomenology of tasting or touching it. (See Macpherson 2011b, 129; 2012, 24-28; Siegel 2010a; 2012, 203; Siegel and Silins forthcoming, 3.) By contrast, early visual outputs (proto-objects) do not seem to have phenomenal character: since they are not always accessible to consciousness, they do not have the same characteristics as perceptual experiences. The main difference is that proto-objects appear to be subpersonal while perceptual experiences are, by definition, personal mental states. For instance, Pylyshyn argues that the early visual output is almost never conscious:

Sensations are not the basis of perception but, rather, the result of it. In the present view, perception begins with the output of transducers, not with sensations. Transducers provide the initial information, though this information is not necessarily available to consciousness (indeed, probably it almost never is).18 (Pylyshyn 1984, 174, my italics; see also Pylyshyn 1999b, 407.)

Fodor (1983) also seems to leave open the possibility that the output of early vision may or may not be conscious:

Nobody doubts that the information that input systems provide must somehow be reconciled with the subject’s background knowledge. We sometimes know that the world can’t really be the way that it looks, and such cases may legitimately be described as the correction of input analyses by top-down information flow. (This, ultimately, is the reason for refusing to identify input analysis with perception.) (Fodor 1983, 73; my italics)

18 A transducer is a primitive system that takes organ stimulation as its input. A transduction system is a function that takes organ stimulation (e.g., of the retina) and returns symbolic representations. See Lyons (2001, 292) for this explanation and section 2.3.1 for a brief analysis of transducers.

Similarly, Rensink (2000b, 1474-1475) claims that dynamic changes of the visual scene require attention to produce stable visual objects, but he leaves open the possibility that static objects could be experienced without attention. In other words, it appears that visual experiences of dynamic objects rely on attention but static objects could be represented while unattended. Raftopoulos (2009a, 31) seems to follow this interpretation when he explains the role of attention in visual processing and visual representation: “proto-object consists of tentatively but uniquely bound features that are segmented from visual information. This representation is a short-lived, vulnerable, and not easily reportable form of visual experience” (my italics).

For Raftopoulos, proto-objects are not always (or are almost never) conscious because of the theory of awareness he adopts: Lamme’s theory. Lamme (2003, 16) argues that there are two forms of awareness: phenomenal awareness, which arises between 100 and 150 ms, and access awareness, which occurs at about 200 to 300 ms after stimulus onset. Phenomenal awareness results from the recurrent interaction within the visual system; at this stage the information becomes available to other systems. Although it might trigger or modify behaviour, the process is unconscious for the subject (Lamme 2003, 16). Access awareness occurs when the interactions between visual brain areas spread and include mnemonic regions (frontal, prefrontal, temporal cortex), and “the visual information is put into the context of the systems’ current needs, goals and full history”; at this stage the input is consciously accessible (Lamme 2003, 16). In Block’s (2005, 46) terms, phenomenal awareness is the minimal neural basis for perceptual content. Because Raftopoulos claims that early vision lasts for about 100-120 ms, the kind of awareness involved in proto-objects is phenomenal awareness (it arises right at the end of the early visual process). But this form of awareness is not sufficient for the subject to consciously perceive the output.

Raftopoulos’ interpretation is in agreement with Macpherson’s (2012), who writes:

Pylyshyn also explicitly denies that the output of the early visual system should be thought of as being or determining one’s visual experience or, put differently, that the content of the output of the early visual system should be thought of as the content of visual experience. [...] Pylyshyn’s thesis is, therefore, not about perceptual experience but about the subpersonal representational outputs of brain processing mechanisms. (Macpherson 2012, 26-27)

Early visual outputs are therefore different from perceptual experiences, and it does not seem accurate to consider the outputs of early vision as being perceptual experiences.19

To summarize, early vision is a bottom-up process which encodes information about 2- and 3-dimensional object properties. The results of this processing are primitive visual outputs, or proto-objects (complex and volatile structures which are constantly regenerated and replaced by new stimuli). Early vision does not normally categorize or recognize objects as cats or chairs, even if it sometimes does (see section 2.3). It cannot, however, identify a particular individual token as ‘my sister’, because this sort of categorization requires access to the subject’s cognitive background. In addition, primitive outputs are not necessarily conscious; the early visual system does not always (or almost never does) generate perceptual experiences. In order to become conscious, proto-objects require further operations which take place during late visual processing. Let us now turn to late vision.