
1. MENTAL REPRESENTATIONS AND BRAIN CIRCUITS UNDERLYING VISUAL RECOGNITION

1.2. LEARNING TO RECOGNIZE FACES

1.2.1. MODELS OF FACE RECOGNITION

A double challenge for the visual system: selectivity and invariance

When a face is encountered for the first time, it is almost instantly encoded as a visual object belonging to the face category as well as to a given gender, age range, attractiveness level, etc. A known face can then be easily recognized, even in a crowd, and sometimes years after its first encounter. Thus, human face-recognition abilities exemplify the exceptional capacity of the visual system to efficiently code for hundreds of individual visual items and to allow fast and reliable performance at the subordinate level¹ (Tanaka, 2001).

Learning to recognize new faces is challenging for our visual system in many ways. First, since faces are perceptually very similar to each other (e.g., same constituent elements, fixed geometrical arrangement of these elements), exquisite discriminative abilities are required to distinguish between the faces of different individuals (Gauthier et al., 1997). Second, known faces can be recognized across a variety of visual transformations (e.g., viewpoint, luminance, and size, but also facial expressions, age-related changes, or hairstyle), which implies the extraction of invariant features within individual faces (e.g., Moses et al., 1996; Pourtois et al., 2005a).

Thus, the face-recognition system has to be highly selective but also largely tolerant to variations in visual input. Studying the human face-recognition system is therefore of particular relevance to understanding the processing of individual memory traces (i.e., subordinate-level processing; Biederman et al., 1999; Sigala et al., 2002) and recognition mechanisms used for other objects.

¹ Subordinate level: recognition of visual objects has been classified within three levels of categorization, the basic level (e.g., dolphin), the superordinate level (e.g., animal) and the subordinate level (e.g., Flipper), with the basic level being the most commonly used one, with the shortest reaction times (Rosch et al., 1976). Faces are usually recognized at the subordinate level, and face recognition at the subordinate level is as fast as at the basic level.

Features vs. configuration

All faces are composed of a common set of features (eyes, nose, and mouth), which are arranged similarly (the nose in the middle, the two eyes above, and the mouth below). These features can vary slightly in shape, color, and size and thereby provide discriminative cues, but the individuation of faces relies mostly on the configural arrangement of the features (e.g., the spacing between the eyes), as shown in the following experiments.

The well-known ‘inversion effect’, first shown by Yin (1969), who found that rotating an image upside-down affected face recognition much more than the recognition of other categories of objects, led to the hypothesis that holistic/configural processing is critical for face recognition. Many other studies (for reviews, see Valentine, 1988 and Maurer et al., 2002) have since provided evidence for a specific disruption of the holistic/configural processing of inverted faces.

Two other paradigms have been used to investigate configural processing: the ‘composite effect’ (Young et al., 1987) and the ‘part-whole effect’ (Tanaka & Farah, 1993). The ‘composite effect’ consists in aligning the top and bottom halves of faces from two different individuals, which gives the illusion of seeing a totally new face. The alignment seems to interfere with the recognition of the parts, but only when the face is presented in the upright position (Figure 5). Non-aligned halves can be equally well identified in upright and inverted positions (Young et al., 1987). This effect strongly argues in favor of a configural processing of faces. The ‘part-whole effect’ refers to the higher recognition performance for face parts (e.g., the nose) presented within the whole face rather than in isolation. This is not the case for other types of stimuli such as scrambled faces, inverted faces or houses (Tanaka & Farah, 1993), and again provides evidence for the special processing of faces (see also McKone, 2004).

Neuropsychological evidence also speaks in favor of separate systems for object and face processing. Two complementary deficits, prosopagnosia (impaired recognition of faces) and visual object agnosia (impaired recognition of common objects), together form a classic neuropsychological double dissociation (Farah, 1990; Moscovitch et al., 1997; Rivest et al., 2009). Farah (1990) argued that two independent recognition systems exist, one part-based and one holistic/configural, the former mostly implicated in object recognition and the latter in face recognition.

These results seem to reflect a special processing of faces, relying on configuration rather than on features only, even if “the exact nature of this style of computation is not understood” (McKone, 2009, p. 268). Other authors claim that this special processing is related to our expertise with faces and that similar results could be obtained with other highly homogeneous objects of expertise (inversion effect with dog experts: Diamond & Carey, 1986; Valentine, 1988). Moreover, increasing configural sensitivity has been shown in developmental research (for a review, see Freire & Lee, 2004). Based on their computational approach, Dailey and Cottrell (1999) proposed that face processing could rely on a domain-specific mechanism that arises naturally through infant development, without being innately present, simply as a consequence of the particular ecological demands that faces place on subordinate-level recognition (i.e., recognition at the individual level; Biederman et al., 1999). In a review on holistic processing of faces, McKone et al. (2007) proposed a combined developmental and evolutionary explanation for this specificity.

Figure 5. Illustration of the “face-composite illusion.” Identical top halves of two face stimuli tend to be perceived as being different when they are aligned with different bottom halves (A). This visual illusion disappears when the two face halves are laterally offset (B) or when the faces are presented upside down (C). Source: Michel et al., 2007; Young et al., 1987.

Prototype face representations

Our remarkable ability to discriminate and recognize hundreds of faces has been attributed to the encoding of face exemplars relative to a face prototype (Valentine, 1991; Leopold et al., 2001). Faces differ from each other along multiple dimensions (e.g., nose size, distance between the eyes). These dimensions are coded as vectors of a multidimensional space, and individual faces are represented as single points within this space. The average face is the ‘superimposition’ of all faces encountered in one’s life. Faces are discriminated by their distance from the average and/or by their distance from each other. Thus faces that are close to each other in this space are quite similar. The closer a face is to the average, the more ‘typical’ it is. This has been used to explain why very distinctive faces are easier to recognize (Valentine, 1991), and also as a basis for the other-race effect² (e.g., Byatt & Rhodes, 2004).

Loffler et al. (2005) provided evidence for the tuning³ of neural populations to faces along ‘identity’ axes within this space. It has been suggested that perceptual learning leads to an increase in the distance between faces, allowing faster and more accurate identification (Hussain et al., 2009).
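As a rough illustration of this face-space idea, the following minimal Python sketch assumes faces are already encoded as numeric feature vectors; the dimensionality, distance metric, and random data are illustrative assumptions, not taken from the cited studies. It computes an average-face prototype and expresses typicality and distinctiveness as distances.

```python
import numpy as np

# Toy face space: each known face is a point in an n-dimensional feature space
# (concrete dimensions such as nose size or inter-eye distance are abstracted away).
rng = np.random.default_rng(0)
faces = rng.normal(size=(100, 50))     # 100 known faces, 50 hypothetical dimensions

prototype = faces.mean(axis=0)         # the 'average face' of all encountered faces

def typicality(face, prototype):
    """Distance to the prototype: the smaller it is, the more 'typical' the face."""
    return np.linalg.norm(face - prototype)

def distinctiveness(face, all_faces):
    """Mean distance to the other faces: distinctive faces lie in sparse regions."""
    d = np.linalg.norm(all_faces - face, axis=1)
    return d[d > 0].mean()

# Faces far from the prototype (atypical, distinctive) have fewer close neighbors
# competing with them, which is one way to express why they are easier to recognize.
```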

If the average face is the mean of all the faces we have experienced, each individual face is also the result of a ‘superimposition’. Burton and colleagues proposed that the visual system is able to create a prototype face from all the exemplars we encounter of one individual’s face and to implement it in the face space (Burton et al., 2005).
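A similarly hedged sketch of this exemplar-averaging idea follows (the vector encoding, noise level, and names such as identity_prototype are hypothetical, chosen only for illustration): averaging the exemplars encountered for one person yields a per-identity prototype, and a new image can then be matched to the nearest prototype in the face space.

```python
import numpy as np

def identity_prototype(exemplars):
    """Average all encountered exemplars (images) of one person into a single prototype."""
    return np.asarray(exemplars).mean(axis=0)

def recognize(probe, prototypes):
    """Match a new image to the nearest identity prototype in the face space."""
    names = list(prototypes)
    distances = [np.linalg.norm(probe - prototypes[name]) for name in names]
    return names[int(np.argmin(distances))]

# Two hypothetical identities, each seen five times under some visual noise.
rng = np.random.default_rng(1)
alice, bob = rng.normal(size=50), rng.normal(size=50)
prototypes = {
    "alice": identity_prototype(alice + rng.normal(scale=0.3, size=(5, 50))),
    "bob": identity_prototype(bob + rng.normal(scale=0.3, size=(5, 50))),
}
print(recognize(alice + rng.normal(scale=0.3, size=50), prototypes))  # most likely "alice"
```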


Unfamiliar vs. familiar face representations

The computational proposal of a prototype face for each individual person we know suggests that this average face would be created after many encounters with one face and would underlie its recognition across different visual conditions (e.g., viewpoint, contrast). However, the recognition of a face identity that has been encountered only once is also possible (for a review, see Hancock et al., 2000). Are representations of unfamiliar faces different from representations of familiar ones?

Some studies found that repetition priming (perceptual facilitation reflected in decreased reaction times or brain activity) is enhanced by the familiarity of the face stimuli (Bruce & Valentine, 1985; R. Henson et al., 2000; Boehm et al., 2006). Bruce (1982) investigated the effect of modifying viewpoint or expression on the recognition of familiar and unfamiliar faces, and found that unfamiliar faces were more affected by these changes than familiar ones.

² Other-race effect: bias in recognition leading to better performance for own-race faces compared with other-race faces (for a review, see Meissner & Brigham, 2001).

³ Tuning: selectively responding.

Other studies showed that familiar faces can tolerate such transformations (e.g., viewing angle, Goshen-Gottstein & Ganel, 2000; lighting, Hill & Bruce, 1996). Thus it seems that representations of unfamiliar faces are primarily picture-based, and come to rely on a higher (semantic) level of processing as faces become familiar (Burton et al., 1999; Pourtois et al., 2005b).

It has also been shown that there might be a shift from external (hairstyle, face shape) towards internal features in the processing of faces as they become familiar (Ellis et al., 1979; Young et al., 1985; Goshen-Gottstein & Ganel, 2000). O’Donnell and Bruce (2001) suggested that it is the eyes, rather than internal features in general, that become more important as a face becomes familiar.

Thus, it appears that representations of familiar faces depend less on ‘pictorial’ and otherwise unstable cues than representations of unfamiliar faces. They are more abstract, which allows generalization to new visual contexts. However, it still remains unclear how this abstract representation is created.

Cognitive models of face recognition

Some influential models of face recognition (e.g., Bruce & Young, 1986; Burton et al., 1990; Ellis & Lewis, 2001; Breen, 2000) have tried to explain how this abstract representation is built. They propose that information about a face is processed across hierarchically organized units, including (1) face recognition units (FRUs), underlying the recognition of a familiar face based on its structural encoding; (2) multimodal person identity nodes (PINs), linking information from different domains about the same person; (3) semantic information units (SIUs), where attributes about the person are stored; and (4) an affective module, mediating affective responses to familiar stimuli independently of overt identity recognition (Figure 6).

Strengthening of the link between an individual FRU and its corresponding PIN after repeated exposure to a given face would lead to more efficient processing of that face (Burton et al., 1990). This hypothesis accounts for increased explicit recognition as well as implicit effects, such as faster responses (Ellis et al., 1990; Paller et al., 1992) and an increased positive affective response to repeated stimuli (mere-exposure effect; Zajonc, 1968; Butler & Berry, 2004). The latter effect might also rely on a strengthening of the link between the FRU and the affective response module. A new FRU may be created for each newly encountered face (Burton, 1994).
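To make the link-strengthening account concrete, here is a minimal toy sketch in Python. It is not the interactive-activation implementation of Burton et al. (1990); the threshold, starting weight, learning rate, and class name are illustrative assumptions. It only shows how repeated exposure, by strengthening the FRU-to-PIN connection, could yield faster (‘primed’) recognition.

```python
# Toy sketch of the FRU -> PIN link-strengthening idea: repeated exposure increases
# a connection weight, so the person identity node reaches its recognition threshold
# in fewer processing cycles (a crude proxy for faster reaction times).

THRESHOLD = 1.0

class PersonEntry:
    def __init__(self, name, weight=0.2, learning_rate=0.1):
        self.name = name              # person identity node (PIN)
        self.fru_to_pin = weight      # strength of the FRU -> PIN link
        self.learning_rate = learning_rate

    def expose(self):
        """Each encounter with the face strengthens the FRU -> PIN connection."""
        self.fru_to_pin += self.learning_rate * (1.0 - self.fru_to_pin)

    def cycles_to_recognize(self):
        """Cycles needed for PIN activation to exceed the recognition threshold."""
        activation, cycles = 0.0, 0
        while activation < THRESHOLD:
            activation += self.fru_to_pin
            cycles += 1
        return cycles

entry = PersonEntry("familiar person")
print(entry.cycles_to_recognize())    # slower before learning
for _ in range(10):
    entry.expose()                    # repeated exposure
print(entry.cycles_to_recognize())    # fewer cycles: faster, 'primed' recognition
```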


Figure 6: A model of face processing showing two independent routes from the Face Recognition Units (FRUs): one leading to the Person Identity Nodes (PINs) and then to name retrieval processes, and a second route leading to the Affective Response to Familiar Stimuli (modulating the skin conductance response). The PINs and the affective response module project to an integrative device which compares the expected affective response with the actual affective response, allowing attribution processes to take place. Source: adapted from Ellis & Lewis, 2001 and Breen, 2000.

Recent models, built on neuropsychological findings, propose that these components of the face recognition system might be under the control of an executive module that prevents false recognitions (Rapcsak, 2003; Bartlett et al., 2009). Further details on these models, and on how they could be implemented in the brain, will be provided in the following section, which presents the neural correlates of face recognition and learning.

1.2.2. Neural correlates of face recognition and learning