Thesis

Reference

The role of experience and sleep in visual learning: behavioral and brain imaging investigation

DUHOUX MATHIEU, Stéphanie

Abstract

Identifying and recognizing visual objects are crucial skills that enable us to interact adequately with the environment. Most of the time, these processes are rapid and effortless. However, recognition is the final outcome of complex processes, including the creation of representations of visual objects and their consolidation into memory, whose mechanisms and neural correlates remain largely unknown. The present work reports four studies aiming at a better understanding of visual learning, and testing more specifically the role of experience and sleep in visual learning. In particular, this work highlights a crucial role for the hippocampus in the formation of visual representations, for both 3D objects and faces. This work also provides evidence that sleep may play a critical role in the consolidation of memory traces for faces, making the human brain more resistant to misleading information. By combining behavioral and neuroimaging approaches, this thesis provides new insights into the cognitive and brain mechanisms underlying visual learning.

DUHOUX MATHIEU, Stéphanie. The role of experience and sleep in visual learning: behavioral and brain imaging investigation. Doctoral thesis: Univ. Geneva and Lausanne, 2009, no. Neur. 51

URN : urn:nbn:ch:unige-168349

DOI : 10.13097/archive-ouverte/unige:16834

Available at:

http://archive-ouverte.unige.ch/unige:16834

Disclaimer: layout of this document may differ from the published version.


FACULTY OF SCIENCES

DOCTORATE IN NEUROSCIENCE of the Universities of Geneva and Lausanne

UNIVERSITY OF GENEVA, FACULTY OF SCIENCES

Dr Sophie Schwartz, thesis director
Professor Patrik Vuilleumier, thesis co-director

THESIS TITLE

THE ROLE OF EXPERIENCE AND SLEEP IN VISUAL LEARNING:

BEHAVIORAL AND BRAIN IMAGING INVESTIGATIONS

THESIS presented to the Faculty of Sciences of the University of Geneva to obtain the degree of Doctor in Neuroscience

by

Stéphanie DUHOUX MATHIEU of Echallens (VD)

Thesis No. 51

Geneva

2009


Acknowledgments

The time has finally come to write the acknowledgments for my thesis, which means that I have, almost, reached the end of this long initiatory journey. The path was not a straight one, and along the way I discovered my strengths and my weaknesses. Many people contributed, from near or far, to this work, and I thank them with all my heart.

First of all, I would like to thank Sophie Schwartz for accepting me as her first "sonic", for trusting me throughout this project, for introducing me to the joys of MRI and Matlab, and for guiding me towards the ultimate goal: handing in my thesis.

I then thank Patrik Vuilleumier for welcoming me into his laboratory and for agreeing to be my thesis co-director. I also thank him for his attentive ear and for his critical remarks at key moments in the elaboration of the designs and the interpretation of the data.

I would then like to thank Professors Catherine Brandner and Martial Van der Linden for agreeing to be part of my examination and defense committee. I thank them for the time they devoted to me and for all their precious remarks and suggestions, which greatly contributed to improving the quality of this thesis manuscript.

I also thank Professor Dominique Muller for agreeing to chair my thesis defense, and also for enabling me to start my thesis by securing part of my funding during the first three years in exchange for managing the secretariat of the doctoral school. I thank him for his constant good humor and his support throughout the thesis.

I thank the Swiss National Science Foundation for enabling me to start this thesis, and the Schmidheiny and Boninchi Foundations for enabling me to finish it. I also thank the Labor & Pax Foundation, which allowed me to present my results at several conferences, and the Association Genevoise de Femmes Diplômées des Universités for the encouragement prize it awarded me at the beginning of the thesis.

I thank Professor Martine Collart for her role as mentor, and for her kind ear and advice in the good moments and the less good ones.

I thank Dr Pierre Maquet and Dr Philippe Peigneux for the advice they gave me.

I also thank Professor André Langaney, Dr Ute Leonards and Dr François Vital-Durand, who, during my diploma and my DEA, passed on to me some of their passion and gave me my first tools for becoming a researcher. I also thank Commandant Cousteau and Albert Falco. My family will know why!

I thank all the people who took part in these experiments, friends, colleagues, students, often late in the evening or early in the morning, and often on weekends. Without them nothing would have been possible. I also thank the colleagues who stood by my side during that time to pilot the scanner: Mohammed Seghier, François Lazeyras, Frédéric Andersson, Karim Ndiaye, Amal Achaibou.

I thank all the other members of the labnic who, through their good humor, their advice and their varied knowledge, made this thesis a wonderful human experience.

I thank Aurélie, my friend and colleague, my true coach during these last 6 months spent on San Francisco time, who at this very moment is still hunting for typos in my text. So many things shared, so many common memories, and this is only the beginning!

I thank Markus, for his friendship and his collaboration on one of these studies, and for his help every time my computers let me down!

I thank Irina, for her kindness, her gentleness and her friendship, and for the good times spent at conferences. A thousand thanks also for your help on this final stretch!

I thank Ulli, for his precious help with my sleep and face studies and for his always critical and constructive comments.

I also thank the labnic old-timers, Nadia and Gilles; my colleagues from the French office, Frédéric, Karim and Yann; the less old; and the newcomers. I am bound to forget someone, so thank you all!

I thank the secretaries, Raquel and Antonia, for the little tea breaks, the administrative helping hands and their friendship.

I thank my friends from the AGFDU, who supported me from the beginning to the end of my thesis, and in particular Arielle and Nadège for their advice and encouragement.

I thank Dr Ansermet for his considerable contribution to the completion of this thesis. I was run down, and you set me back on my feet. Thank you!

I thank my biology friends, in particular Maha, Ewa, Juliette, Estelle and Fred, always there to check in and to encourage me. You are wonderful!

I thank my lifelong friend, Véro, for all the good times shared over so many years.

I thank the educators of the Petite Maisonnée, in particular Alice, for taking care of Héloïse and making her a happy, fulfilled little girl despite a somewhat stressed mum!

I thank my neighbors, Marimar, Tamara and Alba, for all the little moments shared and for their help during these last months.

I thank my family, my parents, my brother and my sister, for always believing in me, for doing everything to make my life easier and for surrounding me with their caring love. I love you! I thank my grandmother, who one day left Franco's Spain for a better future... Thank you for recording so many science documentaries for me! It is also at her home that I found refuge many times to prepare for the exams of these long years of study and to write a few passages of this thesis. A great big hug to my dear grandma!

I also thank my in-laws for their encouragement and their many kind attentions throughout this laborious journey.

Finally, I thank my two loves, Gabriel and Héloïse, to whom I have given a hard time lately and who kept me going. Gabriel has supported me in every sense of the word; for that, and for his love over the past 12 years, I thank him from the bottom of my heart.

Héloïse, my little sweetheart, rainbow of happiness, who has brightened every day of my life for a little over 2 years now: to you I dedicate this thesis.


Summary

Identifying and recognizing visual objects are crucial skills that enable us to interact adequately with the environment. Most of the time, these processes are rapid and effortless. However, recognition is the final outcome of a complex three-step process, whose mechanisms and neural correlates remain largely unclear. Encoding is the first step towards visual recognition.

It leads to the construction of a new mental representation, indispensable to any future recognition. It is still unclear which information is extracted and how the new representations are built. The second step is Consolidation. Many studies have been conducted on Encoding but fewer on Consolidation, and it is not yet well understood how new representations are stored, maintained and stabilized within a network of pre-existing representations. Moreover, several variables might affect memory consolidation, including attention and sleep. Finally, the third step is Recognition itself, which involves matching the visual input to information in memory, i.e. to the mental representation. Experimental data support different theories of recognition processes. The present work reports four studies aiming at a better understanding of these different steps underlying visual learning, and testing more specifically the role of experience and sleep in visual learning.

The first study was designed to investigate the formation of visual object representations in the human brain, and more specifically to identify the neural bases of 3D-object learning. The major finding of this study is that the hippocampus contributes to the learning and recognition of 3D objects seen from novel perspectives, by processing spatial information about the object's structure and by linking memories for 2D views to 3D-object knowledge.

The three other studies presented in this work focused on memory consolidation. One study tested behaviorally the effect of attention at encoding on subsequent memory. It showed that explicit recognition requires attention at encoding, whereas implicit memory can occur for previously ignored stimuli. This study also showed that attention at encoding may create multiple representations (i.e., some view-specific, some generalizing across mirror reflections), which are differentially activated in implicit and explicit memory tests. The two other studies investigated the effect of sleep after learning pictures of faces, either behaviorally or using fMRI. We found that sleep may favor the consolidation of face information supporting explicit memory, independent of retinotopic-specific conditions. We also found that sleep fosters the consolidation of configural memory and thus protects against false recognition of faces with featural resemblance, by increasing activity in the hippocampus and in face-specific fusiform regions. These findings demonstrate that sleep makes the brain more resistant to misleading information.

Overall, by combining behavioral and neuroimaging approaches, this thesis provides important insights into the cognitive and brain mechanisms that underlie visual learning.


List of abbreviations

3-D Three-dimensional
ACC Anterior Cingulate Cortex
ANOVA Analysis of Variance
BOLD Blood-Oxygen-Level-Dependent
CRT Cathode Ray Tube
EEG Electroencephalogram
EPI Echo Planar Imaging
FFA Fusiform Face Area
FG Fusiform Gyrus
FOV Field of View
fMRI Functional Magnetic Resonance Imaging
FRU Face Recognition Unit
FWHM Full Width at Half Maximum
GLM General Linear Model
HRF Hemodynamic Response Function
IPS Intraparietal Sulcus
LCD Liquid Crystal Display
LOC Lateral Occipital Cortex
LTM Long-Term Memory
M Mean
MEE Mere-Exposure Effect
MNI Montreal Neurological Institute
MTL Medial Temporal Lobe
Non-REM Non-Rapid Eye Movement
OFA Occipital Face Area
PET Positron Emission Tomography
PFC Prefrontal Cortex
PIN Person Identity Node
REM Rapid Eye Movement
RSVP Rapid Serial Visual Presentation
RT Reaction Time
SBSOD Santa Barbara Sense of Direction scale
SD Standard Deviation
SE Standard Error
SMG Supramarginal Gyrus
SPM Statistical Parametric Mapping
STG Superior Temporal Gyrus
STM Short-Term Memory
STS Superior Temporal Sulcus
SWS Slow Wave Sleep
TE Echo Time
TR Repetition Time
WM Working Memory


INTRODUCTION

1. MENTAL REPRESENTATIONS AND BRAIN CIRCUITS UNDERLYING VISUAL RECOGNITION
1.1. LEARNING TO RECOGNIZE OBJECTS
1.1.1. MODELS OF OBJECT RECOGNITION
Structural description models
View-based models
Multiple representational systems
1.1.2. NEURAL CORRELATES OF VISUAL RECOGNITION AND LEARNING
Two visual pathways
Functional organization of the ventral stream
The role of the inferior temporal lobe in object recognition
Experience-dependent changes in brain responses underlying visual learning
1.2. LEARNING TO RECOGNIZE FACES
1.2.1. MODELS OF FACE RECOGNITION
A double challenge for the visual system: selectivity and invariance
Features vs. configuration
Prototype face representations
Unfamiliar vs. familiar face representations
Cognitive models of face recognition
1.2.2. NEURAL CORRELATES OF FACE RECOGNITION AND LEARNING
Distributed neural network recruited in face processing
Nature of the representations in the Fusiform Gyrus
Experience-dependent neural changes underlying face learning

2. MEMORY CONSOLIDATION
2.1. DISTINCT MEMORY SYSTEMS
2.1.1. SHORT-TERM VS. LONG-TERM MEMORY
2.1.2. DECLARATIVE (EXPLICIT) VS. NON-DECLARATIVE (IMPLICIT) MEMORY
2.2. THE MEDIAL TEMPORAL LOBE
2.2.1. ANATOMY OF THE MEDIAL TEMPORAL LOBE
2.2.2. ROLE OF THE HIPPOCAMPUS IN VISUAL RECOGNITION
2.3. MODELS OF MEMORY CONSOLIDATION
2.3.1. STANDARD CONSOLIDATION THEORY
2.3.2. MULTIPLE TRACE THEORY
2.4. ROLE OF ATTENTION
2.4.1. DEFINITION
2.4.2. EFFECT OF ATTENTION ON ENCODING
2.4.3. MECHANISMS
2.4.4. EFFECT OF ATTENTION ON IMPLICIT AND EXPLICIT MEMORY
2.5. ROLE OF SLEEP
2.5.1. WHAT IS SLEEP?
2.5.2. INFLUENCE OF SLEEP ON MEMORY PROCESSES
2.5.3. SLEEP AND FACE LEARNING

3. AIMS AND INVESTIGATIONS CONDUCTED IN THIS WORK
3.1. FORMATION AND GENERALIZATION OF VISUAL REPRESENTATIONS OF NEW 3-D OBJECTS
3.2. ROLE OF ATTENTION AT ENCODING ON SUBSEQUENT EXPLICIT AND IMPLICIT MEMORY
3.3. ROLE OF SLEEP IN MEMORY FOR FACES: BEHAVIORAL INVESTIGATIONS
3.4. ROLE OF SLEEP IN MEMORY FOR FACES: AN FMRI INVESTIGATION

EXPERIMENTAL PART

1. GENERAL METHODS
1.1. BEHAVIORAL MEASURES OF IMPLICIT AND EXPLICIT MEMORY
1.1.1. PRIMING
1.1.2. MERE-EXPOSURE EFFECT
1.1.3. OLD-NEW RECOGNITION TEST
1.2. FUNCTIONAL MRI
1.2.1. WHAT IS FMRI
1.2.2. EXPERIMENTAL PROTOCOL CONSIDERATIONS
1.2.3. STATISTICAL ANALYSES
1.2.4. LIMITATIONS
1.3. ASSESSING THE ROLE OF SLEEP IN MEMORY CONSOLIDATION: SLEEP DEPRIVATION VS. NATURAL SLEEP-WAKE CYCLES

2. THE ROLE OF THE HUMAN HIPPOCAMPUS IN THREE-DIMENSIONAL OBJECT LEARNING
ABSTRACT
2.1. INTRODUCTION
2.2. MATERIALS AND METHODS
2.2.1. SUBJECTS
2.2.2. STIMULI
2.2.3. DESIGN AND PROCEDURE
2.2.4. MRI SCANNING
2.2.5. FMRI DATA ANALYSIS
2.3. RESULTS
2.3.1. LEARNING PART
Behavioral Results
fMRI Results
2.3.2. GENERALIZATION PART
Behavioral Results
fMRI Results
2.3.3. CORRELATIONS WITH INDIVIDUAL PERFORMANCE
2.4. DISCUSSION
2.4.1. REGIONAL CHANGES IN BRAIN ACTIVITY DURING 3D-OBJECT LEARNING
2.4.2. BRAIN REGIONS SHOWING VIEW-SPECIFIC EFFECTS
2.4.3. NEURAL CODING OF 3D-OBJECT REPRESENTATION
2.4.4. ROLE OF SPATIAL SKILLS IN 3D-OBJECT RECOGNITION

3. SELECTIVE ATTENTION AT ENCODING MODULATES SUBSEQUENT EXPLICIT AND IMPLICIT VISUAL MEMORY
ABSTRACT
3.1. INTRODUCTION
3.2. METHODS
3.2.1. PARTICIPANTS
3.2.3. PROCEDURE
3.3. RESULTS
3.3.1. BEHAVIORAL PERFORMANCE IN THE STUDY PHASE OF EXPOSURE TO SUPERIMPOSED OBJECTS
3.3.2. EXPERIMENT 1: DIRECT BEHAVIORAL TEST OF EXPLICIT MEMORY
3.3.3. EXPERIMENT 2: INDIRECT BEHAVIORAL TEST SHOWING IMPLICIT MEMORY FOR UNATTENDED OBJECTS
3.4. DISCUSSION
3.5. COMPLEMENTARY FMRI EXPERIMENT

4. BEHAVIORAL EVIDENCE OF SLEEP-RELATED EFFECTS ON EXPLICIT MEMORY AND THE LEVEL OF REPRESENTATION FOR FACES
ABSTRACT
4.1. INTRODUCTION
4.2. EXPERIMENT 1: EFFECTS OF STIMULUS ENCODING ON SLEEP-RELATED EXPLICIT MEMORY CONSOLIDATION
4.2.1. METHODS
Participants
Stimuli
Experimental protocol
4.2.2. RESULTS
Reaction times
Explicit recognition
4.2.3. DISCUSSION
4.3. EXPERIMENT 2: SLEEP EFFECTS ON MEMORY REPRESENTATION
4.3.1. METHODS
Participants and stimuli
Experimental protocol
4.3.2. RESULTS
Analysis of the number of pleasant faces
Analysis of the number of recognition hits
4.3.3. DISCUSSION
4.4. GENERAL DISCUSSION

5. INFLUENCE OF SLEEP ON FACE RECOGNITION AND ASSOCIATED ACTIVITY IN RIGHT HIPPOCAMPUS AND FUSIFORM CORTEX
ABSTRACT
5.1. INTRODUCTION
5.2. METHODS
5.2.1. PARTICIPANTS
5.2.2. STIMULI
5.2.3. EXPERIMENTAL PROTOCOL
5.2.4. SLEEP QUALITY ASSESSMENT
5.2.5. VIGILANCE TASK
5.2.6. FACE-SELECTIVE REGIONS LOCALIZER
5.2.7. MRI PARAMETERS
5.2.8. FMRI ANALYSIS
5.3. RESULTS
5.3.1. BEHAVIORAL RESULTS
Reaction times
Explicit recognition
5.3.2. FMRI RESULTS
Memory for identity
Memory for configuration
Correlation between hippocampal and fusiform activity
5.4. DISCUSSION

GENERAL DISCUSSION
1. BRIEF SUMMARY OF THE MAIN RESULTS
2. VIEW-DEPENDENT VS. VIEW-INDEPENDENT REPRESENTATION
3. FUNCTIONAL DISSOCIATION OF IMPLICIT AND EXPLICIT MEMORY
4. CENTRAL ROLE OF THE HIPPOCAMPUS IN THE FORMATION OF STRUCTURAL/CONFIGURAL REPRESENTATIONS AND THEIR CONSOLIDATION
5. MEMORY CONSOLIDATION THEORY
6. IMPLICATION OF OUR FINDINGS FOR MODELS OF FACE RECOGNITION
7. CONCLUSION

REFERENCES

ANNEXES


INTRODUCTION


Identifying and recognizing visual objects is a crucial skill that enables us to interact adequately with our environment. Identifying whether the insect flying around your head is a butterfly or a wasp will lead you to adapt your behavior, moderating your gestures if it is a wasp. Recognizing one's child among a group of kids coming out of school is also essential.

Most of the time, these processes are rapid and effortless. However, recognition is the final outcome of a complex three-step process, whose mechanisms and neural correlates remain largely unclear.

Encoding is the first step towards visual recognition. It leads to the construction of a new mental representation, linking perception to mental images, that is indispensable to any future recognition. It is still unclear which information is extracted and how the new representations are built. The second step is Consolidation, the transformation of a labile memory trace into a stable long-term one. Many studies have been conducted on Encoding but far fewer on Consolidation. Thus, it is still unclear how new representations are stored, maintained and stabilized within a network of pre-existing representations. Finally, the third step is Recognition itself, corresponding to the matching of the visual input to the stored mental representation. Here also, the mechanisms allowing this matching are not clearly understood.

The first critical questions addressed in this work are how experience shapes object representations and what processes underlie recognition. The second set of questions concerns whether cognitive and physiological variables, such as attention at encoding or sleep after encoding, modulate memory consolidation.

The introduction of this thesis is organized as follows. First, the literature on object and face recognition will be reviewed. Then, the question of memory systems and the crucial role of the hippocampus will be addressed, before going through the effects of attention and sleep on memory consolidation. With these considerations in mind, I will finally outline the main goals of the experimental studies conducted in this thesis.


1. Mental representations and brain circuits underlying visual recognition

The creation of a new mental representation faces a double challenge: it has to be invariant under transformations of the input image, allowing 'object constancy', i.e. recognition despite substantial changes in the retinal input (size, lighting or viewpoint changes; see Figure 1), but also very detailed, in order to allow discrimination between highly similar objects (e.g. faces).

Figure 1. Complex scene showing our ability to recognize a chair despite size, orientation, viewpoint or color changes. Source: I. Bülthoff. http://www.kyb.mpg.de/bu/projects.html.

But which representations will satisfy these requirements? What kind of information is extracted from the image? And how are these representations implemented in the brain?

Despite an abundant literature in this domain, drawing on all available methodological tools (psychophysics, computational theory, neurophysiology, neuropsychology and neuroimaging), most of these questions remain open.

This chapter focuses on the existing literature on object and face recognition. I will also review the debates that persist in these areas of research, as well as the models that have been proposed to explain these processes and their neural correlates.

1.1. Learning to recognize objects

1.1.1. Models of object recognition

Many theories of object recognition have been proposed to account for object constancy, and in particular for three-dimensional (3-D) object recognition from different viewpoints (for reviews, see Wallis & Bülthoff, 2002; Peissig & Tarr, 2007). There are two main classes of theories: the first favors object-centered representations, proposing that object recognition is viewpoint-invariant; the second argues for viewer-centered representations, positing that recognition is viewpoint-dependent. These models tend to be reconciled today. Below is a review of these different theories and the underlying experimental evidence.

Structural description models

The most influential structural description theory is that of David Marr, published in Marr & Nishihara's (1978) seminal paper and in Marr's (1982) book. Marr & Nishihara (1978) proposed that objects are represented in a view-invariant manner, with part-based structural representations built on elemental 3-D volumes and their spatial relationships. According to this theory, there are three stages towards object recognition. The first stage creates a 2-D representation containing information about the lines and edges of the object. The second stage leads to a 2½-D representation including depth information, thus describing edges and surfaces from the viewer's perspective. Finally, a 3-D object representation is established, in which object parts are encoded with respect to an object-centered reference frame rather than a viewer-centered one. This representation allows view-invariant recognition, with a matching of the input 2-D shape, whatever its orientation, to the stored 3-D model. Marr & Nishihara (1978) proposed that objects are represented as hierarchically organized structural models of cylinders, allowing discrimination and categorization judgments at different levels of detail. For example, a standing man can be subdivided into six major cylinders representing the head, trunk, and four limbs; the arm can then be subdivided into 2 main cylinders, and so on down to the 5 finger cylinders (Figure 2A). Marr and Nishihara's theory was purely computational and offered no empirical support for this model.

A well-known development of Marr's model is the 'Recognition By Components' (RBC) theory of Biederman, who suggested that any object can be represented by a small set of simple geometric components, called 'geons' (geometric ions), and their explicit relationships (Biederman, 1987). The set of geons is composed of three-dimensional volumes such as bricks, cylinders, wedges and cones, defined by properties including whether the edge is straight or curved, whether the object is symmetrical or not, whether the cross-section is of constant size or expands or contracts, and whether the axis is straight or curved. Objects are modeled as an arrangement of geons, with the relations (such as "top-of", "side-connected") specified as part of the object model (Biederman, 1990; Hummel & Biederman, 1992) (Figure 2B).


Figure 2. A. Example of a cylinder organization underlying discrimination at different levels of detail. Source: D. Marr, "Vision" (1982). B. Twelve geons and some examples of objects' geon decompositions. Only two or three geons are required to uniquely specify an object. The relationships among the geons matter, as illustrated by the pail and the cup. Source: Biederman, 1990.

Geons can be identified even when portions of their edges and vertices are occluded or deleted. Moreover, they are invariant under rotation in depth. Geons consequently allow robust object perception when the image is projected from a novel viewpoint or is degraded (Biederman, 1987). While Marr's theory was only computational, Biederman's received psychophysical support: priming (perceptual facilitation reflected in decreased reaction times) was found to be independent of changes in object position (Biederman & Cooper, 1991), size (Cooper et al., 1992) or orientation in depth (Biederman & Gerhardstein, 1993).

In sum, these studies argued that the recognition of an object is based on a single representation, formed of 3-D parts and the relations among them (an object-based representation), which can be similarly activated by varying visual inputs (e.g., different viewpoint conditions). However, it is not clear how this approach, relying on the recovery of 3-D volumes, is robust enough to subserve general object recognition (Tarr & Bülthoff, 1998). Indeed, stimuli must be decomposable into geons and have the same visible parts over viewpoint changes (Biederman, 1987), which is not the case for all objects (e.g. a shoe).
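To make the structural-description idea concrete, the following minimal sketch (in Python; not part of the original thesis, and with invented class and field names) encodes Biederman's pail/cup example as a set of geons plus explicit relations, and matches objects in a fully view-invariant way, since only parts and relations are compared:

```python
# A minimal sketch (not from the thesis) of an RBC-style structural
# description: an object is a set of geons plus the spatial relations
# among them, and matching ignores viewpoint entirely.
from dataclasses import dataclass

@dataclass(frozen=True)
class Geon:
    shape: str          # e.g. "cylinder", "brick", "cone"
    axis: str           # "straight" or "curved"
    cross_section: str  # "constant", "expanding" or "contracting"

@dataclass(frozen=True)
class ObjectModel:
    geons: frozenset      # the parts
    relations: frozenset  # e.g. ("top-of", "handle", "body")

def same_object(a: ObjectModel, b: ObjectModel) -> bool:
    """View-invariant matching: only parts and their relations count."""
    return a.geons == b.geons and a.relations == b.relations

# Biederman's pail/cup example: identical geons, different relation.
body = Geon("cylinder", "straight", "expanding")
handle = Geon("cylinder", "curved", "constant")
pail = ObjectModel(frozenset({body, handle}),
                   frozenset({("top-of", "handle", "body")}))
cup = ObjectModel(frozenset({body, handle}),
                  frozenset({("side-connected", "handle", "body")}))
print(same_object(pail, cup))  # False: same geons, different relation
```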

View-based models

Structural-description models predict that a novel view of a familiar object is recognized as accurately and rapidly as a known view. However, another set of psychophysical evidence suggests otherwise.

Palmer et al. (1981) asked subjects to rate the typicality of object views, from 1 (very typical) to 7 (very untypical). Other individuals were then asked to name objects presented across five perspectives chosen from these ratings, while response latency was measured. Pictures rated as the most typical ('canonical') were named fastest, and latency increased as 'canonicalness' decreased. Jolicoeur (1985) also observed increased reaction times (RTs) in naming line drawings of common objects when they were rotated away from the usual upright orientation. These results suggest that objects are recognized by transforming, more precisely aligning, the input image to the stored canonical representation (Ullman, 1989). However, the RT differences found by Jolicoeur (1985) were relatively small and diminished with repeated presentations of the objects, suggesting changes in the representation across learning.

Tarr and Pinker (1989) argued that Biederman and colleagues observed view-invariance with familiar objects, and later with geon-decomposable objects, only because these objects had probably already been encountered in most viewpoints. Thus they tested recognition of completely novel 2-D and 3-D objects (Tarr & Pinker, 1989; Tarr, 1995; see Figures 3A and 3B). First, they had subjects learn each novel object at a single orientation, and then tested them with the same novel objects presented at several orientations, in a normal or mirror-reversed way. They observed RTs that increased with the distance from the trained view, similar to the pattern found in mental rotation tasks (Shepard & Metzler, 1971). Following practice, as in Jolicoeur's (1985) study, the trained viewpoints were recognized as quickly as the first trained view. Finally, subjects were presented with novel orientations and showed RTs that increased with the distance to the nearest previously learned view. Tarr and Pinker (1989) suggested that object mental representations consist of multiple views and that object recognition occurs through the use of mental rotation (Shepard & Metzler, 1971) or a continuous alignment process (Ullman, 1989). Moreover, they argued that experience shapes object representations and leads to invariance through this collection of views.
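As a toy illustration of this multiple-views account (with invented parameter values, not Tarr and Pinker's actual analysis), RT can be modeled as growing linearly with the angular distance to the nearest stored view, so that practice, by adding stored views, flattens the orientation cost:

```python
# Toy multiple-views RT model (illustrative values, not fitted to data):
# RT rises linearly with the angular distance between the probe
# orientation and the NEAREST stored view.
def predicted_rt(probe_deg, stored_views_deg, base_ms=600.0, slope_ms_per_deg=3.0):
    def angdist(a, b):
        d = abs(a - b) % 360
        return min(d, 360 - d)
    nearest = min(angdist(probe_deg, v) for v in stored_views_deg)
    return base_ms + slope_ms_per_deg * nearest

print(predicted_rt(80, [0]))      # far from the single trained view: slow
print(predicted_rt(80, [0, 90]))  # practice stored a nearby view: fast
```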

Tarr and Pinker (1989) used standard and mirror-reversed images and found similar RTs across all orientations of the mirror-reversed images. They inferred that an in-depth rotation was used to match the reversed image to the memorized counterpart, and argued that this was again strong evidence for the multiple-views hypothesis. Further studies led to contradictory results: in some cases, performance was sensitive to mirror-image reflection (Lawson & Humphreys, 1996), while in others, recognition across mirror-image transformation could be achieved at no cost (e.g., Biederman & Cooper, 1991; L. A. Cooper et al., 1992).

Following Tarr and Pinker's (1989) argument for multiple-view representations, new studies were conducted using a wide range of stimuli (e.g. "amoeboid" and "paper-clip" objects, Bülthoff & Edelman, 1992, see Figures 3C and 3D; other novel 3-D objects, Hayward & Tarr, 1997; and 2-D line drawings of everyday objects, Lawson & Humphreys, 1996) and brought support for this hypothesis. Bülthoff and Edelman (1992) argued in favor of the multiple-views hypothesis but proposed an alternative to the alignment process leading to recognition: they found evidence for an interpolation/extrapolation process (see also Bülthoff et al., 1995). Bülthoff and Edelman (1992) based their conclusion on the following results: they found a lower recognition error rate for the so-called INTRA views (unfamiliar viewpoints between learned views; interpolation) than for the EXTRA views (viewpoints outside the trained views, but in the same horizontal plane; extrapolation). Finally, they showed that after extensive training with many viewpoints, view-specific effects disappeared.
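The interpolation advantage can be sketched with a simple radial-basis-style scheme (a hypothetical illustration with invented parameters, not Bülthoff and Edelman's actual model): each stored view contributes Gaussian support, so a probe lying between two trained views gathers evidence from both neighbors, whereas an extrapolated probe at the same distance from its nearest trained view gathers evidence from only one side:

```python
import math

# Sketch of a view-interpolation account (all values illustrative):
# evidence for a learned object is the summed Gaussian (RBF-like)
# activation of its stored views.
def view_evidence(probe_deg, stored_views_deg, sigma_deg=30.0):
    def angdist(a, b):
        d = abs(a - b) % 360
        return min(d, 360 - d)
    return sum(math.exp(-(angdist(probe_deg, v) / sigma_deg) ** 2)
               for v in stored_views_deg)

trained = [0, 60]
print(view_evidence(30, trained))  # INTRA probe, 30 deg from each view: ~0.74
print(view_evidence(90, trained))  # EXTRA probe, also 30 deg from nearest: ~0.37
```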

All these studies were taken as powerful evidence for viewpoint-dependent object representations. However, Biederman and colleagues pointed out that Bülthoff and Edelman's amoeboid stimuli did not have structures that could be readily decomposed into parts (Biederman & Gerhardstein, 1993). As view-based representations do not take structure into account, they may not be able to explain recognition between very similar exemplars differentiated only by geometric variations (Hummel, 2000). More generally, Hummel (2000) argued that view-based theories could explain how viewed images are matched to the stored views, but not how those views are represented. Thus it is not clear how view-based theories can accommodate a large set of natural and/or more structured stimuli.

Figure 3. Object sets used by (A) Tarr and Pinker (1989); (B) Tarr (1995); (C) and (D) Bülthoff and Edelman (1992). Source: Adapted from the three above-mentioned articles.


Multiple representational systems

Today, these different theories tend to be reconciled. Indeed, both types of representations could mediate object recognition under different contexts, tasks or classes of objects. Tarr and Bülthoff (1998) proposed that "image-based and structural information may be both functionally and computationally complementary". Biederman and colleagues also suggested that under difficult viewing or judgment conditions, and especially within an object class, subjects could base their recognition on episodic information and view-based representations (Biederman & Cooper, 1992; Biederman et al., 1999). Other authors clearly favor a comprehensive model of object recognition built on both theories (Foster & Gilson, 2002; Hayward, 2003).

But how could these multiple representational systems be implemented at the brain level? This question leads to a broader one: what are the neural correlates of visual recognition and learning? In the next section, the functional organization of the visual system, and of the inferior temporal lobe in particular, will first be briefly addressed. Then, I will review neurophysiological and neuroimaging studies assessing the role of the inferior temporal lobe in object recognition, and finally, I will present evidence of learning-related changes in this brain area.

1.1.2. Neural correlates of visual recognition and learning

Two visual pathways

Humans are often described as visual animals, and this becomes obvious when one looks at the functional distribution of cortical areas. Felleman and Van Essen (1991) first studied this distribution in the macaque cortex and found that about half (52%) of the cortical surface was predominantly or exclusively visual. More recently, Van Essen and colleagues made interspecies comparisons between macaque and human cortices and found high anatomical and functional overlap in visual areas (Van Essen et al., 2001).

Ungerleider and Mishkin (1982) also studied the macaque visual system and observed that lesions of the inferior temporal cortex caused impairments in visual discrimination tasks, whereas posterior parietal lesions caused impairments in visuospatial tasks. They proposed that the visual areas fall into two cortical visual pathways, each originating in the primary visual cortex, as schematized in Figure 4. The dorsal stream projects into the inferior parietal area and plays a crucial role in the appreciation of spatial relationships among objects and in visual guidance toward them, whereas the ventral stream projects into the inferior temporal cortex and is involved in form recognition and object representation.


Figure 4. Lateral view of the left hemisphere of a Rhesus monkey. The grey area defines the cortical visual surface in the occipital, temporal and parietal lobes. Arrows schematize two cortical visual pathways, each beginning in primary visual cortex (area OC–V1), diverging within prestriate cortex (areas OB–V2), and then coursing either ventrally into OA-V4 and the inferior temporal cortex (areas TEO and TE; IT) or dorsally into OA-MT and the inferior parietal cortex (area PG). The ventral stream is crucial for object vision and the dorsal one for spatial vision. Source: Adapted from Mishkin et al., 1983.

In humans, neuropsychological studies revealed a similar functional organization of the visual areas. Patients with brain damage in the dorsal pathway are generally unable to reach objects accurately but can still name and describe them ('optic ataxia'). Conversely, patients with a lesion in the ventral pathway ('visual form agnosia') are unable to name what they see but can easily grasp or pick up an object (Farah, 1990; Goodale et al., 1994).

A simple way to refer to these two streams is as the what and where pathways. More recently, however, it has been proposed that the ventral stream constructs our conscious perception in interaction with memory systems, whereas the dorsal stream transforms visual information in a more 'bottom-up' way in order to guide our actions (Milner & Goodale, 1995). Numerous dorsal-ventral interactions might also exist (Pisella et al., 2009). Even though the dorsal stream might be involved in visual information processing, this work focuses on the ventral stream and its implication in the formation and consolidation of visual representations.

Functional organization of the ventral stream

Early neurophysiological studies in monkeys found neurons in the temporal lobe that were preferentially activated by complex visual stimuli such as faces, and even neurons that responded differently to different faces (Desimone et al., 1984; Perrett et al., 1984). More recently, Tsao et al. (2006) recorded single neurons in a region of the macaque superior temporal sulcus and found that 97% of the cells were face-responsive. In humans, a similar region has been found, the so-called fusiform face area (FFA; Kanwisher et al., 1997), which will be presented in further detail in section 1.2.2. Thus it appears that there is some specialization of function across different temporal cortical visual areas (for a review, see Grill-Spector & Malach, 2004). Category-selective regions for faces, places and body parts, and a category-general region for objects, have been identified in both monkeys and humans (for human-monkey comparison, see Bell et al., 2009; for selective areas in humans, see Epstein & Kanwisher, 1998; Downing et al., 2001). The category-general region for objects, the lateral occipital cortex (LOC), seems to play an important role in 2-D shape and, interestingly, also in 3-D shape representations (Kourtzi et al., 2003; Kourtzi & Kanwisher, 2001).

However, the functional organization of all these cortical regions remains controversial. Some authors argue for a more distributed organization with intermingled cortical regions selective for particular categories (Haxby et al., 2001).

This work focuses on the representations of objects and faces. Thus, the regions of greatest interest here are the LOC for object learning and the FFA for faces.

The role of the inferior temporal lobe in object recognition

The inferior temporal (IT) cortex is the final stage of the so-called what pathway and has been hypothesized to play a pivotal role in various aspects of visual object recognition. It has been extensively investigated in order to determine how features are extracted and how objects are represented (for a review, see Gross, 2008). Below, I describe some advances in the understanding of visual object recognition provided by neurophysiological and neuroimaging studies.

In the attempt to understand the role of the inferior temporal cortex in object recognition, two of the first goals were to identify the level of complexity of the features extracted in this region and to determine whether objects are coded in local ('grandmother cell': a cell that responds only to a specific visual concept, such as your own grandmother; see Barlow, 1972) or distributed representations. The first evidence of sensitivity to complex stimuli came from Gross and colleagues (Gross et al., 1969), who showed that monkey IT cortex neurons were more activated by complex stimuli such as hands and faces than by simple stimuli. These early single-unit studies, together with that of Perrett et al. (1984), suggested that the 'grandmother cell' might exist. Tanaka and colleagues (for review see Tanaka, 2003) further investigated how object features are represented in IT. They recorded TE cell activity and identified the objects that led to maximal activity. Then they simplified these objects down to the minimal image features maintaining the same firing rate as measured for the whole complex object. The features leading to the same activation as the object tended to be moderately complex (Tanaka et al., 1991). Tanaka and colleagues also found that adjacent cells were activated by similar simplified features ('minicolumns') (Fujita et al., 1992). Finally, they observed that objects were coded by the combined activation of multiple cells, each representing different features of the object image (Tanaka et al., 1991). This result supports a distributed rather than local coding of object representations.

The degree to which visual representations in the inferior temporal cortex are view-dependent or view-independent, and how these representations allow invariant recognition, has been addressed by several single-unit recording studies in monkeys, but also by a few neuroimaging studies in humans. Recordings from IT cells provide evidence of representations that tolerate changes in an object's (or face's) visual input (e.g., viewing angle, Hasselmo et al., 1989; size, Lueschow et al., 1994). Other neurophysiological studies found that these cells were sensitive to viewpoint or size (e.g., Perrett et al., 1982; Ashbridge et al., 2000), and yet others reported cells with both view-specific and view-invariant responses (Perrett et al., 1991; Ito et al., 1995).

More recently, some neuroimaging studies showed both object-based and view-based activations and argued for the existence of multiple mechanisms. Sawamura et al. (2005) used an fMRI adaptation paradigm in monkeys (IT) and humans (lateral occipital cortex, LOC) and found the greatest decrease in neural responses for repetitions of the same object at the same size, intermediate response levels for repetitions of the same object at different sizes, and the lowest response levels for repetitions of different objects. They could not find complete invariance in any of the shape-selective regions. However, some degree of size invariance was found, with a tendency for anterior regions to be more invariant than posterior ones, and for the left LOC to be more size-invariant than the right. This is concordant with other neuroimaging studies showing more abstract and invariant representations in the left fusiform gyrus (FG) than in the right (Koutstaal et al., 2001; Vuilleumier, Henson et al., 2002; Garoff et al., 2005). A recent study also showed both view-dependent and view-independent adaptation effects in the FG during repeated presentations of faces, with view-dependence in the right FFA and view-independence in the medial FG (Pourtois et al., 2009). View-independence was also found in the right medial FG for other object categories (chairs, houses).
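As a rough illustration of how such repetition-suppression effects can be quantified (a generic adaptation index with invented amplitudes, not the specific measure used by Sawamura et al.), the drop in response to a repeated stimulus is expressed relative to the response to different stimuli:

```python
# A minimal sketch (not the authors' analysis) of an fMRI adaptation index.
def adaptation_index(resp_repeated: float, resp_different: float) -> float:
    """How much the response to a repeated stimulus drops relative to the
    response to different stimuli: 1 = maximal adaptation, 0 = none."""
    return (resp_different - resp_repeated) / resp_different

# Illustrative BOLD amplitudes (arbitrary units) echoing the ordering
# reported by Sawamura et al. (2005): same object/same size < same
# object/different size < different objects.
r_same_size, r_diff_size, r_diff_obj = 0.6, 0.8, 1.0
print(adaptation_index(r_same_size, r_diff_obj))  # 0.4: full repetition effect
print(adaptation_index(r_diff_size, r_diff_obj))  # 0.2: size-invariant share only
```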

Thus, the functional architecture of the inferior temporal lobe is complex and seems to allow the formation of both view-dependent and view-invariant representations. However, the nature of these representations, how they are built, and which processes underlie 3-D object recognition in humans are not yet clearly understood.


Experience-dependent changes in brain responses underlying visual learning

Logothetis and colleagues investigated in monkeys how experience shapes behavior and neural representations in IT, using novel 3-D objects similar to those used in psychophysical studies in humans (Bülthoff & Edelman, 1992). They found that in early learning stages, behavioral performance was view-dependent: monkeys were able to recognize the learned views and to generalize, with decreased performance, to rotated views, but not farther than 40° around the learned views (Logothetis et al., 1994). These results were in agreement with psychophysical studies in humans suggesting the existence of view-based representations and interpolation processes between learned views (Bülthoff & Edelman, 1992). Logothetis and Pauls (1995) hypothesized that extensive learning with a large set of views would lead to invariant recognition. They therefore trained monkeys extensively with 5 objects over 4 to 6 months. Behavioral performance became view-invariant, but they still found multiple cells tuned to different views; only very few cells showed invariance. Again, these results supported the view-dependent account of object recognition. However, Booth and Rolls (1998) showed that extensive training is not necessary and that 'natural exposure' is sufficient to form invariant representations. Indeed, they placed real objects in the monkeys' cage for several weeks and observed that after natural inspection, sometimes only a few seconds long, view-specific and view-invariant responses could be recorded in STS. Thus, it is not yet clear whether view-dependent and view-invariant representations coexist in the long term, or whether view-dependency is only the first step of learning.

In humans, psychophysical studies demonstrated that structural (view-invariant) representations may derive from view-specific representations, provided that sufficient prior knowledge and input information are available (Gschwind et al., 2007; Rentschler et al., 2008). However, this issue has never been tested using neuroimaging techniques.

Overall, these data confirm that the inferior temporal lobe plays a crucial role in visual recognition and in 3-D object learning. Both view-dependent and view-invariant representations might coexist. However, it is not yet clear how these representations are built and how they evolve through learning. Thus, further investigation is needed to understand how knowledge acquired through a set of defined views can be generalized to novel views of newly learned objects.


1.2. Learning to recognize faces

Faces may be considered a particular kind of object. They convey important emotional and social information (Norris et al., 2004), and their perception and recognition are crucial for adequate interactions with our environment. Moreover, faces are very homogeneous stimuli and require the processing of both features and configuration. Thus, faces are of particular relevance to the study of processes underlying visual learning. In the following section, I will first review the specificity of face stimuli and the models accounting for their representation and recognition. Then, I will present the neural correlates of face recognition and learning, focusing on the role of the fusiform gyrus.

1.2.1. Models of face recognition

A double challenge for the visual system: selectivity and invariance

When a face is encountered for the first time, it is almost instantly encoded as a visual object belonging to the face category as well as to a given gender, age range, attractiveness level, etc. A known face can then be easily recognized, even in a crowd, and sometimes years after its first encounter. Thus human face-recognition abilities exemplify the exceptional capacity of the visual system to effectively code for hundreds of individual visual items and to allow fast and reliable performance at the subordinate level¹ (Tanaka, 2001).

Learning to recognize new faces is challenging for our visual system in many ways. First, since faces are perceptually very similar to one another (e.g., same constituent elements, fixed geometrical arrangement of these elements), exquisite discriminative abilities are required to distinguish between faces of different individuals (Gauthier et al., 1997). Second, known faces can be recognized across a variety of visual transformations (e.g., viewpoint, luminance, size, but also facial expressions, age-related changes or hairstyle), which implies the extraction of invariant features within individual faces (e.g., Moses et al., 1996; Pourtois et al., 2005a).

Thus, the face-recognition system has to be highly selective but also largely tolerant to variations in visual input. Studying the human face-recognition system is therefore of particular relevance to understanding the processing of individual memory traces (i.e., subordinate-level processing; Biederman et al., 1999; Sigala et al., 2002) and the recognition mechanisms used for other objects.

¹ Subordinate level: recognition of visual objects has been classified within three levels of categorization, the basic level (e.g., dolphin), the superordinate level (e.g., animal) and the subordinate level (e.g., Flipper), with the basic level being the most commonly used and yielding the shortest reaction times (Rosch et al., 1976). Faces are usually recognized at the subordinate level, and face recognition at the subordinate level is as fast as at the basic level.

Features vs. configuration

All faces are composed of a common set of features (eyes, nose, and mouth), which are arranged similarly (the nose in the middle, the two eyes above and the mouth below). The features can vary slightly in shape, color and size and give discriminative cues, but the individuation of faces relies mostly on the configural arrangement of features (e.g., the spacing between the eyes), as shown by the following experiments.

The well-known 'inversion effect', first shown by Yin (1969), who found that rotating an image upside-down affected face recognition much more than recognition of other categories of objects, led to the hypothesis that holistic/configural processing is critical for face recognition. Many other studies (for reviews, see Valentine, 1988 and Maurer et al., 2002) have provided evidence for the specific disruption of holistic/configural processing of inverted faces.

Two other paradigms have been used to investigate configural processing: the 'composite effect' (Young et al., 1987) and the 'part-whole effect' (Tanaka & Farah, 1993). The 'composite effect' consists in the alignment of the top and bottom halves of two different individuals, which gives the illusion of seeing a totally new face. The alignment seems to interfere with the recognition of the parts, but only when the face is presented in the upright position (Figure 5). Non-aligned halves can be equally well identified in upright and inverted positions (Young et al., 1987). This effect is a strong argument in favor of configural processing of faces. The 'part-whole effect' refers to the higher recognition performance for face parts (e.g., the nose) presented within the whole face rather than in isolation. This is not the case for other types of stimuli such as scrambled faces, inverted faces or houses (Tanaka & Farah, 1993), and again represents evidence of the special processing of faces (see also McKone, 2004).

Neuropsychological evidence also speaks in favor of separate systems for object and face processing. Two complementary deficits, prosopagnosia (impaired recognition of faces) and visual object agnosia (impaired recognition of common objects), together form a classic neuropsychological double dissociation (Farah, 1990; Moscovitch et al., 1997; Rivest et al., 2009). Farah (1990) argued that two independent recognition systems exist, one part-based and one holistic/configural, the former mostly implicated in object recognition and the latter in face recognition.


These results seem to reflect a special processing of faces, relying on configuration rather than on features only, even if "the exact nature of this style of computation is not understood" (McKone, 2009, p. 268). Other authors clearly claim that this special processing is related to our expertise with faces and that similar results could be obtained with other highly homogeneous objects of expertise (inversion effect with dog experts: Diamond & Carey, 1986; Valentine, 1988). Moreover, increasing configural sensitivity has been shown in developmental research (for a review, see Freire & Lee, 2004). Based on their computational approach, Dailey and Cottrell (1999) proposed that face processing could rely on a domain-specific mechanism arising naturally through infant development, without being innately present, purely as a consequence of the particular ecological subordinate-level demands of faces (i.e., recognition at the individual level; Biederman et al., 1999). In a review on holistic processing of faces, McKone et al. (2007) proposed a combined developmental and evolutionary explanation for this specificity.

Figure 5. Illustration of the "face-composite illusion". Identical top halves of two face stimuli tend to be perceived as being different when they are aligned with different bottom halves (A). This visual illusion disappears when the two face halves are laterally offset (B), or when the faces are presented upside down (C). Source: Michel et al., 2007; Young et al., 1987.


Prototype face representations

Our remarkable ability to discriminate and recognize hundreds of faces has been attributed to the encoding of face exemplars relative to a face prototype (Valentine, 1991; Leopold et al., 2001). Faces differ from each other along multiple dimensions (e.g., nose size, distance between the eyes). These dimensions are coded as vectors of a multidimensional space, and individual faces are represented as single points within this space. The average face is the 'superimposition' of all faces encountered in one's life. Faces are discriminated by their distance from the average and/or by their distance from each other. Thus faces that are close to each other in this space are quite similar. The closer a face is to the average, the more 'typical' it is. This has been used to explain why very distinctive faces are easier to recognize (Valentine, 1991), and also as a basis for the other-race effect² (e.g., Byatt & Rhodes, 2004). Loffler et al. (2005) provided evidence for neural populations tuned³ to faces along 'identity' axes within this space. It has been suggested that perceptual learning leads to an increase of the distance across faces, allowing faster and more accurate identification (Hussain et al., 2009).

If the average face is the mean of all our face experiences, each individual face representation is also the result of a 'superimposition'. Burton and colleagues proposed that the visual system is able to create a prototype face from all the exemplars we encounter of one individual face and to implement it in the face space (Burton et al., 2005).
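The face-space idea can be made concrete with a small numerical sketch (the dimensions and values below are invented purely for illustration): distinctiveness is the distance of a face from the average face, and a per-person prototype is the mean of that person's exemplars:

```python
import numpy as np

# Invented 3-dimensional face space: each face is a point; the average
# face is the mean of all encountered faces (the 'superimposition').
faces = np.array([
    [0.50, 0.20, 0.80],  # e.g. [eye spacing, nose size, face width]
    [0.55, 0.30, 0.70],
    [0.10, 0.90, 0.20],  # a distinctive face, far from the others
])
average_face = faces.mean(axis=0)

def distinctiveness(face: np.ndarray) -> float:
    """Distance from the norm: larger = less typical = easier to recognize."""
    return float(np.linalg.norm(face - average_face))

for face in faces:
    print(distinctiveness(face))  # the third face scores highest

# A per-person prototype (Burton et al., 2005) is likewise the mean of
# all stored exemplars of that one individual:
exemplars_of_one_person = np.array([[0.50, 0.20, 0.80], [0.56, 0.24, 0.76]])
person_prototype = exemplars_of_one_person.mean(axis=0)
```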

Unfamiliar vs. familiar face representations

The computational proposal of a prototype face for each individual person we know suggests that this average face is created after many encounters with a face and underlies its recognition across different visual conditions (e.g. viewpoint, contrast). However, recognition of the identity of a face that has been encountered only once is also possible (for a review, see Hancock et al., 2000). Are representations of unfamiliar faces different from representations of familiar ones?

Some studies found that repetition priming (perceptual facilitation reflected in decreased reaction times or brain activity) is enhanced by the familiarity of the face stimuli (Bruce & Valentine, 1985; R. Henson et al., 2000; Boehm et al., 2006). Bruce (1982) investigated the effect of modifying viewpoint or expression on the recognition of familiar and unfamiliar faces, and found that unfamiliar faces were more affected by these changes than familiar ones. Other studies showed that familiar faces can tolerate transformations (e.g., viewing angle, Goshen-Gottstein & Ganel, 2000; lighting, Hill & Bruce, 1996). Thus it seems that representations of unfamiliar faces are primarily picture-based, and that faces come to rely on a higher (semantic) level of processing as they become familiar (Burton et al., 1999; Pourtois et al., 2005b).

² Other-race effect: a recognition bias leading to better performance for own-race faces compared with other-race faces (for a review, see Meissner & Brigham, 2001).

³ Tuning: selectively responding.

It has also been shown that there might be a shift from external features (hairstyle, face shape) towards internal features in the processing of faces as they become familiar (Ellis et al., 1979; Young et al., 1985; Goshen-Gottstein & Ganel, 2000). O'Donnell and Bruce (2001) suggested that the eyes, rather than internal features in general, become more important as a face becomes familiar.

Thus, it appears that representations of familiar faces depend less on 'pictorial' and otherwise unstable cues than those of unfamiliar faces. They are more abstract, and this allows generalization to new visual contexts. However, it remains unclear how this abstract representation is created.

Cognitive models of face recognition

Some influential models of face recognition (e.g., Bruce & Young, 1986; Burton et al., 1990; Ellis & Lewis, 2001; Breen, 2000) have tried to explain how this abstract representation is built, proposing that information about a face is processed across hierarchically organized units including (1) face recognition units (FRUs) underlying the recognition of a familiar face based on the structural encoding of the face, (2) multimodal person identity nodes (PINs) linking information from different domains about the same person, (3) semantic information units (SIUs) where attributes about the person are stored, and (4) an affective module mediating affective responses to familiar stimuli independently of overt identity recognition (Figure 6).

Strengthening of the link between an individual FRU and its corresponding PIN after repeated exposure to a given face would lead to more efficient processing of that face (Burton et al., 1990). This hypothesis accounts for increased explicit recognition as well as for implicit effects, such as faster responses (Ellis et al., 1990; Paller et al., 1992) and increased positive affective responses to repeated stimuli (the mere-exposure effect; Zajonc, 1968; Butler & Berry, 2004). The latter effect might also rely on a strengthening of the link between the FRU and the affective response module. A new FRU may be created for each newly encountered face (Burton, 1994).
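The link-strengthening hypothesis can be caricatured with a toy accumulator (all parameters are invented; Burton et al.'s actual proposal is a full interactive activation network): a stronger FRU-to-PIN connection drives the PIN to its recognition threshold in fewer update steps, which corresponds to faster responses for familiar faces:

```python
# A toy sketch in the spirit of Burton et al.'s (1990) framework
# (invented parameters): the PIN accumulates activation from its FRU
# through a weighted link that repeated exposure is assumed to strengthen.
def steps_to_recognize(fru_pin_weight: float, threshold: float = 1.0,
                       fru_activation: float = 1.0, decay: float = 0.02):
    """Number of update steps until the PIN crosses the recognition
    threshold; a stronger FRU->PIN link crosses sooner (faster RT)."""
    pin = 0.0
    for step in range(1, 10_000):
        pin += fru_pin_weight * fru_activation - decay * pin
        if pin >= threshold:
            return step
    return None  # never recognized

print(steps_to_recognize(0.05))  # weak link (unfamiliar face): 26 steps
print(steps_to_recognize(0.30))  # strong link (familiar face): 4 steps
```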


[Figure 6 diagram: Visual input → Structural encoding → Face Recognition Units → Person Identity Nodes → Semantic Information Units and Name retrieval; a parallel route from the FRUs to the Affective Response to Familiar Stimuli module (driving the skin conductance response); the PINs and the affective response module both feed an integrative device supporting attribution processes.]

Figure 6: A model for face processing showing two independent routes from the Face Recognition Units (FRU): one leading to the Person Identity Nodes (PIN) and then to name retrieval processes; and a second route leading to the Affective Response to Familiar Stimuli (modulating the skin conductance response). The PINs and the affective response module project to an integrative device which compares the expected affective response with the actual affective response to allow attribution processes to take place. Source: Adapted from Ellis & Lewis, 2001 and Breen, 2000.

Recent models, built on neuropsychological findings, propose that these components of the face recognition system might be under the control of an executive module that prevents false recognitions (Rapcsak, 2003; Bartlett et al., 2009). Further details on these models, and on how they could be implemented in the brain, will be provided in the following section, which presents the neural correlates of face recognition and learning.

1.2.2. Neural correlates of face recognition and learning

Distributed neural network recruited in face processing

As mentioned in section 1.1.2 of this introduction, cell recordings in monkeys showed that face processing involves dedicated brain circuits, including the superior and inferior temporal cortices (e.g., Perrett et al., 1982). Neuropsychological reports (e.g., Farah, 1996; Vuilleumier et al., 2003) and brain imaging studies in humans also highlighted specific neural correlates of face processing (e.g., Puce et al., 1995; Haxby et al., 2001). Kanwisher et al. (1997) identified a region in the fusiform gyrus that was activated while subjects viewed rapidly presented sequences of faces versus sequences of inanimate objects. They named this region the 'fusiform face area' (FFA).
