HAL Id: hal-02008916
https://hal-amu.archives-ouvertes.fr/hal-02008916
Submitted on 6 Feb 2019
Distributed under a Creative Commons Attribution 4.0 International License
A unified coding strategy for processing faces and voices
Galit Yovel, Pascal Belin
To cite this version:
Galit Yovel, Pascal Belin. A unified coding strategy for processing faces and voices. Trends in Cognitive Sciences, Elsevier, 2013, 17 (6), pp. 263-271. doi:10.1016/j.tics.2013.04.004. hal-02008916
A unified coding strategy for processing faces and voices

Galit Yovel 1 and Pascal Belin 2,3,4

1 School of Psychological Sciences and Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel
2 Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, UK
3 Département de Psychologie, Université de Montréal, Montréal, Canada
4 Institut des Neurosciences de La Timone, UMR 7289, CNRS and Université Aix-Marseille, France
Both faces and voices are rich in socially relevant information, which humans are remarkably adept at extracting, including a person's identity, age, gender, affective state, and personality. Here, we review accumulating evidence from behavioral, neuropsychological, electrophysiological, and neuroimaging studies suggesting that the cognitive and neural processing mechanisms engaged by perceiving faces or voices are highly similar, despite the very different nature of their sensory input. The similarity between the two mechanisms likely facilitates the multi-modal integration of facial and vocal information during everyday social interactions. These findings emphasize a parsimonious principle of cerebral organization, where similar computational problems in different modalities are solved using similar solutions.
Similar cognitive and neural representations for faces and voices
Faces and voices are the most socially important stimuli in the visual and auditory domains, respectively. The nature of the sensory input associated with these key social stimuli is very different: reflections of light on the face vs air pressure waves generated by the vocal apparatus. Yet, they both convey very similar types of information about a person, including identity, gender, emotional state, and age. Furthermore, in many cases of social communication faces and voices are processed simultaneously and have been shown to have facilitatory effects on recognition of person information relative to when each is presented alone (for a review, see [1]; Box 1). It is therefore plausible that, despite their very different sensory input, they may generate, at least to some extent, a similar representation. Indeed, recent studies reveal many similarities between their neural and cognitive representations.
In this review, we highlight the many similarities that have been found between the neural and cognitive mechanisms of face and voice processing in the past few years. We will summarize evidence pertaining to the following five areas: neurophysiological mechanisms; neurocognitive disorders; functional architecture; perceptual coding; and development and experience (Table 1; see Glossary). Because faces have been studied more extensively than voices, we will also highlight several well-established phenomena that have been reported for faces and should be investigated in future studies with voices to further explore their unified coding strategy.
The many similarities that exist between the neural and cognitive representation of faces and voices suggest a unifying coding mechanism that has evolved to represent the very rich and diverse information that these unique classes of visual and auditory stimuli convey about a person. More generally, these findings suggest that the brain may employ similar principles for processing stimuli that convey similar types of information not only within the same modality, but also across different modalities.
Glossary
Adaptation: prolonged exposure to a stimulus modulates neural and behavioral responses to the stimulus.
Aftereffects: prolonged exposure to a stimulus distorts perception of a new stimulus that follows it in time, in a direction opposite to the first stimulus.
Face inversion effect: the decline in recognition for upside-down relative to upright stimuli is larger for faces than for any other non-face objects.
Fusiform face area (FFA): a face-selective area found in the human fusiform gyrus. Other occipito-temporal face-selective areas are found in the lateral occipital cortex and superior temporal sulcus.
Other race effect: recognition of faces of other races (e.g., Caucasian faces by Asian observers) is poor relative to own-race faces (e.g., Asian faces by Asian observers).
Retinotopy: the visual field is mapped topographically onto neurons in the retina and early visual cortex such that adjacent neurons encode nearby regions in the visual field.
Temporal voice areas (TVA): voice-selective areas found in the middle and anterior parts of the human superior temporal sulcus/gyrus bilaterally, with a right-hemispheric asymmetry.
Tonotopy: sounds of different frequency are organized topographically in the ascending auditory pathways and early auditory cortex, such that nearby neurons are tuned to nearby frequencies.
Perceptual narrowing: discrimination abilities among sensory stimuli present in young infants (three to six months old) are reduced in older infants (nine months old) for stimuli with which they have no perceptual experience (e.g., monkey faces, monkey face–voice integration).
Phonagnosia: a selective deficit in voice recognition. Most cases reported have been due to brain damage in the right temporo-parietal junction. A single case of developmental phonagnosia (i.e., with no apparent brain damage) has been described.
Prosopagnosia: a selective deficit in face recognition, which appears in two forms, acquired and developmental. Acquired prosopagnosia results from brain damage, usually in the right temporal lobe. Developmental cases suffer from a life-long deficit with no apparent brain damage.
Voice inversion: voice stimuli can be reversed in time or inverted ('rotated') in frequency, inducing recognition deficits.
1364-6613/$ – see front matter
© 2013 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.tics.2013.04.004
Corresponding authors: Yovel, G. (gality@post.tau.ac.il); Belin, P. (p.belin@psy.gla.ac.uk)
Keywords: face recognition; voice recognition; neural selectivity; sensory coding; visual cortex; auditory cortex.
Neurophysiological mechanisms
Faces and voices have both been shown to elicit highly selective neural responses in the human brain (Figure 1A–C). Faces have typically been compared to non-face objects, such as houses or chairs. Voices are usually compared to different categories of non-vocal sounds, such as environmental or mechanical sounds. Functional MRI (fMRI) studies reveal much stronger responses to faces than to any other non-face stimuli in at least three occipital temporal areas: the occipital face area (OFA) in the lateral occipital cortex, the fusiform face area (FFA) in the mid-fusiform gyrus, and a face area in the posterior superior temporal sulcus (STS-FA) [2,3] (Figure 1A, left). Recent studies also reveal more anterior face-selective responses in the anterior temporal lobe and the prefrontal cortex [4]. Voice-selective cortical mechanisms also exist: fMRI studies have identified several regions along the middle and anterior STS and superior temporal gyrus (STG) that show a greater response to vocal sounds (regardless of whether or not they carry intelligible speech [5]) than to non-vocal sounds [6–8]; these areas were named the 'temporal voice areas' (TVA) (Figure 1A, right). Voice-sensitive responses have also been observed in other areas, including the insula and prefrontal cortex [9–11].
Consistent with neuroimaging findings, electroencephalography (EEG) and magnetoencephalography (MEG) studies show face- and voice-selective evoked responses. Faces elicit a component much larger in amplitude than non-face stimuli 170 ms after stimulus onset: the face-selective N170/M170 [12,13] (Figure 1B, left). A voice-selective electrophysiological component at a latency comparable to that of the N170, termed the 'fronto-temporal positivity to voices' (FTPV), has also recently been reported in EEG [14–16] (Figure 1B, right) and MEG [17] studies approximately 200 ms after sound onset. Finally, transcranial magnetic stimulation (TMS) of fMRI-defined face-selective areas indicates a causal and specific role for the occipital face area in face discrimination (Figure 1C, left) and in the generation of the face-selective N170 response [18,19]. Similarly, TMS over the TVA has been shown to disrupt voice detection [20] (Figure 1C, right).
Finally, one prominent and well-established feature of the face-processing mechanism is its right-hemisphere asymmetry, which has been manifested in both neural and behavioral measures [21,22]. Whereas speech processing is lateralized to the left hemisphere, voice recognition, similar to faces, elicits neural responses that are right-lateralized [21].
Face- and voice-selective neural responses are not limited to the human brain, but have also been observed in the macaque brain. Face neurons are commonly found in the superior temporal sulcus and the inferotemporal cortex [23]. Furthermore, functional MRI studies reveal a network of face-selective areas primarily in the upper and lower banks of the superior temporal sulcus [4] that share at least some anatomical and functional similarities with the human face areas [24] (Figure 1D, left). Similarly, monkey fMRI studies revealed voice-selective areas [25] in the superior temporal plane that prefer species-specific vocalizations over other vocalizations and sounds (Figure 1D, right). These voice-selective areas have been shown to contain voice-selective neurons [26]. The presence of face- and voice-dedicated mechanisms in the macaque brain indicates that these face and voice areas did not just emerge recently in humans along with the emergence of language and high-level social functioning skills: they were probably already present in the last common ancestor of macaques and humans some 30 million years ago. This highlights the importance of these stimuli for basic social functioning throughout primate evolution.
In summary, neurophysiological and neuroimaging findings convincingly show that both faces and voices elicit a highly selective neural response. This highlights not only their social importance, but also the fact that the unique nature of their representation requires mechanisms that are different from those used for the processing of any other visual and auditory stimuli. Moreover, this similarity in their neural representations is consistent with other similar principles used for the processing of auditory and visual stimuli, such as the tonotopic and retinotopic representations in primary auditory and visual cortex, respectively, or the separate mechanisms for 'where' and 'what' information that have been reported in both visual [27] and auditory [28] systems.
Neurocognitive disorders
Consistent with the strong neural selectivity that is discussed above for faces and voices, neuropsychological studies have reported selective impairments in face or voice recognition, in the face of otherwise intact visual or auditory functions, respectively. Selective deficits in face recognition abilities (i.e., prosopagnosia) were reported over
Box 1. Face–voice integration for person information
Face–voice integration has been primarily studied in the context of speech processing. However, given that faces and voices convey important and similar non-speech information about person identity, it is also important to examine face–voice integration for the processing of identity, emotion, age, and gender. Recent studies have shown that face–voice integration contributes significantly to the extraction of person information [77]. Specifically, cross-modal interaction in the processing of face and voice identity has been shown in studies that presented congruent and incongruent identities [78]. Face–voice integration for gender has been shown even with pure tones extracted from male and female voices, which were not recognized by participants as male or female voices. These pure tones biased perception of an androgynous face toward a male or a female face according to the gender of the tone [79]. Integration effects between faces and voices have also been observed for emotional information [80–82].
Face–voice integration appears very early in life. Several studies have shown that at two months of age, infants begin to exhibit the ability to perceive face–voice correspondences [83]. Interestingly, perceptual narrowing, which has been shown for faces and speech (see main text), has also been reported for face–voice integration. For example, four–six- and eight–ten-month-old infants were presented with consistent and inconsistent face–voice stimuli of monkeys and humans. Whereas the four–six-month-old infants were able to match face–voice stimuli of both humans and monkeys, eight–ten-month-old infants were able to match human but not monkey face–voice stimuli [84,85]. The similar developmental track that is found for faces and voices presented in isolation, as well as for the integration of the two stimuli, is in line with the idea that similar coding mechanisms of unisensory information may underlie successful multisensory integration.
50 years ago in brain-damaged patients following a lesion in the occipital temporal cortex, usually over the right hemisphere [29]. More recently, similar deficits were found in individuals that show no specific brain lesion, but suffer from life-long prosopagnosia, known as developmental/congenital prosopagnosia [30]. Prosopagnosic individuals seem to show intact recognition of objects, but exhibit severe difficulties in recognizing familiar faces, including their close relatives and friends. Regarding voices, the existence of patients with selective impairments in speech comprehension has long been established (e.g., Wernicke's aphasia). More similar to prosopagnosia, a small number of 'phonagnosic' patients have been identified with impairments in speaker discrimination or recognition, even though other aspects of auditory perception were normal [31–33]. Only one case of 'developmental phonagnosia' – the selective inability to recognize speakers by their voice in the absence of any evident cerebral impairment – has been reported so far [34]. It is possible that the lack of additional developmental phonagnosia cases may not reflect an absence of such cases, but the inability of individuals who suffer from this deficit to acknowledge their deficit, as was the case with developmental prosopagnosia for many years. Furthermore, a lack of standardized tests for phonagnosia also impedes its reliable diagnosis.
Functional architecture
As mentioned above, both faces and voices convey similar information about a person, including gender, emotional state, identity, and age. The idea that the functional architecture underlying face and voice processing could be organized following comparable principles has been discussed before and will therefore only be briefly mentioned here [1,35]. A neurocognitive architecture described by Bruce and Young [36] has been suggested to also apply to voices [1]: briefly, after a stage of cortical processing common to all stimuli of their particular sensory modality, faces and voices are selectively processed in a further 'structural encoding' stage, probably represented by areas such as the FFA and TVA, respectively. Then, in each modality, the three main types of information carried by both faces and voices – identity, affect, speech – are processed along functional pathways which, although they interact with one another during normal functioning, can be selectively activated/impaired.
Perceptual coding
One of the most influential models of face processing is the 'face space model' [37], which posits that face identities can be represented as locations in a multidimensional space. The dimensions of this space correspond to information used to discriminate faces, whereas the distance that
Table 1. Face–voice similarities

Neural selectivity (human):
- Electrophysiology: Face – N170/M170 [13]. Voice – FTPV [14,17].
- Functional MRI: Face – face areas in the lateral occipital cortex, mid-fusiform gyrus, and STS [2,86]. Voice – voice areas in the STS [87].
- Hemispheric asymmetry: Face – right hemisphere [21,22]. Voice – right-hemisphere voice selectivity [21] (left hemisphere for speech).
- Effects of TMS: Face – TMS over the OFA selectively impairs performance for faces [18] and selectively increases the face N170 [19]. Voice – TMS over the TVA disrupts voice detection [20].

Neural selectivity (monkey):
- Electrophysiology: Face – face-selective cells [23]. Voice – voice-selective cells [26].
- Functional MRI: Face – face-selective brain areas [4,23]. Voice – voice-selective brain areas [25].

Selective recognition deficits:
- Face – developmental and acquired prosopagnosia [29,30]. Voice – developmental and acquired phonagnosia [31,34].

Perceptual coding:
- Norm-based coding (Box 2): Face – relative to an averaged face [40,42,88]. Voice – relative to an averaged voice [39,41].
- Distinctiveness effect: Face – better recognition for distinctive faces [37]. Voice – better recognition for distinctive voices [89].
- Perceptual aftereffects to anti-faces/voices (Box 2): Face – largest for matched vs non-matched anti-faces [88]. Voice – largest for matched vs non-matched anti-voices [39].
- Attractiveness (Box 3): Face – averaged face is more attractive [90]. Voice – averaged voice is more attractive [91].

Development and experience:
- Early preference: Face – preference for upright faces 24 hours after birth [43]. Voice – fetuses and young infants discriminate voices from other auditory stimuli [45,46].
- Neural correlates: Face – face-selective ERPs appear at three to six months [50]. Voice – voice areas emerge between three and seven months [52,53].
- Perceptual narrowing: Face – broad abilities for cross-species face recognition at four–six months are tuned by experience in eight–ten-month-old infants [54]. Voice – broad abilities for phoneme discrimination at four–six months are tuned by experience in eight–ten-month-old infants [56].
- Effects of experience in adulthood: Face – other race effect [60]. Voice – language familiarity effect [57] and own-race bias [59].
Figure 1. Face- and voice-selective neural responses. (A) Left: face-selective areas revealed with functional MRI (fMRI) are shown in the occipital temporal cortex. Right: the voice-selective areas are found in the superior temporal sulcus and gyrus. (B) Left: faces elicit greater event-related potential (ERP) amplitudes than non-faces 170 ms after stimulus onset (N170) in occipito-temporal electrodes (red line: faces). Right: voices elicit greater amplitudes than non-voice sounds 200 ms after stimulus onset in fronto-temporal electrodes (red line: voices). Reproduced, with permission, from [14]. (C) Left: transcranial magnetic stimulation (TMS) to the occipital face area selectively disrupts face but not body discrimination. Adapted from [18]. Right: TMS to the temporal voice area selectively disrupts voice/non-voice discrimination. Reproduced, with permission, from [20]. (D) Left: face-selective areas found in the superior temporal sulcus of the macaque brain. Reproduced, with permission, from [23]. Right: voice-selective areas were found in the superior temporal plane of the macaque brain. Reproduced, with permission, from [25].
separates representations reflects the degree of similarity between faces. This similarity-based framework accounts for a range of face-recognition phenomena, such as the face inversion effect, effects of distinctiveness and caricaturing, and the other race effect [37]. Furthermore, single-unit recording studies in the macaque show neuronal tuning profiles that are consistent with such similarity-based representations [38]. Current evidence suggests that all faces are coded relative to a prototypical, average face, which lies at the origin of the multidimensional face space (Box 2).
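In vector terms, the face space model can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the number of dimensions, the Gaussian distribution of faces, and the Euclidean distance metric are all assumptions made here for concreteness.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy face space: each "face" is a point in a feature space.
# The model does not specify what the dimensions actually are.
n_faces, n_dims = 200, 10
faces = rng.normal(size=(n_faces, n_dims))

# The norm: a prototypical, average face at the origin of face space.
prototype = faces.mean(axis=0)

def distinctiveness(face: np.ndarray, norm: np.ndarray) -> float:
    """Distance to the norm, a proxy for how distinctive a face is."""
    return float(np.linalg.norm(face - norm))

# Under norm-based coding, distinctive faces (far from the norm) are
# recognized better, and the fMRI results cited in the text report neural
# responses that grow with distance from the mean face.
distances = np.array([distinctiveness(f, prototype) for f in faces])
```

On this view, effects such as distinctiveness and caricaturing simply correspond to positions further from the origin along a given identity direction.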
Recent studies have uncovered very similar phenomena for the coding of voice identity. Voices from different
Box 2. Multidimensional face and voice spaces
The idea that faces and voices are coded relative to a norm has received its main support from studies that employed behavioral adaptation paradigms. Adaptation entails exposure to a stimulus for a relatively long duration of a few seconds. This long exposure generates perceptual aftereffects during the presentation of a subsequent stimulus, such that the representation of the adapted stimulus becomes weaker and its 'opposite' becomes stronger. For example, after long exposure to the color green, a white screen appears red because of opponent red–green color coding in the retina.
Aftereffects, which were originally used to probe the properties of low-level sensory stimuli, such as color and motion, have later been found also for face gender, identity, and age [92–94]. For example, long exposure to a female face generates a stronger male percept in a 50%/50% female–male morphed face. Face aftereffects have also been useful as tests of the properties of the multidimensional face space. In particular, according to the norm-based coding hypothesis, all faces are coded as a function of their distance relative to an average face that lies at the origin. Findings showed greater aftereffects for two stimuli located on opposite sides of the average face (a face and an anti-face) than for two faces that are not on the axis that goes through the origin where the average face resides (see Figure IA–C) [38,40]. These findings provide strong support for the idea that faces are coded in a norm-based manner relative to an average face.
Interestingly, recent aftereffect studies with voices reveal similar effects for voice information such as gender [95], identity [96,97], and emotion [98]. Voice aftereffects also provide evidence for norm-based coding of voice identity: identity aftereffects induced by 'anti-voice' adaptors are greater in magnitude than those induced by non-opposite adaptors [39]. As for faces, the average voice, normally perceived as identity-free, becomes tainted with the identity opposite to the anti-voice adaptor (Figure ID,E), even though voice and anti-voice are not perceived as related in identity.
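The anti-face/anti-voice construction described above can be sketched geometrically. The three feature values below are arbitrary illustrative numbers, not measurements from any study; the point is only the reflection-through-the-norm relation and the identity-strength continuum used in the labeling tasks.

```python
import numpy as np

# A face and its anti-face lie on opposite sides of the average face,
# so averaging the two recovers the average (values are illustrative).
average = np.array([0.5, 0.2, 0.8])   # the norm (average face)
face = np.array([0.9, 0.1, 0.6])      # a learned identity

anti_face = 2 * average - face        # reflection through the norm

def morph(alpha: float) -> np.ndarray:
    """Identity-strength continuum: alpha = 0 is the average face,
    alpha = 1 the full identity; negative alpha moves toward the
    anti-face."""
    return average + alpha * (face - average)
```

Matched adaptors in these experiments are precisely stimuli of the form `morph(-1)`: maximally 'opposite' to the test identity through the origin.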
Figure I. Perceptual aftereffects of 'anti-face' and 'anti-voice' adaptation. (A–C) Anti-face adaptation. (A) Four face identities used in a recognition task (left column) and their corresponding 'anti-face' versions (right column); note the very different identity percepts associated with a face and its anti-face; yet, they are related in that averaging them together results in the average face. (B) Stimuli used in recognition tasks represented in a theoretical multidimensional space centered on the average face (blue circle). Green circles indicate learned identities. Red circles indicate anti-faces. (C) Psychophysical labeling functions obtained as a function of increased identity strength at baseline (no adaptation: continuous line, open symbols) and after adaptation (closed symbols) with matched (continuous line) and non-matched (dashed line) anti-face adaptors. Note the greater aftereffects induced by matched anti-face adaptors and the strong identity percept associated with the otherwise identity-neutral average face (identity strength 0) after adaptation with matched anti-faces. Reproduced, with permission, from [40]. (D,E) Anti-voice adaptation. (D) Three voice stimuli (brief syllables represented by their spectrograms) shown in a theoretical multidimensional space, with an averaged voice in its center, and with their corresponding anti-voice stimuli (on the green circle). (E) Psychophysical labeling function obtained as a function of increased identity strength at baseline (no adaptation: orange symbols) and after adaptation with matched (blue symbols) and non-matched (pink symbols) anti-voice adaptors. Note, as for faces, the greater aftereffects induced by adaptation with matched anti-voice adaptors. Reproduced, with permission, from [39].
speakers can be represented as points in a multidimensional space (Box 2). Similar to faces, a prototypical voice stimulus can be generated by averaging together a large number of different voices of the same gender. A particular role of this prototypical voice has been shown via perceptual aftereffects induced by adaptation with 'anti-voices' [39] in an experimental paradigm directly adopted from face experiments [40]. Cerebral activity in the TVA has recently been shown to vary as a function of a voice's acoustical distance to the prototypical voice [41], i.e., 'norm-based coding'. This is analogous to results from the fusiform face area, which showed an increase in signal with increased distance from the mean face [38,42].
Development and experience
Given the importance of face and voice recognition for intact social functioning and the specific computations that are needed to extract the rich information that they convey, it may not be surprising that processing mechanisms for faces and voices appear very early in development. A specific preference for upright faces in infants has been found during the first 24 hours after birth [43]. These findings suggest that face processing mechanisms may be innate and that early on face-like figures attract attention more than other non-face stimuli [44]. Similarly, there is clear evidence that very young infants – even fetuses – can discriminate voices from other auditory stimuli and can recognize their mother's voice [45,46]. By the age of three months, infants also prefer listening to human voices over vocalizations from other species [47].
Early evidence for neural selective responses to faces or voices also exists. For faces, one positron emission tomography (PET) study with two-month-old infants has shown face-selective responses (faces > diodes) in the lateral occipital cortex and the fusiform gyrus. Although the choice of control stimuli was not ideal, these areas may correspond to the adult OFA and FFA [48]. Event-related potential (ERP) studies with three-month-old infants reveal face-selective components – the N290 and N400 [49,50]. These components emerge later than the adult N170 and spread over a longer time range. Thus, face-selective neural mechanisms may exist in early infancy, but are further sharpened during development. With respect to information carried by voices, the contrast of fMRI measures of activity for speech vs reversed speech already shows an adult-like left-lateralized pattern at three months [51]. Evidence of a greater response to vocal vs non-vocal sounds seems to emerge slightly later, between three and seven months, as shown by near-infrared spectroscopy (NIRS) and fMRI [52,53]. Notably, newborns already exhibit a neural signature for voice identity recognition [46].
Evidence for the early, possibly innate, existence of face and voice selective mechanisms does not imply that their development is not influenced by experience. Perceptual narrowing during infancy has been reported for both face and speech stimuli. In particular, at six months of age infants can recognize both monkey and human faces, but the former ability declines by nine months, when face recognition becomes better for human faces [54,55]. Similar perceptual narrowing has been reported for speech [56]. The language spoken in one's cultural group is an obvious such influence of experience, with evidence for cerebral mechanisms tuned to the specific set of phonemes of the maternal language within the first year after birth (see Box 1 for perceptual narrowing of face–voice integration).
Non-linguistic aspects of voice perception, such as speaker recognition, also seem to be susceptible to environmental influence: it is well established that listeners recognize speakers of their own or a familiar language better than speakers of an unfamiliar language – the language familiarity effect [57,58] – and there is partial evidence for a potential effect of race on voice recognition [59]. This phenomenon may parallel the well-established 'other race effect' – humans' poor ability to recognize faces of other races (e.g., Asian faces by Caucasian observers and vice versa) [60] – which results from little contact with faces of other races. Taken together, evidence suggests that mechanisms selective for the processing of faces and voices appear very early in development and may even be innate. These mechanisms are widely tuned to all types of face and voice/speech stimuli early on, but narrow down already by nine months of age and remain narrowly tuned, also in adulthood, to the types of faces and voices one has experience with.
Unexplored face–voice similarities
Whereas ample evidence already exists for the similar coding of faces and voices, many phenomena that have been discovered in the extensive study of faces in the past 50 years still await testing with voice stimuli. Crucially, several behavioral phenomena have suggested a special status for faces compared to non-face objects, but no such effects are known for vocal stimuli. These would include a voice correlate of the face inversion effect [61] and/or the contrast reversal effects (stimulus manipulations that result in a disproportionately large recognition deficit relative to non-face stimuli) [62]. Another hallmark of face processing is its holistic representation [63], which is manifested by an interactive, rather than independent, representation of the face parts. Testing whether these well-established face-specific effects have their counterparts in the auditory domain may be a fruitful avenue of research. For instance, studies using a gating paradigm or examining the effects of transformations such as time reversal or frequency reversal (or 'rotation' [64]) on different stimuli could potentially highlight effects specific to vocal sounds [65,66].
Other phenomena that have been extensively studied with faces are the different representations of familiar and unfamiliar faces [67,68]. For example, the representation of familiar faces is more tolerant to stimulus manipulations such as viewpoint or lighting changes relative to unfamiliar faces. Also, faces are detected more rapidly than other objects in visual scenes and search arrays [69] and have been shown to capture attention relative to other objects [70]. It is still unknown whether voices have a similar privileged status relative to other sounds.
Finally, faces automatically elicit social inferences about the personality of the individual [71,72]. Interestingly, it has been shown that these inferences can be clustered into two main independent dimensions, trustworthiness and dominance [71,72]. Evidence for a similar two-dimensional space that maps onto trustworthiness and dominance has also been suggested for voices [73]. Future studies will determine whether trustworthiness and dominance are correlated with voice expression and voice gender, respectively, as was shown for faces [74].
Concluding remarks
Visual and auditory signals have very different physical properties and are processed by separate neural substrates. Nevertheless, the visual and auditory pathways do employ some similar mechanisms, including the retinotopic and tonotopic representations seen in early sensory cortices and a separation into 'what' and 'where' pathways in both vision and audition [27,75]. In this review, we have shown that the two systems also apply very similar computational operations to the processing of their categories of overriding ecological importance, faces and voices. This is manifested in category-level neural selectivity to faces and voices that was found in both human and macaque brains, selective cognitive impairments, and early appearance in development. Furthermore, similar norm-based coding schemes for identity and attractiveness (Box 3) and separate, but interactive, pathways for identity, expression, and speech have been demonstrated (Table 1). These similarities, as well as others that should be explored in future studies (Box 4), are likely to contribute to effective face–voice integration (Box 1), which has been shown to result in recognition that exceeds the sum of each of the stimuli alone.
Box 3. Are averaged faces and voices more attractive?
It has been shown for over a century that 'averaged faces' generated by averaging together a number of different faces are highly attractive [90,99] (Figure IA). Evolutionary theory proposes that averaged faces are more attractive because they contain features that are indicators of fitness in natural faces (the 'good genes' account): symmetry, averageness, texture smoothness [100,101]. A more cognitive explanation of this phenomenon is in terms of similarity to the internal prototype, which results in easier-to-process, more pleasant stimuli ('perceptual fluency') [102].
Both the good genes and perceptual fluency accounts predict that a similar phenomenon should be observed for voices. Bruckert et al. [91] used morphing (Figure IB) to generate voice composites made of an increasing number of voices and observed, as predicted by the face studies, a significant increase in attractiveness ratings. Two main acoustical parameters were highlighted, both analogous to those shown to influence face attractiveness: distance-to-mean (acoustical similarity with the population average) and 'texture smoothness' (i.e., the amount of spectro-temporal irregularities) [91].
Note that for both faces and voices, averageness appears to be one factor among many that influence the attractiveness percept. Other factors, such as sexual dimorphism, are also known to contribute to both face and voice attractiveness in a complex, context-dependent manner [103–105].
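The distance-to-mean parameter described above can be illustrated with a minimal sketch. This is a hypothetical toy example, not the analysis of [91]: each voice is reduced to an invented acoustic feature vector (pitch and two formant frequencies, with made-up values), and distance-to-mean is taken as the Euclidean distance from the population average. A composite averaged from several voices necessarily lies near that average, so its distance-to-mean is small relative to individual voices.

```python
import numpy as np

def distance_to_mean(voice, population):
    """Euclidean distance between one voice's feature vector and the population average."""
    return float(np.linalg.norm(voice - population.mean(axis=0)))

# Toy population: 6 voices x 3 acoustic features (pitch, F1, F2) -- illustrative numbers only
population = np.array([
    [100.0, 450.0, 1400.0],
    [140.0, 550.0, 1600.0],
    [110.0, 520.0, 1450.0],
    [135.0, 470.0, 1550.0],
    [105.0, 540.0, 1480.0],
    [130.0, 460.0, 1520.0],
])

# A composite averaged from several voices sits near the population mean...
composite = population[:4].mean(axis=0)
d_composite = distance_to_mean(composite, population)

# ...so its distance-to-mean is smaller than that of typical individual voices
d_individuals = [distance_to_mean(v, population) for v in population]
print(d_composite < np.median(d_individuals))
```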
[Figure I. Face and voice attractiveness judgments as a function of averaging. (A) Face composites generated by averaging 32 male faces (left) and 64 female faces (right). (B) Attractiveness ratings as a function of the number of faces averaged. Note the steady increase in attractiveness ratings with increasing number of averaged faces, for both male (left) and female (right) faces. Reproduced, with permission, from [106]. (C) Spectrograms of voice composites generated by averaging an increasing number of voices of the same gender (different speakers uttering the syllable 'had'). (D) Attractiveness ratings as a function of the number of voices averaged. Note the steady increase in attractiveness ratings with increasing number of averaged voices, for both male (left) and female (right) voices. Reproduced, with permission, from [91].]
Box 4. Outstanding questions
Is the perceptual and cerebral processing of unfamiliar voices different in nature from that of highly familiar voices, as has been demonstrated for faces?
Is there 'holistic' processing in representing voices? Can 'voice inversion' or 'voice composite' effects be observed?
Is the threshold for voice detection lower than for other sound categories? Do voices capture more attention than other auditory stimuli?
Are there any neural/perceptual effects that are specific to voices that should be studied with faces?
Is the neural system that mediates face processing more extensive than the neural system that mediates voice processing?
Note that this review has largely focused on the similarity between faces and voices. However, these two stimuli also differ in important ways. Importantly, human face recognition abilities surpass the ability to recognize people by voice [e.g., 76]. This may not be surprising given that humans are a highly visual species. Whether this difference reflects a more complex organization of the face network with, for example, more areas (as the available data on voice areas in the human and macaque brain suggest), or a less informative signal to start with (one-dimensional sound frequency vs two-dimensional visual space), or both, remains to be established.
Acknowledgements
We would like to thank Vadim Axelrod, Julia Sliwa, Patricia Bestelmeyer and Marianne Latinus for useful comments.
References
1 Belin, P. et al. (2011) Understanding voice perception. Br. J. Psychol. 102, 711–725
2 Kanwisher, N. and Yovel, G. (2006) The fusiform face area: a cortical region specialized for the perception of faces. Philos. Trans. R. Soc. Lond. B: Biol. Sci. 361, 2109–2128
3 Atkinson, A.P. and Adolphs, R. (2011) The neuropsychology of face perception: beyond simple dissociations and functional selectivity. Philos. Trans. R. Soc. Lond. B: Biol. Sci. 366, 1726–1738
4 Tsao, D.Y. et al. (2008) Comparing face patch systems in macaques and humans. Proc. Natl. Acad. Sci. U.S.A. 105, 19514–19519
5 Charest, I. et al. (2013) Cerebral processing of voice gender studied using a continuous carryover fMRI design. Cereb. Cortex 23, 958–966
6 Ethofer, T. et al. (2012) Emotional voice areas: anatomic location, functional properties, and structural connections revealed by combined fMRI/DTI. Cereb. Cortex 22, 191–200
7 Leaver, A.M. and Rauschecker, J.P. (2010) Cortical representation of natural complex sounds: effects of acoustic features and auditory object category. J. Neurosci. 30, 7604–7612
8 Moerel, M. et al. (2012) Processing of natural sounds in human auditory cortex: tonotopy, spectral tuning, and relation to voice sensitivity. J. Neurosci. 32, 14205–14216
9 Remedios, R. et al. (2009) An auditory region in the primate insular cortex responding preferentially to vocal communication sounds. J. Neurosci. 29, 1034–1045
10 Romanski, L.M. (2012) Integration of faces and vocalizations in ventral prefrontal cortex: implications for the evolution of audiovisual speech. Proc. Natl. Acad. Sci. U.S.A. 109 (Suppl. 1), 10717–10724
11 Romanski, L.M. et al. (2005) Neural representation of vocalizations in the primate ventrolateral prefrontal cortex. J. Neurophysiol. 93, 734–747
12 Gao, Z. et al. (2012) A magnetoencephalographic study of face processing: M170, gamma-band oscillations and source localization. Hum. Brain Mapp. http://dx.doi.org/10.1002/hbm.22028
13 Rossion, B. and Jacques, C. (2008) Does physical interstimulus variance account for early electrophysiological face sensitive responses in the human brain? Ten lessons on the N170. Neuroimage 39, 1959–1979
14 Charest, I. et al. (2009) Electrophysiological evidence for an early processing of human voices. BMC Neurosci. 10, 127
15 De Lucia, M. et al. (2010) A temporal hierarchy for conspecific vocalization discrimination in humans. J. Neurosci. 30, 11210–11221
16 Rogier, O. et al. (2010) An electrophysiological correlate of voice processing in 4- to 5-year-old children. Int. J. Psychophysiol. 75, 44–47
17 Capilla, A. et al. (2013) The early spatio-temporal correlates and task independence of cerebral voice processing studied with MEG. Cereb. Cortex 23, 1388–1395
18 Pitcher, D. et al. (2009) Triple dissociation of faces, bodies, and objects in extrastriate cortex. Curr. Biol. 19, 319–324
19 Sadeh, B. et al. (2011) Stimulation of category-selective brain areas modulates ERP to their preferred categories. Curr. Biol. 21, 1894–1899
20 Bestelmeyer, P. et al. (2011) Right temporal TMS impairs voice detection. Curr. Biol. 21, R838–R839
21 Gainotti, G. (2013) Laterality effects in normal subjects' recognition of familiar faces, voices and names. Perceptual and representational components. Neuropsychologia 51, 1151–1160
22 Yovel, G. et al. (2008) The asymmetry of the fusiform face area is a stable individual characteristic that underlies the left-visual-field superiority for faces. Neuropsychologia 46, 3061–3068
23 Freiwald, W.A. and Tsao, D.Y. (2010) Functional compartmentalization and viewpoint generalization within the macaque face-processing system. Science 330, 845–851
24 Yovel, G. and Freiwald, W.A. (2013) Face recognition systems in monkey and human: are they the same thing? F1000Prime Rep. 5, http://dx.doi.org/10.12703/P5-10
25 Petkov, C.I. et al. (2008) A voice region in the monkey brain. Nat. Neurosci. 11, 367–374
26 Perrodin, C. et al. (2011) Voice cells in the primate temporal lobe. Curr. Biol. 21, 1408–1415
27 Kravitz, D.J. et al. (2011) A new neural framework for visuospatial processing. Nat. Rev. Neurosci. 12, 217–230
28 Rauschecker, J.P. and Scott, S.K. (2009) Maps and streams in the auditory cortex: nonhuman primates illuminate human speech processing. Nat. Neurosci. 12, 718–724
29 Barton, J.J. (2008) Structure and function in acquired prosopagnosia: lessons from a series of 10 patients with brain damage. J. Neuropsychol. 2, 197–225
30 Susilo, T. and Duchaine, B. (2012) Advances in developmental prosopagnosia research. Curr. Opin. Neurobiol. http://dx.doi.org/10.1016/j.conb.2012.12.011
31 Hailstone, J.C. et al. (2010) Progressive associative phonagnosia: a neuropsychological analysis. Neuropsychologia 48, 1104–1114
32 Neuner, F. and Schweinberger, S.R. (2000) Neuropsychological impairments in the recognition of faces, voices, and personal names. Brain Cogn. 44, 342–366
33 Sidtis, D. and Kreiman, J. (2012) In the beginning was the familiar voice: personally familiar voices in the evolutionary and contemporary biology of communication. Integr. Psychol. Behav. Sci. 46, 146–159
34 Garrido, L. et al. (2009) Developmental phonagnosia: a selective deficit of vocal identity recognition. Neuropsychologia 47, 123–131
35 von Kriegstein, K. et al. (2005) Interaction of face and voice areas during speaker recognition. J. Cogn. Neurosci. 17, 367–376
36 Young, A.W. and Bruce, V. (2011) Understanding person perception. Br. J. Psychol. 102, 959–974
37 Valentine, T. (1991) A unified account of the effects of distinctiveness, inversion, and race in face recognition. Q. J. Exp. Psychol. A 43, 161–204
38 Leopold, D.A. et al. (2006) Norm-based face encoding by single neurons in the monkey inferotemporal cortex. Nature 442, 572–575
39 Latinus, M. and Belin, P. (2011) Anti-voice adaptation suggests prototype-based coding of voice identity. Front. Psychol. 2, 175
40 Leopold, D.A. et al. (2001) Prototype-referenced shape encoding revealed by high-level aftereffects. Nat. Neurosci. 4, 89–94
41 Latinus, M. et al. Norm-based coding of voice identity in human auditory cortex. Curr. Biol., in press
42 Loffler, G. (2005) fMRI evidence for the neural representation of faces. Nat. Neurosci. 8, 1386–1390
43 Simion, F. et al. (2011) The processing of social stimuli in early infancy: from faces to biological motion perception. Prog. Brain Res. 189, 173–193
44 Johnson, M.H. (2005) Subcortical face processing. Nat. Rev. Neurosci. 6, 766–774
45 Kisilevsky, B.S. et al. (2003) Effects of experience on fetal voice recognition. Psychol. Sci. 14, 220–224
46 Beauchemin, M. et al. (2011) Mother and stranger: an electrophysiological study of voice processing in newborns. Cereb. Cortex 21, 1705–1711
47 Vouloumanos, A. et al. (2010) The tuning of human neonates' preference for speech. Child Dev. 81, 517–527
48 Tzourio-Mazoyer, N. et al. (2002) Neural correlates of woman face processing by 2-month-old infants. Neuroimage 15, 454–461
49 de Haan, M. et al. (2003) Development of face-sensitive event-related potentials during infancy: a review. Int. J. Psychophysiol. 51, 45–58
50 Halit, H. et al. (2003) Cortical specialisation for face processing: face-sensitive event-related potential components in 3- and 12-month-old infants. Neuroimage 19, 1180–1193
51 Dehaene-Lambertz, G. et al. (2002) Functional neuroimaging of speech perception in infants. Science 298, 2013–2015
52 Grossmann, T. et al. (2010) The developmental origins of voice processing in the human brain. Neuron 65, 852–858
53 Blasi, A. et al. (2011) Early specialization for voice and emotion processing in the infant brain. Curr. Biol. 21, 1220–1224
54 Pascalis, O. et al. (2002) Is face processing species-specific during the first year of life? Science 296, 1321–1323
55 Di Giorgio, E. et al. (2012) Is the face-perception system human-specific at birth? Dev. Psychol. 48, 1083–1090
56 Kuhl, P.K. et al. (1992) Linguistic experience alters phonetic perception in infants by 6 months of age. Science 255, 606–608
57 Perrachione, T.K. and Wong, P.C. (2007) Learning to recognize speakers of a non-native language: implications for the functional organization of human auditory cortex. Neuropsychologia 45, 1899–1910
58 Winters, S.J. et al. (2008) Identification and discrimination of bilingual talkers across languages. J. Acoust. Soc. Am. 123, 4524–4538
59 Perrachione, T.K. et al. (2010) Asymmetric cultural effects on perceptual expertise underlie an own-race bias for voices. Cognition 114, 42–55
60 Young, S.G. et al. (2012) Perception and motivation in face recognition: a critical review of theories of the Cross-Race Effect. Pers. Soc. Psychol. Rev. 16, 116–142
61 Yin, R.K. (1969) Looking at upside-down faces. J. Exp. Psychol. 81, 141–145
62 Nederhouser, M. et al. (2007) The deleterious effect of contrast reversal on recognition is unique to faces, not objects. Vision Res. 47, 2134–2142
63 DeGutis, J. et al. (2013) Using regression to measure holistic face processing reveals a strong link with face recognition ability. Cognition 126, 87–100
64 Blesser, B. (1972) Speech perception under conditions of spectral transformation. I. Phonetic characteristics. J. Speech Hear. Res. 15, 5–41
65 Bedard, C. and Belin, P. (2004) A 'voice inversion effect'? Brain Cogn. 55, 247–249
66 Agus, T.R. et al. (2012) Fast recognition of musical sounds based on timbre. J. Acoust. Soc. Am. 131, 4124–4133
67 Bruce, V. et al. (1993) Effects of distinctiveness, repetition and semantic priming on the recognition of face familiarity. Can. J. Exp. Psychol. 47, 38–60
68 Natu, V. and O'Toole, A.J. (2011) The neural processing of familiar and unfamiliar faces: a review and synopsis. Br. J. Psychol. 102, 726–747
69 Hershler, O. et al. (2010) The wide window of face detection. J. Vis. 10, 21
70 Langton, S.R. et al. (2008) Attention capture by faces. Cognition 107, 330–342
71 Oosterhof, N.N. and Todorov, A. (2008) The functional basis of face evaluation. Proc. Natl. Acad. Sci. U.S.A. 105, 11087–11092
72 Willis, J. and Todorov, A. (2006) First impressions: making up your mind after a 100-ms exposure to a face. Psychol. Sci. 17, 592–598
73 Scherer, K.R. (1972) Judging personality from voice: a cross-cultural approach to an old issue in interpersonal perception. J. Pers. 40, 191–210
74 Engell, A.D. et al. (2010) Common neural mechanisms for the evaluation of facial trustworthiness and emotional expressions as revealed by behavioral adaptation. Perception 39, 931–941
75 Rauschecker, J.P. and Tian, B. (2000) Mechanisms and streams for processing of 'what' and 'where' in auditory cortex. Proc. Natl. Acad. Sci. U.S.A. 97, 11800–11806
76 Stevenage, S.V. et al. (2013) The effect of distraction on face and voice recognition. Psychol. Res. 77, 167–175
77 Belin, P. et al., eds (2012) Integrating Face and Voice in Person Perception, Springer
78 Schweinberger, S.R. et al. (2011) Hearing facial identities: brain correlates of face–voice integration in person identification. Cortex 47, 1026–1037
79 Smith, E.L. et al. (2007) Auditory-visual crossmodal integration in perception of face gender. Curr. Biol. 17, 1680–1685
80 de Gelder, B. and Vroomen, J. (2000) The perception of emotions by ear and by eye. Cogn. Emot. 14, 289–311
81 Pourtois, G. et al. (2005) Perception of facial expressions and voices and of their combination in the human brain. Cortex 41, 49–59
82 Hagan, C.C. et al. (2009) MEG demonstrates a supra-additive response to facial and vocal emotion in the right superior temporal sulcus. Proc. Natl. Acad. Sci. U.S.A. 106, 20010–20015
83 Patterson, M.L. and Werker, J.F. (2003) Two-month-old infants match phonetic information in lips and voice. Dev. Sci. 6, 191–196
84 Lewkowicz, D.J. and Ghazanfar, A.A. (2009) The emergence of multisensory systems through perceptual narrowing. Trends Cogn. Sci. 13, 470–478
85 Grossmann, T. et al. (2012) Neural correlates of perceptual narrowing in cross-species face-voice matching. Dev. Sci. 15, 830–839
86 Haxby, J.V. et al. (2000) The distributed human neural system for face perception. Trends Cogn. Sci. 4, 223–233
87 Belin, P. et al. (2000) Voice-selective areas in human auditory cortex. Nature 403, 309–312
88 Jeffery, L. et al. (2011) Distinguishing norm-based from exemplar-based coding of identity in children: evidence from face identity aftereffects. J. Exp. Psychol. Hum. Percept. Perform. 37, 1824–1840
89 Lavner, Y. et al. (2001) The prototype model in speaker identification by human listeners. Int. J. Speech Technol. 4, 63–74
90 Langlois, J.H. and Roggman, L.A. (1990) Attractive faces are only average. Psychol. Sci. 1, 115–121
91 Bruckert, L. et al. (2010) Vocal attractiveness increases by averaging. Curr. Biol. 20, 116–120
92 O'Neil, S.F. and Webster, M.A. (2011) Adaptation and the perception of facial age. Vis. Cogn. 19, 534–550
93 Tillman, M.A. and Webster, M.A. (2012) Selectivity of face distortion aftereffects for differences in expression or gender. Front. Psychol. 3, 14
94 Webster, M.A. and MacLeod, D.I. (2011) Visual adaptation and face perception. Philos. Trans. R. Soc. Lond. B: Biol. Sci. 366, 1702–1725
95 Schweinberger, S.R. et al. (2008) Auditory adaptation in voice perception. Curr. Biol. 18, 684–688
96 Zäske, R. et al. (2010) Voice aftereffects of adaptation to speaker identity. Hear. Res. 268, 38–45
97 Latinus, M. and Belin, P. (2012) Perceptual auditory aftereffects on voice identity using brief vowel stimuli. PLoS ONE 7, e41384
98 Bestelmeyer, P. et al. (2010) Auditory adaptation in vocal affect perception. Cognition 117, 217–223
99 Galton, F. (1878) Composite portraits. J. Anthropol. Inst. 8, 132–144
100 Grammer, K. et al. (2003) Darwinian aesthetics: sexual selection and the biology of beauty. Biol. Rev. Camb. Philos. Soc. 78, 385–407
101 Little, A.C. et al. (2011) Facial attractiveness: evolutionary based research. Philos. Trans. R. Soc. Lond. B: Biol. Sci. 366, 1638–1659
102 Winkielman, P. et al. (2006) Prototypes are attractive because they are easy on the mind. Psychol. Sci. 17, 799–806
103 Feinberg, D.R. et al. (2008) The role of femininity and averageness of voice pitch in aesthetic judgments of women's voices. Perception 37, 615–623
104 Said, C.P. and Todorov, A. (2011) A statistical model of facial attractiveness. Psychol. Sci. 22, 1183–1190
105 DeBruine, L.M. et al. (2007) Dissociating averageness and attractiveness: attractive faces are not always average. J. Exp. Psychol. Hum. Percept. Perform. 33, 1420–1430
106 Braun, C. et al. (2001) Beautycheck – Ursachen und Folgen von Attraktivität. Projektabschlussbericht. http://www.beautycheck.de/cmsms/index.php/der-ganze-bericht