HAL Id: hal-02008916
https://hal-amu.archives-ouvertes.fr/hal-02008916
Submitted on 6 Feb 2019
Distributed under a Creative Commons Attribution 4.0 International License
A unified coding strategy for processing faces and voices
Galit Yovel, Pascal Belin
To cite this version:
Galit Yovel, Pascal Belin. A unified coding strategy for processing faces and voices. Trends in Cognitive Sciences, Elsevier, 2013, 17 (6), pp. 263-271. doi:10.1016/j.tics.2013.04.004. hal-02008916
A unified coding strategy for processing faces and voices

Galit Yovel 1 and Pascal Belin 2,3,4

1 School of Psychological Sciences and Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel
2 Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, UK
3 Département de Psychologie, Université de Montréal, Montréal, Canada
4 Institut des Neurosciences de La Timone, UMR 7289, CNRS and Université Aix-Marseille, France
Both faces and voices are rich in socially relevant information, which humans are remarkably adept at extracting, including a person's identity, age, gender, affective state, and personality. Here, we review accumulating evidence from behavioral, neuropsychological, electrophysiological, and neuroimaging studies suggesting that the cognitive and neural processing mechanisms engaged by perceiving faces or voices are highly similar, despite the very different nature of their sensory input. The similarity between the two mechanisms likely facilitates the multi-modal integration of facial and vocal information during everyday social interactions. These findings emphasize a parsimonious principle of cerebral organization, where similar computational problems in different modalities are solved using similar solutions.
Similar cognitive and neural representations for faces and voices
Faces and voices are the most socially important stimuli in the visual and auditory domains, respectively. The nature of the sensory input associated with these key social stimuli is very different: reflections of light on the face vs air pressure waves generated by the vocal apparatus. Yet, they both convey very similar types of information about a person, including identity, gender, emotional state, and age. Furthermore, in many cases of social communication faces and voices are processed simultaneously and have been shown to have facilitatory effects on recognition of person information relative to when each is presented alone (for a review, see [1]; Box 1). It is therefore plausible that, despite their very different sensory input, they may generate, at least to some extent, a similar representation. Indeed, recent studies reveal many similarities between their neural and cognitive representations.
In this review, we highlight the many similarities that have been found between the neural and cognitive mechanisms of face and voice processing in the past few years. We will summarize evidence pertaining to the following five areas: neurophysiological mechanisms; neurocognitive disorders; functional architecture; perceptual coding; and development and experience (Table 1; see Glossary). Because faces have been studied more extensively than voices, we will also highlight several well-established phenomena that have been reported for faces and should be investigated in future studies with voices to further explore their unified coding strategy.
The many similarities that exist between the neural and cognitive representation of faces and voices suggest a unifying coding mechanism that has evolved to represent the very rich and diverse information that these unique classes of visual and auditory stimuli convey about a person. More generally, these findings suggest that the brain may employ similar principles for processing stimuli that convey similar types of information not only within the same modality, but also across different modalities.
Glossary
Adaptation: prolonged exposure to a stimulus modulates neural and behavioral responses to the stimulus.
Aftereffects: prolonged exposure to a stimulus distorts perception of a new stimulus that follows it in time, in a direction opposite to the first stimulus.
Face inversion effect: the decline in recognition for upside-down relative to upright stimuli is larger for faces than for any other non-face objects.
Fusiform face area (FFA): a face-selective area found in the human fusiform gyrus. Other occipito-temporal face-selective areas are found in the lateral occipital cortex and superior temporal sulcus.
Other race effect: recognition of faces of other races (e.g., Caucasian faces by Asian observers) is poor relative to own-race faces (e.g., Asian faces by Asian observers).
Retinotopy: the visual field is mapped topographically onto neurons in the retina and early visual cortex such that adjacent neurons encode nearby regions in the visual field.
Temporal voice areas (TVA): voice-selective areas found in the middle and anterior parts of the human superior temporal sulcus/gyrus bilaterally, with a right-hemispheric asymmetry.
Tonotopy: sounds of different frequency are organized topographically in the ascending auditory pathways and early auditory cortex, such that nearby neurons are tuned to nearby frequencies.
Perceptual narrowing: discrimination abilities among sensory stimuli present in young infants (three to six months old) are reduced in older infants (nine months old) for stimuli with which they have no perceptual experience (e.g., monkey faces, monkey face–voice integration).
Phonagnosia: a selective deficit in voice recognition. Most cases reported have been due to brain damage in the right temporo-parietal junction. A single case of developmental phonagnosia (i.e., with no apparent brain damage) has been described.
Prosopagnosia: a selective deficit in face recognition, which appears in two forms, acquired and developmental. Acquired prosopagnosia results from brain damage, usually in the right temporal lobe. Developmental cases suffer from a life-long deficit with no apparent brain damage.
Voice inversion: voice stimuli can be reversed in time or inverted ('rotated') in frequency, inducing recognition deficits.
1364-6613/$ – see front matter
© 2013 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.tics.2013.04.004
Corresponding authors: Yovel, G. (gality@post.tau.ac.il); Belin, P. (p.belin@psy.gla.ac.uk)
Keywords: face recognition; voice recognition; neural selectivity; sensory coding; visual cortex; auditory cortex.
Neurophysiological mechanisms
Faces and voices have both been shown to elicit highly selective neural responses in the human brain (Figure 1A–C). Faces have typically been compared to non-face objects, such as houses or chairs. Voices are usually compared to different categories of non-vocal sounds, such as environmental or mechanical sounds. Functional MRI (fMRI) studies reveal much stronger responses to faces than to any other non-face stimuli in at least three occipital temporal areas: the occipital face area (OFA) in the lateral occipital cortex, the fusiform face area (FFA) in the mid-fusiform gyrus, and a face area in the posterior superior temporal sulcus (STS-FA) [2,3] (Figure 1A, left). Recent studies also reveal more anterior face-selective responses in the anterior temporal lobe and the prefrontal cortex [4]. Voice-selective cortical mechanisms also exist: fMRI studies have identified several regions along the middle and anterior STS and superior temporal gyrus (STG) that show a greater response to vocal sounds (regardless of whether or not they carry intelligible speech [5]) than to non-vocal sounds [6–8]; these areas were named the 'temporal voice areas' (TVA) (Figure 1A, right). Voice-sensitive responses have also been observed in other areas, including the insula and prefrontal cortex [9–11].
Consistent with neuroimaging findings, electroencephalography (EEG) and magnetoencephalography (MEG) studies show face- and voice-selective evoked responses. Faces elicit a component much larger in amplitude than non-face stimuli 170 ms after stimulus onset: the face-selective N170/M170 [12,13] (Figure 1B, left). A voice-selective electrophysiological component at a latency comparable to that of the N170, termed the 'fronto-temporal positivity to voices' (FTPV), has also recently been reported in EEG [14–16] (Figure 1B, right) and MEG [17] studies approximately 200 ms after sound onset. Finally, transcranial magnetic stimulation (TMS) of fMRI-defined face-selective areas indicates a causal and specific role for the occipital face area in face discrimination (Figure 1C, left) and in the generation of the face-selective N170 response [18,19]. Similarly, TMS over the TVA has been shown to disrupt voice detection [20] (Figure 1C, right).
Finally, one prominent and well-established feature of the face-processing mechanism is its right-hemisphere asymmetry, which has been manifested in both neural and behavioral measures [21,22]. Whereas speech processing is lateralized to the left hemisphere, voice recognition, similar to faces, elicits neural responses that are right-lateralized [21].
Face- and voice-selective neural responses are not limited to the human brain, but have also been observed in the macaque brain. Face neurons are commonly found in the superior temporal sulcus and the inferotemporal cortex [23]. Furthermore, functional MRI studies reveal a network of face-selective areas primarily in the upper and lower banks of the superior temporal sulcus [4] that share at least some anatomical and functional similarities with the human face areas [24] (Figure 1D, left). Similarly, monkey fMRI studies revealed voice-selective areas [25] in the superior temporal plane that prefer species-specific vocalizations over other vocalizations and sounds (Figure 1D, right). These voice-selective areas have been shown to contain voice-selective neurons [26]. The presence of face- and voice-dedicated mechanisms in the macaque brain indicates that these face and voice areas did not just emerge recently in humans along with the emergence of language and high-level social functioning skills: they were probably already present in the last common ancestor of macaques and humans some 30 million years ago. This highlights the importance of these stimuli for basic social functioning throughout primate evolution.
In summary, neurophysiological and neuroimaging findings convincingly show that both faces and voices elicit a highly selective neural response. This highlights not only their social importance, but also the fact that the unique nature of their representation requires mechanisms that are different from those used for the processing of any other visual and auditory stimuli. Moreover, this similarity in their neural representations is consistent with other similar principles used for the processing of auditory and visual stimuli, such as the tonotopic and retinotopic representations in primary auditory and visual cortex, respectively, or the separate mechanisms for 'where' and 'what' information that have been reported in both visual [27] and auditory [28] systems.
Neurocognitive disorders
Consistent with the strong neural selectivity that is discussed above for faces and voices, neuropsychological studies have reported selective impairments in face or voice recognition, in the face of otherwise intact visual or auditory functions, respectively. Selective deficits in face recognition abilities (i.e., prosopagnosia) were reported over
Box 1. Face–voice integration for person information
Face–voice integration has been primarily studied in the context of speech processing. However, given that faces and voices convey important and similar non-speech information about person identity, it is also important to examine face–voice integration for the processing of identity, emotion, age, and gender. Recent studies have shown that face–voice integration contributes significantly to the extraction of person information [77]. Specifically, cross-modal interaction in the processing of face and voice identity has been shown in studies that presented congruent and incongruent identities [78]. Face–voice integration for gender has been shown even with pure tones extracted from male and female voices, which were not recognized by participants as male or female voices. These pure tones biased perception of an androgynous face toward a male or a female face according to the gender of the tone [79]. Integration effects between faces and voices have also been observed for emotional information [80–82].
Face–voice integration appears very early in life. Several studies have shown that at two months of age, infants begin to exhibit the ability to perceive face–voice correspondences [83]. Interestingly, perceptual narrowing, which has been shown for faces and speech (see main text), has also been reported for face–voice integration. For example, four–six- and eight–ten-month-old infants were presented with consistent and inconsistent face–voice stimuli of monkeys and humans. Whereas the four–six-month-old infants were able to match face–voice stimuli of both humans and monkeys, eight–ten-month-old infants were able to match human but not monkey face–voice stimuli [84,85]. The similar developmental track that is found for faces and voices presented in isolation, as well as for the integration of the two stimuli, is in line with the idea that similar coding mechanisms of unisensory information may underlie successful multisensory integration.
50 years ago in brain-damaged patients following a lesion in the occipital temporal cortex, usually over the right hemisphere [29]. More recently, similar deficits were found in individuals that show no specific brain lesion, but suffer from life-long prosopagnosia, known as developmental/congenital prosopagnosia [30]. Prosopagnosic individuals seem to show intact recognition of objects, but exhibit severe difficulties in recognizing familiar faces, including their close relatives and friends. Regarding voices, the existence of patients with selective impairments in speech comprehension has long been established (e.g., Wernicke's aphasia). More similar to prosopagnosia, a small number of 'phonagnosic' patients have been identified with impairments in speaker discrimination or recognition, even though other aspects of auditory perception were normal [31–33]. Only one case of 'developmental phonagnosia' – the selective inability to recognize speakers by their voice in the absence of any evident cerebral impairment – has been reported so far [34]. It is possible that the lack of additional developmental phonagnosia cases may not reflect an absence of such cases, but the inability of individuals who suffer from this deficit to acknowledge their deficit, as was the case with developmental prosopagnosia for many years. Furthermore, a lack of standardized tests for phonagnosia also impedes its reliable diagnosis.
Functional architecture
As mentioned above, both faces and voices convey similar information about a person, including gender, emotional state, identity, and age. The idea that the functional architecture underlying face and voice processing could be organized following comparable principles has been discussed before and will therefore only be briefly mentioned here [1,35]. A neurocognitive architecture described by Bruce and Young [36] has been suggested to also apply to voices [1]: briefly, after a stage of cortical processing common to all stimuli of their particular sensory modality, faces and voices are selectively processed in a further 'structural encoding' stage, probably represented by areas such as the FFA and TVA, respectively. Then, in each modality, the three main types of information carried by both faces and voices – identity, affect, speech – are processed along functional pathways which, although they interact with one another during normal functioning, can be selectively activated/impaired.
Perceptual coding
One of the most influential models of face processing is the 'face space model' [37], which posits that face identities can be represented as locations in a multidimensional space. The dimensions of this space correspond to information used to discriminate faces, whereas the distance that
Table 1. Face–voice similarities

Neural selectivity (human):
- Electrophysiology: Face – N170/M170 [13]. Voice – FTPV [14,17].
- Functional MRI: Face – face areas in the lateral occipital cortex, mid-fusiform gyrus, and STS [2,86]. Voice – voice areas in the STS [87].
- Hemispheric asymmetry: Face – right hemisphere [21,22]. Voice – right-hemisphere voice selectivity [21] (left hemisphere for speech).
- Effects of TMS: Face – TMS over the OFA selectively impairs performance for faces [18] and selectively increases the face N170 [19]. Voice – TMS over the TVA disrupts voice detection [20].

Neural selectivity (monkey):
- Electrophysiology: Face – face-selective cells [23]. Voice – voice-selective cells [26].
- Functional MRI: Face – face-selective brain areas [4,23]. Voice – voice-selective brain areas [25].

Selective recognition deficits:
- Face – developmental and acquired prosopagnosia [29,30]. Voice – developmental and acquired phonagnosia [31,34].

Perceptual coding:
- Norm-based coding (Box 2): Face – relative to an averaged face [40,42,88]. Voice – relative to an averaged voice [39,41].
- Distinctiveness effect: Face – better recognition for distinctive faces [37]. Voice – better recognition for distinctive voices [89].
- Perceptual aftereffects to anti-faces/voices (Box 2): Face – largest for matched vs non-matched anti-faces [88]. Voice – largest for matched vs non-matched anti-voices [39].
- Attractiveness (Box 3): Face – averaged face is more attractive [90]. Voice – averaged voice is more attractive [91].

Development and experience:
- Early preference: Face – preference for upright faces 24 hours after birth [43]. Voice – fetuses and young infants discriminate voices from other auditory stimuli [45,46].
- Neural correlates: Face – face-selective ERPs appear at three to six months [50]. Voice – voice areas emerge between three and seven months [52,53].
- Perceptual narrowing: Face – broad abilities for cross-species face recognition at four–six months are tuned by experience in eight–ten-month-old infants [54]. Voice – broad abilities for phoneme discrimination at four–six months are tuned by experience in eight–ten-month-old infants [56].
- Effects of experience in adulthood: Face – other race effect [60]. Voice – language familiarity effect [57] and own-race bias [59].
Figure 1. Face- and voice-selective neural responses. (A) Left: face-selective areas revealed with functional MRI (fMRI) are shown in the occipital temporal cortex. Right: the voice-selective areas are found in the superior temporal sulcus and gyrus. (B) Left: faces elicit greater event-related potential (ERP) amplitudes than non-faces 170 ms after stimulus onset (N170) in occipito-temporal electrodes (red line: faces). Right: voices elicit greater amplitudes than non-voice sounds 200 ms after stimulus onset in fronto-temporal electrodes (red line: voices). Reproduced, with permission, from [14]. (C) Left: transcranial magnetic stimulation (TMS) to the occipital face area selectively disrupts face but not body discrimination. Adapted from [18]. Right: TMS to the temporal voice area selectively disrupts voice/non-voice discrimination. Reproduced, with permission, from [20]. (D) Left: face-selective areas found in the superior temporal sulcus of the macaque brain. Reproduced, with permission, from [23]. Right: voice-selective areas were found in the superior temporal plane of the macaque brain. Reproduced, with permission, from [25].
separates representations reflects the degree of similarity between faces. This similarity-based framework accounts for a range of face-recognition phenomena, such as the face inversion effect, effects of distinctiveness and caricaturing, and the other race effect [37]. Furthermore, single-unit recording studies in the macaque show neuronal tuning profiles that are consistent with such similarity-based representations [38]. Current evidence suggests that all faces are coded relative to a prototypical, average face, which lies at the origin of the multidimensional face space (Box 2).
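In vector terms, the face space model can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the number of dimensions, the Gaussian distribution of faces, and the Euclidean distance metric are all assumptions made here for concreteness.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy face space: each "face" is a point in a feature space.
# The model does not specify what the dimensions actually are.
n_faces, n_dims = 200, 10
faces = rng.normal(size=(n_faces, n_dims))

# The norm: a prototypical, average face at the origin of face space.
prototype = faces.mean(axis=0)

def distinctiveness(face: np.ndarray, norm: np.ndarray) -> float:
    """Distance to the norm, a proxy for how distinctive a face is."""
    return float(np.linalg.norm(face - norm))

# Under norm-based coding, distinctive faces (far from the norm) are
# recognized better, and the fMRI results cited in the text report neural
# responses that grow with distance from the mean face.
distances = np.array([distinctiveness(f, prototype) for f in faces])
```

On this view, effects such as distinctiveness and caricaturing simply correspond to positions further from the origin along a given identity direction.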
Recent studies have uncovered very similar phenomena for the coding of voice identity. Voices from different
Box 2. Multidimensional face and voice spaces
The idea that faces and voices are coded relative to a norm has received its main support from studies that employed behavioral adaptation paradigms. Adaptation entails exposure to a stimulus for a relatively long duration of a few seconds. This long exposure generates perceptual aftereffects during the presentation of a subsequent stimulus, such that the representation of the adapted stimulus becomes weaker and its 'opposite' becomes stronger. For example, after long exposure to the color green, a white screen appears red because of opponent red–green color coding in the retina.
Aftereffects, which were originally used to probe the properties of low-level sensory stimuli, such as color and motion, have later been found also for face gender, identity, and age [92–94]. For example, long exposure to a female face generates a stronger male percept in a 50%/50% female–male morphed face. Face aftereffects have also been useful as tests of the properties of the multidimensional face space. In particular, according to the norm-based coding hypothesis, all faces are coded as a function of their distance relative to an average face that lies at the origin. Findings showed greater aftereffects for two stimuli located on opposite sides of the average face (a face and an anti-face) than for two faces that are not on the axis that goes through the origin where the average face resides (see Figure IA–C) [38,40]. These findings provide strong support for the idea that faces are coded in a norm-based manner relative to an average face.
Interestingly, recent aftereffect studies with voices reveal similar effects for voice information such as gender [95], identity [96,97], and emotion [98]. Voice aftereffects also provide evidence for norm-based coding of voice identity: identity aftereffects induced by 'anti-voice' adaptors are greater in magnitude than those induced by non-opposite adaptors [39]. As for faces, the average voice, normally perceived as identity-free, becomes tainted with the identity opposite to the anti-voice adaptor (Figure ID,E), even though voice and anti-voice are not perceived as related in identity.
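The anti-face/anti-voice construction described above can be sketched geometrically. The three feature values below are arbitrary illustrative numbers, not measurements from any study; the point is only the reflection-through-the-norm relation and the identity-strength continuum used in the labeling tasks.

```python
import numpy as np

# A face and its anti-face lie on opposite sides of the average face,
# so averaging the two recovers the average (values are illustrative).
average = np.array([0.5, 0.2, 0.8])   # the norm (average face)
face = np.array([0.9, 0.1, 0.6])      # a learned identity

anti_face = 2 * average - face        # reflection through the norm

def morph(alpha: float) -> np.ndarray:
    """Identity-strength continuum: alpha = 0 is the average face,
    alpha = 1 the full identity; negative alpha moves toward the
    anti-face."""
    return average + alpha * (face - average)
```

Matched adaptors in these experiments are precisely stimuli of the form `morph(-1)`: maximally 'opposite' to the test identity through the origin.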
Figure I. Perceptual aftereffects of 'anti-face' and 'anti-voice' adaptation. (A–C) Anti-face adaptation. (A) Four face identities used in a recognition task (left column) and their corresponding 'anti-face' versions (right column); note the very different identity percepts associated with a face and its anti-face; yet, they are related in that averaging them together results in the average face. (B) Stimuli used in recognition tasks represented in a theoretical multidimensional space centered on the average face (blue circle). Green circles indicate learned identities. Red circles indicate anti-faces. (C) Psychophysical labeling functions obtained as a function of increased identity strength at baseline (no adaptation: continuous line, open symbols) and after adaptation (closed symbols) with matched (continuous line) and non-matched (dashed line) anti-face adaptors. Note the greater aftereffects induced by matched anti-face adaptors and the strong identity percept associated with the otherwise identity-neutral average face (identity strength 0) after adaptation with matched anti-faces. Reproduced, with permission, from [40]. (D,E) Anti-voice adaptation. (D) Three voice stimuli (brief syllables represented by their spectrograms) shown in a theoretical multidimensional space, with an averaged voice in its center, and with their corresponding anti-voice stimuli (on the green circle). (E) Psychophysical labeling function obtained as a function of increased identity strength at baseline (no adaptation: orange symbols) and after adaptation with matched (blue symbols) and non-matched (pink symbols) anti-voice adaptors. Note, as for faces, the greater aftereffects induced by adaptation with matched anti-voice adaptors. Reproduced, with permission, from [39].
speakers can be represented as points in a multidimensional space (Box 2). Similar to faces, a prototypical voice stimulus can be generated by averaging together a large number of different voices of the same gender. A particular role of this prototypical voice has been shown via perceptual aftereffects induced by adaptation with 'anti-voices' [39] in an experimental paradigm directly adopted from face experiments [40]. Cerebral activity in the TVA has recently been shown to vary as a function of a voice's acoustical distance to the prototypical voice [41], i.e., 'norm-based coding'. This is analogous to results from the fusiform face area, which showed an increase in signal with increased distance from the mean face [38,42].
Development and experience
Given the importance of face and voice recognition for intact social functioning and the specific computations that are needed to extract the rich information that they convey, it may not be surprising that processing mechanisms for faces and voices appear very early in development. A specific preference for upright faces in infants has been found during the first 24 hours after birth [43]. These findings suggest that face processing mechanisms may be innate and that early on face-like figures attract attention more than other non-face stimuli [44]. Similarly, there is clear evidence that very young infants – even fetuses – can discriminate voices from other auditory stimuli and can recognize their mother's voice [45,46]. By the age of three months, infants also prefer listening to human voices over vocalizations from other species [47].
Early evidence for neural selective responses to faces or voices also exists. For faces, one positron emission tomography (PET) study with two-month-old infants has shown face-selective responses (faces > diodes) in the lateral occipital cortex and the fusiform gyrus. Although the choice of control stimuli was not ideal, these areas may correspond to the adult OFA and FFA [48]. Event-related potential (ERP) studies with three-month-old infants reveal face-selective components – the N290 and N400 [49,50]. These components emerge later than the adult N170 and spread over a longer time range. Thus, face-selective neural mechanisms may exist in early infancy, but are further sharpened during development. With respect to information carried by voices, the contrast of fMRI measures of activity for speech vs reversed speech already shows an adult-like left-lateralized pattern at three months [51]. Evidence of a greater response to vocal vs non-vocal sounds seems to emerge slightly later, between three and seven months, as shown by near-infrared spectroscopy (NIRS) and fMRI [52,53]. Notably, newborns already exhibit a neural signature for voice identity recognition [46].
Evidence for the early, possibly innate, existence of face and voice selective mechanisms does not imply that their development is not influenced by experience. Perceptual narrowing during infancy has been reported for both face and speech stimuli. In particular, at six months of age infants can recognize both monkey and human faces, but the former ability declines by nine months, when face recognition becomes better for human faces [54,55]. Similar perceptual narrowing has been reported for speech [56]. The language spoken in one's cultural group is an obvious such influence of experience, with evidence for cerebral mechanisms tuned to the specific set of phonemes of the maternal language within the first year after birth (see Box 1 for perceptual narrowing of face–voice integration).
Non-linguistic aspects of voice perception, such as speaker recognition, also seem to be susceptible to environmental influence: it is well established that listeners recognize speakers of their own or a familiar language better than speakers of an unfamiliar language – the language familiarity effect [57,58] – and there is partial evidence for a potential effect of race on voice recognition [59]. This phenomenon may parallel the well-established 'other race effect' – humans' poor ability to recognize faces of other races (e.g., Asian faces by Caucasian observers and vice versa) [60] – which results from little contact with faces of other races. Taken together, evidence suggests that mechanisms selective for the processing of faces and voices appear very early in development and may even be innate. These mechanisms are widely tuned to all types of face and voice/speech stimuli early on, but narrow down already by nine months of age and remain narrowly tuned, also in adulthood, to the types of faces and voices one has experience with.
Unexplored face–voice similarities
Whereas ample evidence already exists for the similar coding of faces and voices, many phenomena that have been discovered in the extensive study of faces in the past 50 years still await testing with voice stimuli. Crucially, several behavioral phenomena have suggested a special status for faces compared to non-face objects, but no such effects are known for vocal stimuli. These would include a voice correlate of the face inversion effect [61] and/or the contrast reversal effects (stimulus manipulations that result in a disproportionately large recognition deficit relative to non-face stimuli) [62]. Another hallmark of face processing is its holistic representation [63], which is manifested by an interactive, rather than independent, representation of the face parts. Testing whether these well-established face-specific effects have their counterparts in the auditory domain may be a fruitful avenue of research. For instance, studies using a gating paradigm or examining the effects of transformations such as time reversal or frequency reversal (or 'rotation' [64]) on different stimuli could potentially highlight effects specific to vocal sounds [65,66].
Other phenomena that have been extensively studied with faces are the different representations of familiar and unfamiliar faces [67,68]. For example, the representation of familiar faces is more tolerant to stimulus manipulations such as viewpoint or lighting changes relative to unfamiliar faces. Also, faces are detected more rapidly than other objects in visual scenes and search arrays [69] and have been shown to capture attention relative to other objects [70]. It is still unknown whether voices have a similar privileged status relative to other sounds.
Finally, faces automatically elicit social inferences about the personality of the individual [71,72]. Interestingly, it has been shown that these inferences can be clustered into two main independent dimensions, trustworthiness and dominance [71,72]. Evidence for a similar two-dimensional space that maps onto trustworthiness and dominance has also been suggested for voices [73]. Future studies will determine whether trustworthiness and dominance are correlated with voice expression and voice gender, respectively, as was shown for faces [74].
Concluding remarks
Visual and auditory signals have very different physical properties and are processed by separate neural substrates. Nevertheless, the visual and auditory pathways do employ some similar mechanisms, including the retinotopic and tonotopic representations seen in early sensory cortices and a separation into 'what' and 'where' pathways in both vision and audition [27,75]. In this review, we have shown that the two systems also apply very similar computational operations to the processing of their categories of overriding ecological importance, faces and voices. This is manifested in category-level neural selectivity to faces and voices that was found in both human and macaque brains, selective cognitive impairments, and early appearance in development. Furthermore, similar norm-based coding schemes for identity and attractiveness (Box 3) and separate, but interactive, pathways for identity, expression, and speech have been demonstrated (Table 1). These similarities, as well as others that should be explored in future studies (Box 4), are likely to contribute to effective face–voice integration (Box 1), which has been shown to result in recognition that exceeds the sum of each of the stimuli alone.
Box 3. Are averaged faces and voices more attractive?
It has been shown for over a century that 'averaged faces' generated by averaging together a number of different faces are highly attractive [90,99] (Figure IA). Evolutionary theory proposes that averaged faces are more attractive because they contain features that are indicators of fitness in natural faces (the 'good genes' account): symmetry, averageness, texture smoothness [100,101]. A more cognitive explanation of this phenomenon is in terms of similarity to the internal prototype, which results in easier-to-process, more pleasant stimuli ('perceptual fluency') [102].
Both the good genes and perceptual fluency accounts predict that a similar phenomenon should be observed for voices. Bruckert et al. [91] used morphing (Figure IB) to generate voice composites made of an increasing number of voices and observed, as predicted by the face studies, a significant increase in attractiveness ratings. Two main acoustical parameters were highlighted, both analogous to those shown to influence face attractiveness: distance-to-mean (acoustical similarity with the population average) and 'texture smoothness' (i.e., the amount of spectro-temporal irregularities) [91].
Note that for both faces and voices, averageness appears to be one factor among many that influence the attractiveness percept. Other factors, such as sexual dimorphism, are also known to contribute to both face and voice attractiveness in a complex, context-dependent manner [103–105].
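The distance-to-mean parameter described above can be illustrated with a minimal sketch. This is a hypothetical toy example, not the analysis of [91]: each voice is reduced to an invented acoustic feature vector (pitch and two formant frequencies, with made-up values), and distance-to-mean is taken as the Euclidean distance from the population average. A composite averaged from several voices necessarily lies near that average, so its distance-to-mean is small relative to individual voices.

```python
import numpy as np

def distance_to_mean(voice, population):
    """Euclidean distance between one voice's feature vector and the population average."""
    return float(np.linalg.norm(voice - population.mean(axis=0)))

# Toy population: 6 voices x 3 acoustic features (pitch, F1, F2) -- illustrative numbers only
population = np.array([
    [100.0, 450.0, 1400.0],
    [140.0, 550.0, 1600.0],
    [110.0, 520.0, 1450.0],
    [135.0, 470.0, 1550.0],
    [105.0, 540.0, 1480.0],
    [130.0, 460.0, 1520.0],
])

# A composite averaged from several voices sits near the population mean...
composite = population[:4].mean(axis=0)
d_composite = distance_to_mean(composite, population)

# ...so its distance-to-mean is smaller than that of typical individual voices
d_individuals = [distance_to_mean(v, population) for v in population]
print(d_composite < np.median(d_individuals))
```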
[Figure I. Face and voice attractiveness judgments as a function of averaging. (A) Face composites generated by averaging 32 male faces (left) and 64 female faces (right). (B) Attractiveness ratings as a function of the number of faces averaged. Note the steady increase in attractiveness ratings with increasing number of averaged faces, for both male (left) and female (right) faces. Reproduced, with permission, from [106]. (C) Spectrograms of voice composites generated by averaging an increasing number of voices of the same gender (different speakers uttering the syllable 'had'). (D) Attractiveness ratings as a function of the number of voices averaged. Note the steady increase in attractiveness ratings with increasing number of averaged voices, for both male (left) and female (right) voices. Reproduced, with permission, from [91].]
Box 4. Outstanding questions
Is the perceptual and cerebral processing of unfamiliar voices different in nature from that of highly familiar voices, as has been demonstrated for faces?
Is there 'holistic' processing in representing voices? Can 'voice inversion' or 'voice composite' effects be observed?
Is the threshold for voice detection lower than for other sound categories? Do voices capture more attention than other auditory stimuli?
Are there any neural/perceptual effects that are specific to voices that should be studied with faces?
Is the neural system that mediates face processing more extensive than the neural system that mediates voice processing?
Note that this review has largely focused on the similarity between faces and voices. However, these two stimuli also differ in important ways. Importantly, human face recognition abilities surpass the ability to recognize people by voice [e.g., 76]. This may not be surprising given that humans are a highly visual species. Whether this difference reflects a more complex organization of the face network with, for example, more areas (as the available data on voice areas in the human and macaque brain suggest), or a less informative signal to start with (one-dimensional sound frequency vs two-dimensional visual space), or both, remains to be established.
Acknowledgements
We would like to thank Vadim Axelrod, Julia Sliwa, Patricia Bestelmeyer and Marianne Latinus for useful comments.
References
1 Belin, P. et al. (2011) Understanding voice perception. Br. J. Psychol. 102, 711–725
2 Kanwisher, N. and Yovel, G. (2006) The fusiform face area: a cortical region specialized for the perception of faces. Philos. Trans. R. Soc. Lond. B: Biol. Sci. 361, 2109–2128
3 Atkinson, A.P. and Adolphs, R. (2011) The neuropsychology of face perception: beyond simple dissociations and functional selectivity. Philos. Trans. R. Soc. Lond. B: Biol. Sci. 366, 1726–1738
4 Tsao, D.Y. et al. (2008) Comparing face patch systems in macaques and humans. Proc. Natl. Acad. Sci. U.S.A. 105, 19514–19519
5 Charest, I. et al. (2013) Cerebral processing of voice gender studied using a continuous carryover fMRI design. Cereb. Cortex 23, 958–966
6 Ethofer, T. et al. (2012) Emotional voice areas: anatomic location, functional properties, and structural connections revealed by combined fMRI/DTI. Cereb. Cortex 22, 191–200
7 Leaver, A.M. and Rauschecker, J.P. (2010) Cortical representation of natural complex sounds: effects of acoustic features and auditory object category. J. Neurosci. 30, 7604–7612
8 Moerel, M. et al. (2012) Processing of natural sounds in human auditory cortex: tonotopy, spectral tuning, and relation to voice sensitivity. J. Neurosci. 32, 14205–14216
9 Remedios, R. et al. (2009) An auditory region in the primate insular cortex responding preferentially to vocal communication sounds. J. Neurosci. 29, 1034–1045
10 Romanski, L.M. (2012) Integration of faces and vocalizations in ventral prefrontal cortex: implications for the evolution of audiovisual speech. Proc. Natl. Acad. Sci. U.S.A. 109 (Suppl. 1), 10717–10724
11 Romanski, L.M. et al. (2005) Neural representation of vocalizations in the primate ventrolateral prefrontal cortex. J. Neurophysiol. 93, 734–747
12 Gao, Z. et al. (2012) A magnetoencephalographic study of face processing: M170, gamma-band oscillations and source localization. Hum. Brain Mapp. http://dx.doi.org/10.1002/hbm.22028
13 Rossion, B. and Jacques, C. (2008) Does physical interstimulus variance account for early electrophysiological face sensitive responses in the human brain? Ten lessons on the N170. Neuroimage 39, 1959–1979
14 Charest, I. et al. (2009) Electrophysiological evidence for an early processing of human voices. BMC Neurosci. 10, 127
15 De Lucia, M. et al. (2010) A temporal hierarchy for conspecific vocalization discrimination in humans. J. Neurosci. 30, 11210–11221
16 Rogier, O. et al. (2010) An electrophysiological correlate of voice processing in 4- to 5-year-old children. Int. J. Psychophysiol. 75, 44–47
17 Capilla, A. et al. (2013) The early spatio-temporal correlates and task independence of cerebral voice processing studied with MEG. Cereb. Cortex 23, 1388–1395
18 Pitcher, D. et al. (2009) Triple dissociation of faces, bodies, and objects in extrastriate cortex. Curr. Biol. 19, 319–324
19 Sadeh, B. et al. (2011) Stimulation of category-selective brain areas modulates ERP to their preferred categories. Curr. Biol. 21, 1894–1899
20 Bestelmeyer, P. et al. (2011) Right temporal TMS impairs voice detection. Curr. Biol. 21, R838–R839
21 Gainotti, G. (2013) Laterality effects in normal subjects' recognition of familiar faces, voices and names. Perceptual and representational components. Neuropsychologia 51, 1151–1160
22 Yovel, G. et al. (2008) The asymmetry of the fusiform face area is a stable individual characteristic that underlies the left-visual-field superiority for faces. Neuropsychologia 46, 3061–3068
23 Freiwald, W.A. and Tsao, D.Y. (2010) Functional compartmentalization and viewpoint generalization within the macaque face-processing system. Science 330, 845–851
24 Yovel, G. and Freiwald, W.A. (2013) Face recognition systems in monkey and human: are they the same thing? F1000Prime Rep. 5, http://dx.doi.org/10.12703/P5-10
25 Petkov, C.I. et al. (2008) A voice region in the monkey brain. Nat. Neurosci. 11, 367–374
26 Perrodin, C. et al. (2011) Voice cells in the primate temporal lobe. Curr. Biol. 21, 1408–1415
27 Kravitz, D.J. et al. (2011) A new neural framework for visuospatial processing. Nat. Rev. Neurosci. 12, 217–230
28 Rauschecker, J.P. and Scott, S.K. (2009) Maps and streams in the auditory cortex: nonhuman primates illuminate human speech processing. Nat. Neurosci. 12, 718–724
29 Barton, J.J. (2008) Structure and function in acquired prosopagnosia: lessons from a series of 10 patients with brain damage. J. Neuropsychol. 2, 197–225
30 Susilo, T. and Duchaine, B. (2012) Advances in developmental prosopagnosia research. Curr. Opin. Neurobiol. http://dx.doi.org/10.1016/j.conb.2012.12.011
31 Hailstone, J.C. et al. (2010) Progressive associative phonagnosia: a neuropsychological analysis. Neuropsychologia 48, 1104–1114
32 Neuner, F. and Schweinberger, S.R. (2000) Neuropsychological impairments in the recognition of faces, voices, and personal names. Brain Cogn. 44, 342–366
33 Sidtis, D. and Kreiman, J. (2012) In the beginning was the familiar voice: personally familiar voices in the evolutionary and contemporary biology of communication. Integr. Psychol. Behav. Sci. 46, 146–159
34 Garrido, L. et al. (2009) Developmental phonagnosia: a selective deficit of vocal identity recognition. Neuropsychologia 47, 123–131
35 von Kriegstein, K. et al. (2005) Interaction of face and voice areas during speaker recognition. J. Cogn. Neurosci. 17, 367–376
36 Young, A.W. and Bruce, V. (2011) Understanding person perception. Br. J. Psychol. 102, 959–974
37 Valentine, T. (1991) A unified account of the effects of distinctiveness, inversion, and race in face recognition. Q. J. Exp. Psychol. A 43, 161–204
38 Leopold, D.A. et al. (2006) Norm-based face encoding by single neurons in the monkey inferotemporal cortex. Nature 442, 572–575
39 Latinus, M. and Belin, P. (2011) Anti-voice adaptation suggests prototype-based coding of voice identity. Front. Psychol. 2, 175
40 Leopold, D.A. et al. (2001) Prototype-referenced shape encoding revealed by high-level aftereffects. Nat. Neurosci. 4, 89–94
41 Latinus, M. et al. Norm-based coding of voice identity in human auditory cortex. Curr. Biol., in press
42 Loffler, G. (2005) fMRI evidence for the neural representation of faces. Nat. Neurosci. 8, 1386–1390
43 Simion, F. et al. (2011) The processing of social stimuli in early infancy: from faces to biological motion perception. Prog. Brain Res. 189, 173–193
44 Johnson, M.H. (2005) Subcortical face processing. Nat. Rev. Neurosci. 6, 766–774
45 Kisilevsky, B.S. et al. (2003) Effects of experience on fetal voice recognition. Psychol. Sci. 14, 220–224
46 Beauchemin, M. et al. (2011) Mother and stranger: an electrophysiological study of voice processing in newborns. Cereb. Cortex 21, 1705–1711
47 Vouloumanos, A. et al. (2010) The tuning of human neonates' preference for speech. Child Dev. 81, 517–527
48 Tzourio-Mazoyer, N. et al. (2002) Neural correlates of woman face processing by 2-month-old infants. Neuroimage 15, 454–461
49 de Haan, M. et al. (2003) Development of face-sensitive event-related potentials during infancy: a review. Int. J. Psychophysiol. 51, 45–58
50 Halit, H. et al. (2003) Cortical specialisation for face processing: face-sensitive event-related potential components in 3- and 12-month-old infants. Neuroimage 19, 1180–1193
51 Dehaene-Lambertz, G. et al. (2002) Functional neuroimaging of speech perception in infants. Science 298, 2013–2015
52 Grossmann, T. et al. (2010) The developmental origins of voice processing in the human brain. Neuron 65, 852–858
53 Blasi, A. et al. (2011) Early specialization for voice and emotion processing in the infant brain. Curr. Biol. 21, 1220–1224
54 Pascalis, O. et al. (2002) Is face processing species-specific during the first year of life? Science 296, 1321–1323
55 Di Giorgio, E. et al. (2012) Is the face-perception system human-specific at birth? Dev. Psychol. 48, 1083–1090
56 Kuhl, P.K. et al. (1992) Linguistic experience alters phonetic perception in infants by 6 months of age. Science 255, 606–608
57 Perrachione, T.K. and Wong, P.C. (2007) Learning to recognize speakers of a non-native language: implications for the functional organization of human auditory cortex. Neuropsychologia 45, 1899–1910
58 Winters, S.J. et al. (2008) Identification and discrimination of bilingual talkers across languages. J. Acoust. Soc. Am. 123, 4524–4538
59 Perrachione, T.K. et al. (2010) Asymmetric cultural effects on perceptual expertise underlie an own-race bias for voices. Cognition 114, 42–55
60 Young, S.G. et al. (2012) Perception and motivation in face recognition: a critical review of theories of the Cross-Race Effect. Pers. Soc. Psychol. Rev. 16, 116–142
61 Yin, R.K. (1969) Looking at upside-down faces. J. Exp. Psychol. 81, 141–145
62 Nederhouser, M. et al. (2007) The deleterious effect of contrast reversal on recognition is unique to faces, not objects. Vision Res. 47, 2134–2142
63 DeGutis, J. et al. (2013) Using regression to measure holistic face processing reveals a strong link with face recognition ability. Cognition 126, 87–100
64 Blesser, B. (1972) Speech perception under conditions of spectral transformation. I. Phonetic characteristics. J. Speech Hear. Res. 15, 5–41
65 Bedard, C. and Belin, P. (2004) A 'voice inversion effect'? Brain Cogn. 55, 247–249
66 Agus, T.R. et al. (2012) Fast recognition of musical sounds based on timbre. J. Acoust. Soc. Am. 131, 4124–4133
67 Bruce, V. et al. (1993) Effects of distinctiveness, repetition and semantic priming on the recognition of face familiarity. Can. J. Exp. Psychol. 47, 38–60
68 Natu, V. and O'Toole, A.J. (2011) The neural processing of familiar and unfamiliar faces: a review and synopsis. Br. J. Psychol. 102, 726–747
69 Hershler, O. et al. (2010) The wide window of face detection. J. Vis. 10, 21
70 Langton, S.R. et al. (2008) Attention capture by faces. Cognition 107, 330–342
71 Oosterhof, N.N. and Todorov, A. (2008) The functional basis of face evaluation. Proc. Natl. Acad. Sci. U.S.A. 105, 11087–11092
72 Willis, J. and Todorov, A. (2006) First impressions: making up your mind after a 100-ms exposure to a face. Psychol. Sci. 17, 592–598
73 Scherer, K.R. (1972) Judging personality from voice: a cross-cultural approach to an old issue in interpersonal perception. J. Pers. 40, 191–210
74 Engell, A.D. et al. (2010) Common neural mechanisms for the evaluation of facial trustworthiness and emotional expressions as revealed by behavioral adaptation. Perception 39, 931–941
75 Rauschecker, J.P. and Tian, B. (2000) Mechanisms and streams for processing of 'what' and 'where' in auditory cortex. Proc. Natl. Acad. Sci. U.S.A. 97, 11800–11806
76 Stevenage, S.V. et al. (2013) The effect of distraction on face and voice recognition. Psychol. Res. 77, 167–175
77 Belin, P. et al., eds (2012) Integrating Face and Voice in Person Perception, Springer
78 Schweinberger, S.R. et al. (2011) Hearing facial identities: brain correlates of face–voice integration in person identification. Cortex 47, 1026–1037
79 Smith, E.L. et al. (2007) Auditory-visual crossmodal integration in perception of face gender. Curr. Biol. 17, 1680–1685
80 de Gelder, B. and Vroomen, J. (2000) The perception of emotions by ear and by eye. Cogn. Emot. 14, 289–311
81 Pourtois, G. et al. (2005) Perception of facial expressions and voices and of their combination in the human brain. Cortex 41, 49–59
82 Hagan, C.C. et al. (2009) MEG demonstrates a supra-additive response to facial and vocal emotion in the right superior temporal sulcus. Proc. Natl. Acad. Sci. U.S.A. 106, 20010–20015
83 Patterson, M.L. and Werker, J.F. (2003) Two-month-old infants match phonetic information in lips and voice. Dev. Sci. 6, 191–196
84 Lewkowicz, D.J. and Ghazanfar, A.A. (2009) The emergence of multisensory systems through perceptual narrowing. Trends Cogn. Sci. 13, 470–478
85 Grossmann, T. et al. (2012) Neural correlates of perceptual narrowing in cross-species face-voice matching. Dev. Sci. 15, 830–839
86 Haxby, J.V. et al. (2000) The distributed human neural system for face perception. Trends Cogn. Sci. 4, 223–233
87 Belin, P. et al. (2000) Voice-selective areas in human auditory cortex. Nature 403, 309–312
88 Jeffery, L. et al. (2011) Distinguishing norm-based from exemplar-based coding of identity in children: evidence from face identity aftereffects. J. Exp. Psychol. Hum. Percept. Perform. 37, 1824–1840
89 Lavner, Y. et al. (2001) The prototype model in speaker identification by human listeners. Int. J. Speech Technol. 4, 63–74
90 Langlois, J.H. and Roggman, L.A. (1990) Attractive faces are only average. Psychol. Sci. 1, 115–121
91 Bruckert, L. et al. (2010) Vocal attractiveness increases by averaging. Curr. Biol. 20, 116–120
92 O'Neil, S.F. and Webster, M.A. (2011) Adaptation and the perception of facial age. Vis. Cogn. 19, 534–550
93 Tillman, M.A. and Webster, M.A. (2012) Selectivity of face distortion aftereffects for differences in expression or gender. Front. Psychol. 3, 14
94 Webster, M.A. and MacLeod, D.I. (2011) Visual adaptation and face perception. Philos. Trans. R. Soc. Lond. B: Biol. Sci. 366, 1702–1725
95 Schweinberger, S.R. et al. (2008) Auditory adaptation in voice perception. Curr. Biol. 18, 684–688
96 Zäske, R. et al. (2010) Voice aftereffects of adaptation to speaker identity. Hear. Res. 268, 38–45
97 Latinus, M. and Belin, P. (2012) Perceptual auditory aftereffects on voice identity using brief vowel stimuli. PLoS ONE 7, e41384
98 Bestelmeyer, P. et al. (2010) Auditory adaptation in vocal affect perception. Cognition 117, 217–223
99 Galton, F. (1878) Composite portraits. J. Anthropol. Inst. 8, 132–144
100 Grammer, K. et al. (2003) Darwinian aesthetics: sexual selection and the biology of beauty. Biol. Rev. Camb. Philos. Soc. 78, 385–407
101 Little, A.C. et al. (2011) Facial attractiveness: evolutionary based research. Philos. Trans. R. Soc. Lond. B: Biol. Sci. 366, 1638–1659
102 Winkielman, P. et al. (2006) Prototypes are attractive because they are easy on the mind. Psychol. Sci. 17, 799–806
103 Feinberg, D.R. et al. (2008) The role of femininity and averageness of voice pitch in aesthetic judgments of women's voices. Perception 37, 615–623
104 Said, C.P. and Todorov, A. (2011) A statistical model of facial attractiveness. Psychol. Sci. 22, 1183–1190
105 DeBruine, L.M. et al. (2007) Dissociating averageness and attractiveness: attractive faces are not always average. J. Exp. Psychol. Hum. Percept. Perform. 33, 1420–1430
106 Braun, C. et al. (2001) Beautycheck – Ursachen und Folgen von Attraktivität. Projektabschlussbericht. http://www.beautycheck.de/cmsms/index.php/der-ganze-bericht