Seeing our own voice: an electrophysiological study of audiovisual speech integration during self perception

HAL Id: hal-01297672

https://hal.archives-ouvertes.fr/hal-01297672

Submitted on 5 Apr 2016

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

Seeing our own voice: an electrophysiological study of audiovisual speech integration during self perception

Avril Treille, Coriandre Vilain, Sonia Kandel, Marc Sato

To cite this version:

Avril Treille, Coriandre Vilain, Sonia Kandel, Marc Sato. Seeing our own voice: an electrophysiological study of audiovisual speech integration during self perception. IMRF 2015 - 16th International Multisensory Research Forum, Jun 2015, Pisa, Italy. ⟨hal-01297672⟩

SEEING OUR OWN VOICE: AN ELECTROPHYSIOLOGICAL STUDY OF AUDIOVISUAL SPEECH INTEGRATION DURING SELF PERCEPTION

Avril Treille¹, Coriandre Vilain¹, Sonia Kandel¹ & Marc Sato²

¹ GIPSA-lab, Département Parole & Cognition, CNRS & Grenoble Université, Grenoble, France

² Laboratoire Parole & Langage, CNRS & Aix Marseille Université, France

Introduction

To recognize one's own face and voice is key for our self-awareness and for our ability to communicate effectively with others. Interestingly, recent studies suggest that better recognition of one's actions may result from the integration of sensory inputs with our own sensory-motor knowledge. However, whether hearing our voice and seeing our articulatory gestures facilitate audiovisual speech integration is still debated.

Participants

• 18 healthy adults, right-handed native French speakers.

Stimuli

• Syllables: /pa/, /ta/, /ka/

• Modalities: auditory (A), visual (V), audio-visual (AV) and incongruent audio-visual (AVi: self auditory signal, other visual signal)

• Half of the stimuli were related to the participant (self condition), the other half to an unknown speaker (other condition).

• A total of 1176 stimuli were created.

Tasks

1) Before the experiment, a short training session was performed.

3) EEG session: a three-alternative forced-choice identification task. Participants were instructed to categorize each perceived syllable with their right hand after an audio "beep".

Data acquisition

• EEG data were continuously recorded from 64 scalp electrodes (international 10–20 system) using the Biosemi ActiveTwo AD-box EEG system at a sampling rate of 256 Hz.

• Two additional electrodes (CMS and DRL) served as reference.

• One other external reference electrode was placed at the top of the nose. Electrooculograms monitoring horizontal (HEOG) and vertical (VEOG) eye movements were recorded using electrodes at the outer canthus of each eye as well as above and below the right eye. Before the experiment, the impedance of each electrode was adjusted to obtain a low offset voltage and a stable DC.

Analysis

• Behavioral analyses

• EEG analyses on fronto-central electrodes (F3/F4/C3/C4/Fz/Cz)

• Correlation between EEG and behavioral data: correlations between integration (EEG; AV - (A+V)) and visual identification (%)
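The integration measure named above, AV - (A+V), compares the bimodal response with the sum of the two unimodal responses. The following is a minimal sketch of that comparison in plain Python, not the authors' actual pipeline. The function names, the toy waveforms and the N1/P2 search windows (70-150 ms and 150-250 ms) are assumptions made for illustration; the poster does not state the windows used.

```python
def average(epochs):
    """Point-by-point average of equally long epochs, giving an ERP."""
    n = len(epochs)
    return [sum(samples) / n for samples in zip(*epochs)]


def additive_sum(erp_a, erp_v):
    """Sum of the auditory-only and visual-only ERPs (the 'A+V' signal)."""
    return [a + v for a, v in zip(erp_a, erp_v)]


def peak(erp, times_ms, window_ms, polarity):
    """Return (latency_ms, amplitude) of the extreme point inside window_ms.

    polarity=-1 searches for the most negative point (e.g. N1),
    polarity=+1 searches for the most positive point (e.g. P2).
    """
    candidates = [
        (polarity * value, t, value)
        for value, t in zip(erp, times_ms)
        if window_ms[0] <= t <= window_ms[1]
    ]
    _, latency, amplitude = max(candidates)
    return latency, amplitude


# Hypothetical search windows, NOT taken from the poster.
N1_WINDOW_MS = (70, 150)
P2_WINDOW_MS = (150, 250)

# Toy waveforms for illustration only (not the study's data).
times_ms = [0, 50, 100, 150, 200, 250]
erp_av = [0, -1, -4, 1, 3, 0]
erp_a = [0, -1, -3, 1, 3, 0]
erp_v = [0, 0, -2, 0, 1, 0]

erp_sum = additive_sum(erp_a, erp_v)
p2_av = peak(erp_av, times_ms, P2_WINDOW_MS, polarity=+1)
p2_sum = peak(erp_sum, times_ms, P2_WINDOW_MS, polarity=+1)

# Integration index on P2 amplitude: bimodal minus sum.
p2_integration = p2_av[1] - p2_sum[1]
```

A negative `p2_integration` corresponds to the pattern reported on the poster (P2 amplitude smaller for AV than for A+V). The same `peak` helper can be applied with `polarity=-1` and the N1 window to obtain N1 latency differences.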

Methods

Results

Discussion

1) Behavioral results: Syllables were perfectly perceived in all modalities except the visual modality. During visual presentation, /pa/ was better perceived than /ta/ and /ka/.

2) Integration for both self and other signals: Early integration processing (on P2) was observed during AV-self and AV-other speech perception, compared to A+V.

3) Speaker effect on N1 latency: Compared to Visual-other stimuli, Visual-self stimuli induced a temporal facilitation of N1 during integration.

4) Correlations on N1 latency for self visually ambiguous syllables: A negative correlation was observed between visual-self identification and integration results on N1.

=> In line with previous EEG studies on multimodal speech perception, our results point to the existence of early integration mechanisms of auditory and visual speech information. Crucially, they also provide evidence for a processing advantage when the perceptual situation involves our own speech productions, mostly for visually ambiguous syllables. Viewing our own utterances leads to a temporal facilitation of the integration of auditory and visual speech signals.

Behavioral (% correct responses)

- ANOVA: Modality (A, AV, V, AVi), Speaker (Self/Other), Syllable (/pa/, /ta/, /ka/)

Pre-processing

- Re-referenced off-line to the nose
- Filtering: 2-30 Hz
- Epochs: 1000 ms (baseline from -500 to -400 ms relative to the acoustic syllable onset)
- Rejection: ±60 µV

N1 & P2 amplitude and latency

- Speaker's effect: ANOVA: Auditory modality (Self/Other), Visual modality (Self/Other/None)
- Audio-visual integration: ANOVA: Signal type (Bimodal/Sum), Auditory modality (Self/Other), Visual modality (Self/Other)
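The epoching, baseline and rejection steps listed above can be sketched in plain Python for a single channel. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the epoch is assumed to span -500 to +500 ms around the acoustic onset, the ±60 µV threshold is assumed to apply after baseline correction, and the 2-30 Hz band-pass filter is assumed to have been applied beforehand.

```python
SFREQ = 256  # Hz, sampling rate reported on the poster


def ms_to_samples(ms, sfreq=SFREQ):
    """Convert a duration in milliseconds to a whole number of samples."""
    return int(round(ms * sfreq / 1000.0))


def extract_epoch(signal, onset, sfreq=SFREQ, tmin_ms=-500, tmax_ms=500,
                  base_ms=(-500, -400), reject_uv=60.0):
    """Cut, baseline-correct and threshold one epoch of a single channel.

    signal : list of samples in microvolts (assumed already filtered)
    onset  : sample index of the acoustic syllable onset
    Returns the corrected epoch, or None when the epoch runs past the
    recording or any sample exceeds +/- reject_uv.
    """
    start = onset + ms_to_samples(tmin_ms, sfreq)
    stop = onset + ms_to_samples(tmax_ms, sfreq)
    if start < 0 or stop > len(signal):
        return None
    epoch = signal[start:stop]

    # Baseline indices are expressed relative to the start of the epoch.
    b0 = ms_to_samples(base_ms[0] - tmin_ms, sfreq)
    b1 = ms_to_samples(base_ms[1] - tmin_ms, sfreq)
    baseline = sum(epoch[b0:b1]) / (b1 - b0)
    corrected = [value - baseline for value in epoch]

    # Amplitude-based rejection (assumed to follow baseline correction).
    if max(abs(value) for value in corrected) > reject_uv:
        return None
    return corrected


# Toy recording: a constant 10 µV offset, then the same with one artifact.
clean = [10.0] * 1000
noisy = list(clean)
noisy[520] = 100.0

kept = extract_epoch(clean, onset=500)
dropped = extract_epoch(noisy, onset=500)
```

At 256 Hz a 1000 ms epoch holds 256 samples, and the 100 ms baseline window rounds to 26 samples. In the toy example, `kept` is a flat epoch after the offset is removed, while `dropped` is `None` because the artifact exceeds the threshold.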

Part of this research was supported by a grant from the European Research Council (FP7/2007-2013 Grant Agreement no. 339152, "Speech Unit(e)s").

Correspondence: avril.treille@gipsa-lab.inpg.fr

1) Behavioral - % correct responses (p<.001)

[Figure: % of correct responses (axis from 60% to 100%) for /pa/, /ta/ and /ka/ in each condition: A-self, A-other, AV-self, AV-other, AVi-self, AVi-other, V-self, V-other.]

• A = AV = AVi > V

• /pa/ = /ta/ = /ka/, except for V-self and V-other

• No effect of the speaker

2) EEG - Integration (AV <> A+V)

[Figure: ERP waveforms (amplitude axis from -5 to 5; time axis from -500 to 484 ms) comparing the bimodal and summed signals in four panels: AsVs vs As+Vs, AoVs vs Ao+Vs, AsVo vs As+Vo, AoVo vs Ao+Vo.]

[Figure: P2 amplitude, type effect: Sum vs Bimodality (axis from 0 to 6).]

• P2 amplitude: AV < A+V (p<.02)

• => integration

3) EEG - Self effect on integration

[Figure: N1 latency, visual effect: Visual-self vs Visual-other (axis from 102 to 118).]

• Visual-self: reduced N1 latency (p<.02)

4) Correlations between Integration (EEG; AV - (A+V)) & Visual identification (%)

[Figure: four correlation plots of the integration measure against visual identification (50% to 100%) for V-self and V-other: N1 amplitude and P2 amplitude (axis from -7 to 3), N1 latency and P2 latency (axis from -50 to 50).]

• N1 & P2 amplitude: no effect (NS)

• P2 latency: no effect (NS)

• N1 latency: Self: r=.41, p<.02; Other: r=.01, p<.94

• Self: N1 latency is negatively correlated with visual saliency
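The correlation analysis relates each participant's integration measure to their visual identification score. A minimal sketch of a Pearson correlation in plain Python follows. The function name and the numbers are purely illustrative: the values are synthetic, they are not the poster's data, and they are not meant to reproduce the reported coefficients.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Synthetic values for illustration only (NOT the study's data).
# One entry per hypothetical participant.
visual_identification = [55.0, 62.0, 70.0, 78.0, 85.0, 93.0]   # % correct
n1_latency_integration = [-4.0, -9.0, -12.0, -20.0, -26.0, -31.0]  # ms

r = pearson_r(visual_identification, n1_latency_integration)
print(f"r = {r:.2f}")
```

In practice the coefficient would be computed separately for the V-self and V-other conditions, and its significance assessed with the usual test on r given the number of participants.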