
Electrophysiological evidence for audio-visuo-lingual speech integration



HAL Id: hal-01297678

https://hal.archives-ouvertes.fr/hal-01297678

Submitted on 5 Apr 2016

To cite this version:

Coriandre Vilain, Avril Treille, Marc Sato. Electrophysiological evidence for audio-visuo-lingual speech integration. IMRF 2015 - 16th International Multisensory Research Forum, Jun 2015, Pisa, Italy. ⟨hal-01297678⟩


ELECTROPHYSIOLOGICAL EVIDENCE FOR AUDIO-VISUO-LINGUAL SPEECH INTEGRATION

Coriandre Vilain 1, Avril Treille 1 & Marc Sato 2

1 GIPSA-lab, Département Parole & Cognition, CNRS & Grenoble Université, Grenoble, France
2 Laboratoire Parole & Langage, CNRS & Aix Marseille Université, France

Introduction

Audio-visual speech perception is a special case of multisensory processing that interfaces with the linguistic system. One important issue is whether cross-modal interactions only depend on well-known auditory and visuo-facial modalities or, rather, might also be triggered by other sensory sources less common in speech communication. The present EEG study aimed at investigating cross-modal interactions not only between auditory, visuo-facial and audio-visuo-facial syllables but also between auditory, visuo-lingual and audio-visuo-lingual syllables.

Participants

18 healthy adults, right-handed native French speakers.

Stimuli

/pa/, /ta/ and /ka/ syllables were individually recorded by one male and one female speaker in a sound-proof room. Two /pa/, /ta/ and /ka/ tokens were selected per speaker.

Synchronous recordings of the auditory, visuo-facial and ultrasound lingual signals were acquired with a Terason T3000 ultrasound system (Ultraspeech system, Hueber et al., 2008).

Method

1) Before the experiment, participants were given short explanations about the ultrasound system and the tongue movements required for the production of /pa/, /ta/ and /ka/ syllables.

2) Behavioral session: a three-alternative forced-choice identification task, in which participants were instructed to categorize each perceived syllable as quickly as possible with their right hand.

3) EEG session: a three-alternative forced-choice identification task, in which participants were instructed to categorize each perceived syllable with their right hand after an auditory "beep".

Data acquisition

EEG data were continuously recorded from 64 scalp electrodes (international 10–20 system) using the Biosemi ActiveTwo AD-box EEG system operating at a sampling rate of 256 Hz.

Two additional electrodes (CMS and DRL) served as the online reference and ground.

One other external reference electrode was placed at the top of the nose. Horizontal (HEOG) and vertical (VEOG) electrooculograms were recorded from electrodes placed at the outer canthus of each eye as well as above and below the right eye. Before the experiment, the impedance of each electrode was adjusted to obtain a low offset voltage and a stable DC.
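As an illustration of the acquisition setup described above, the recording could be loaded with MNE-Python as in the following minimal sketch; the file name, EOG channel labels and nose-electrode label are hypothetical placeholders, not taken from the poster.

```python
# Sketch (not the authors' pipeline): loading a BioSemi ActiveTwo recording
# with MNE-Python. The file name and EOG/nose channel labels are hypothetical.
import mne

raw = mne.io.read_raw_bdf(
    "sub01_avl_speech.bdf",                    # hypothetical file name
    eog=["HEOGL", "HEOGR", "VEOGU", "VEOGD"],  # assumed EOG channel labels
    preload=True,
)
# The external nose electrode (assumed label "Nose") is kept as an EEG channel
# so that it can be used for off-line re-referencing later on.
raw.set_montage("biosemi64", on_missing="ignore")  # 64-channel 10-20 layout
print(raw.info)
```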

Analysis

Behavioral analyses: percentage of correct responses and reaction times (see the ANOVA designs under "Behavioral experiment" below).

EEG analyses: N1/P2 auditory evoked potentials on fronto-central electrodes (F3/F4/C3/C4/Fz/Cz).

METHODS

60 stimuli were created, consisting of 12 /pa/, /ta/ and /ka/ syllables presented in 5 conditions: one auditory condition (A), two visual conditions (VL, VT) and two audio-visual conditions (AVL, AVT), showing either the lip or the tongue movements of the speaker.
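For clarity, the stimulus set (12 syllable tokens x 5 conditions = 60 stimuli) can be enumerated as in the following minimal Python sketch; the token naming scheme is a hypothetical illustration.

```python
# Sketch: enumerating the 60 stimuli (12 syllable tokens x 5 conditions).
# The token naming scheme (speaker, repetition) is a hypothetical illustration.
import itertools
import random

syllables = ["pa", "ta", "ka"]
speakers = ["M", "F"]                 # one male and one female speaker
repetitions = [1, 2]                  # two selected tokens per speaker
conditions = ["A", "VL", "VT", "AVL", "AVT"]

stimuli = [
    {"syllable": s, "speaker": sp, "rep": r, "condition": c}
    for s, sp, r, c in itertools.product(syllables, speakers, repetitions, conditions)
]
assert len(stimuli) == 60             # 3 x 2 x 2 tokens x 5 conditions

random.shuffle(stimuli)               # randomized presentation order
```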

RESULTS

Discussion

1) Lips > tongue > chance: in the visual-only modality, syllables were correctly identified above chance even when only tongue movements were shown, although lip movements were identified better.

2) Noise vs No noise: in line with previous studies, adding noise reduced the amplitude and delayed the latency of the N1/P2 peaks.

3) Integration: early integration processing (on P2) during audio-visual lip and audio-visual tongue speech perception, compared to the sum of the unimodal responses (A+V).

4) Difference between lip and tongue audio-visual speech perception: an amplitude difference (during AV-tongue perception, N1 is smaller and P2 is larger than the N1/P2 observed during AV-lips perception).

=> Although participants were not experienced with visuo-lingual stimuli, our results demonstrate that they were able to recognize them and provide the first evidence for audio-visuo-lingual speech interactions. These results further emphasize the multimodal nature of speech perception and likely reflect the impact of listeners' knowledge of speech production on speech perception.

Behavioral experiment:
- % of correct responses: ANOVA with Environment (Noise/No noise) x Articulator (Lips/Tongue) x Modality (A, AV, V)
- Reaction times: ANOVA with Environment (Noise/No noise) x Articulator (Lips/Tongue) x Modality (A, AV, V)

EEG experiment:
- ANOVA with Environment (Noise/No noise) x Articulator (Lips/Tongue) x Modality (A, AV, V)
- Pre-processing (see the sketch below): re-referenced off-line to the nose; filtering: 1-30 Hz; epochs: 1000 ms, time-locked to the acoustic syllable onset (baseline from -500 to -400 ms); rejection: ±60 µV
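A minimal MNE-Python sketch of the pre-processing steps listed above (off-line re-reference to the nose, 1-30 Hz filtering, 1000 ms epochs with a -500 to -400 ms baseline, ±60 µV rejection); the file name, trigger codes and nose-electrode label are assumptions, not taken from the poster.

```python
# Sketch of the pre-processing steps listed above with MNE-Python.
# File name, nose-electrode label and trigger codes are assumptions.
import mne

raw = mne.io.read_raw_bdf("sub01_avl_speech.bdf", preload=True)  # hypothetical file

raw.set_eeg_reference(["Nose"])      # re-reference off-line to the nose electrode
raw.filter(l_freq=1.0, h_freq=30.0)  # 1-30 Hz band-pass

events = mne.find_events(raw)        # triggers assumed at the acoustic syllable onset
event_id = {"A": 1, "VL": 2, "VT": 3, "AVL": 4, "AVT": 5}  # assumed trigger codes

epochs = mne.Epochs(
    raw, events, event_id,
    tmin=-0.5, tmax=0.5,             # 1000 ms epochs around the acoustic onset
    baseline=(-0.5, -0.4),           # baseline from -500 to -400 ms
    reject=dict(eeg=60e-6),          # 60 µV rejection (peak-to-peak in MNE)
    preload=True,
)
```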

1) Behavioral results
2) EEG – Noise vs No noise
3) EEG – Integration (AV vs A+V)
4) EEG – Lips vs Tongue

% of correct responses:

Main effects:
- Noise < No noise (p<.0001)
- Lips > Tongue (p<.005)
- AV > A = V (p<.0001)

Interactions:
- Noise: AV = V > A; No noise: AV = A > V (p<.0001)
- Lips: A = V < AV; Tongue: AV = A > V (p<.05)
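The percent-correct analysis described above (Environment x Articulator x Modality repeated-measures ANOVA) could be run as in this minimal sketch; the file and column names are hypothetical, not the authors' actual analysis code.

```python
# Sketch: Environment x Articulator x Modality repeated-measures ANOVA on
# percent-correct scores with statsmodels. The file and column names are
# hypothetical; one row per subject and condition cell (mean accuracy).
import pandas as pd
from statsmodels.stats.anova import AnovaRM

df = pd.read_csv("behavioral_scores.csv")  # hypothetical file
# expected columns: subject, environment, articulator, modality, pct_correct

anova = AnovaRM(
    df,
    depvar="pct_correct",
    subject="subject",
    within=["environment", "articulator", "modality"],
).fit()
print(anova)  # main effects and interactions; the same design applies to RTs
```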

Noise causes:
- Reduced amplitude (p<.005)
- Delayed latency (p<.0001)

[Figure: N1/P2 grand-average waveforms (AL, AT, AVL, AVT), Noise vs No noise, with N1 and P2 peaks marked]

Part of this research was supported by a grant from the European Research Council (FP7/2007-2013 Grant Agreement no. 339152, "Speech Unit(e)s").
Correspondence: avril.treille@gipsa-lab.inpg.fr

[Bar plots: P2 amplitude (µV) and P2 latency (ms) for A+V vs AV; ** = significant difference]

N1: no effect.
P2: reduced amplitude (p<.0001) and latency (p<.007) in AV compared to A+V, for both lip and tongue signals.

[Figure: N1/P2 grand-average waveforms, Lips vs Tongue (AL, AVL, AT, AVT, pooled over noise and no-noise conditions)]

For tongue (compared to lips):
- Reduced N1 amplitude (p<.01)
- Larger P2 amplitude (p<.0001)

1) N1 and P2 latencies and amplitudes were analyzed with an ANOVA with Environment (Noise/No noise), Articulator (Lips/Tongue) and Modality (A, AV) as within-subject factors (main effects and interactions).

2) N1 and P2 latencies and amplitudes were also tested for integration, i.e. an ANOVA with Environment (Noise/No noise), Articulator (Lips/Tongue) and Signal type (Bimodal (AV) / Sum (A+V)) as within-subject factors (main effects and interactions).
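A minimal sketch of the integration test in (2), comparing the bimodal response (AV) with the sum of the unimodal responses (A+V) on fronto-central electrodes, here illustrated for the lip conditions; the P2 search window and event labels are assumptions, and 'epochs' refers to the Epochs object built in the pre-processing sketch above.

```python
# Sketch of the AV vs A+V comparison for the lip conditions. 'epochs' is the
# Epochs object from the pre-processing sketch above; the 150-300 ms P2
# window is an assumption, not specified on the poster.
import mne

fronto_central = ["F3", "F4", "C3", "C4", "Fz", "Cz"]

ev_a = epochs["A"].average().pick(fronto_central)
ev_v = epochs["VL"].average().pick(fronto_central)
ev_av = epochs["AVL"].average().pick(fronto_central)

# A+V: algebraic sum of the unimodal evoked responses
ev_sum = mne.combine_evoked([ev_a, ev_v], weights=[1, 1])

for name, ev in [("AV", ev_av), ("A+V", ev_sum)]:
    ch, lat, amp = ev.get_peak(
        tmin=0.15, tmax=0.30,        # assumed P2 window
        mode="pos",                  # P2 is a positive deflection
        return_amplitude=True,
    )
    print(f"{name}: P2 = {amp * 1e6:.2f} µV at {lat * 1e3:.0f} ms ({ch})")
```

The same peak measures, computed per participant, articulator and environment, would then feed the Signal type (AV vs A+V) ANOVA described in (2).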

[Figure: grand-average waveforms for A+V vs AV, with N1 and P2 peaks indicated]

[Bar plots: % of correct responses and reaction times for each condition (AL, AVL, VL, AT, AVT, VT), in Noise and No-noise environments]

P2 amplitude: A+V: Lips 2.91 µV, Tongue 2.36 µV; AV: Lips 3.92 µV, Tongue 2.94 µV
P2 latency: A+V: Lips 258 ms, Tongue 248 ms; AV: Lips 248 ms, Tongue 244 ms

Reaction times (RTs):

Main effects:
- Noise > No noise (p<.03)
- A = AV < V (p<.0001)

Interactions:
- Noise: A = AV = V; No noise: A = AV < V (p<.0001)
- Lips: A = V > AV; Tongue: A = AV < V (p<.01)

[Figure: simultaneous recording. Left: tongue movements recorded with the ultrasound system; right: lip movements recorded with a camera.]
