Orofacial somatosensory inputs enhance speech intelligibility in noisy environments


HAL Id: hal-03083564

https://hal.archives-ouvertes.fr/hal-03083564

Submitted on 19 Dec 2020

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.


Orofacial somatosensory inputs enhance speech intelligibility in noisy environments

Rintaro Ogane, Jean-Luc Schwartz, Takayuki Ito

To cite this version:

Rintaro Ogane, Jean-Luc Schwartz, Takayuki Ito. Orofacial somatosensory inputs enhance speech intelligibility in noisy environments. ISSP 2020 - 12th International Seminar on Speech Production, Dec 2020, Providence (virtual), United States. ⟨hal-03083564⟩


Discussion Results

Orofacial somatosensory inputs enhance speech intelligibility in noisy environments

Rintaro Ogane¹, Jean-Luc Schwartz¹, Takayuki Ito¹,²

¹ Univ. Grenoble Alpes, CNRS, Grenoble INP*, GIPSA-lab, Grenoble, France; ² Haskins Laboratories, New Haven, USA

* Institute of Engineering Univ. Grenoble Alpes

Acknowledgments

This work was supported by the European Research Council under the European Community's Seventh Framework Program (FP7/2007-2013 Grant Agreement no. 339152) and the National Institute on Deafness and Other Communication Disorders R01DC017439.

We thank Gurvan Quiniou for data collection and analysis, and Silvain Gerber for statistical analysis. We also thank Coriandre Vilain for his technical support.

Summary

Somatosensory inputs associated with facial skin deformation enhance speech intelligibility in noise when the somatosensory stimulation is compatible with the articulatory nature of the corresponding speech sound.

The orofacial somatosensory system may intervene in the process of speech detection in noisy environments.

Methods

Data analysis.

− Mean probability of correct response rate across all SNR conditions.

Introduction

[Diagram: the auditory processing system receives visual information (auditory-visual interaction) and somatosensory information (auditory-somatosensory interaction).]

How does somatosensory input affect the processing of speech?

Do somatosensory inputs associated with facial skin deformation enhance speech intelligibility in noise?

Speech perception is an interactive process with multiple modalities and some perceptuo(multisensory)-motor connections (Schwartz et al., 2012).

Experimental setup (Ito et al., 2009)

Speech intelligibility in noise increased in SKIN compared to CTL.

− An increase of ≈ 3% for the speech target /pa/.

− The somatosensory effect is consistent with audio-visual speech processing.

Participants: 22 native French speakers.

− 14 for Exp. 1 and 8 for Exp. 2.

Speech materials.

− /pa/ for Exp. 1 and /py/ for Exp. 2.

Speech detection test.

− Task: identify which of the two noise periods contains the target speech sound.

− The speech stimulus was embedded in background noise (80 dB SPL) at 8 SNR levels:

  −8 dB to −15 dB for target /pa/.

  −10 dB to −17 dB for target /py/.

− The two experimental conditions alternated every 8 trials:

  SKIN: with somatosensory stimulation.

  CTL: auditory alone.
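The embedding of the speech target in noise at a given SNR can be sketched as follows. This is only an illustration under stated assumptions: the sampling rate, the Gaussian noise, the placeholder tone standing in for the speech token, the onset position, and the RMS-based definition of SNR are all hypothetical choices, and the absolute 80 dB SPL level is not modeled because it depends on playback calibration.

```python
import math
import random

def rms(x):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def scale_to_snr(speech, noise, snr_db):
    """Rescale speech so that 20*log10(rms(speech) / rms(noise)) equals snr_db."""
    gain = rms(noise) / rms(speech) * 10 ** (snr_db / 20)
    return [gain * v for v in speech]

def embed(speech, noise, snr_db, onset):
    """Add the rescaled speech to a copy of the noise, starting at sample `onset`."""
    scaled = scale_to_snr(speech, noise, snr_db)
    mixed = list(noise)
    for i, v in enumerate(scaled):
        mixed[onset + i] += v
    return mixed

random.seed(0)
fs = 16000  # assumed sampling rate (Hz)
noise = [random.gauss(0.0, 1.0) for _ in range(fs)]  # 1 s of placeholder noise
# Placeholder 200 ms tone standing in for the recorded /pa/ token.
speech = [math.sin(2 * math.pi * 200 * n / fs) for n in range(fs // 5)]

# The 8 SNR levels listed for target /pa/: -8 dB down to -15 dB.
snr_levels_pa = list(range(-8, -16, -1))
mixtures = {snr: embed(speech, noise, snr, onset=fs // 4) for snr in snr_levels_pa}
```

Only the speech gain changes across SNR levels here, so the noise level stays constant from trial to trial; whether the original stimuli were built this way is not stated on the poster.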

Do somatosensory inputs provide different effects in different types of auditory stimulation?

Q. 1

Q. 2

Somatosensory stimulation on the face (SKIN).

− Upward direction.

− A half-wave 6 Hz sinusoidal pattern.

− Applied in both noise periods.

− The timing was adjusted so that the peak amplitudes of the somatosensory and auditory stimuli coincided.
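A half-wave 6 Hz sinusoid is a single positive half-cycle lasting 1/(2 × 6) ≈ 83 ms. The sketch below generates such a command signal and computes the stimulation onset that would align its peak with a given auditory peak time. The sampling rate, the unit amplitude, the function names, and the example auditory peak time are assumptions for illustration; the poster does not describe how the alignment was implemented.

```python
import math

def half_wave_pulse(freq_hz=6.0, fs=1000, amplitude=1.0):
    """One positive half-cycle of a sinusoid, sampled at fs (Hz)."""
    duration = 1.0 / (2.0 * freq_hz)  # half a period, about 83 ms at 6 Hz
    n = round(duration * fs)
    return [amplitude * math.sin(2 * math.pi * freq_hz * k / fs) for k in range(n + 1)]

def onset_for_peak_alignment(pulse, fs, auditory_peak_time):
    """Onset time (s) of the pulse so that its maximum falls at auditory_peak_time."""
    peak_index = max(range(len(pulse)), key=lambda k: pulse[k])
    return auditory_peak_time - peak_index / fs

fs = 1000  # assumed command-signal sampling rate (Hz)
pulse = half_wave_pulse(6.0, fs)
# Hypothetical auditory peak at 300 ms after the start of the noise period.
onset = onset_for_peak_alignment(pulse, fs, auditory_peak_time=0.300)
```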

A somatosensory effect may appear when the somatosensory stimulation matches the articulatory gesture of the speech sound (Ogane et al., 2019; 2020).

[Figure panels: target /pa/ and target /py/]

Contact : rintaro.ogane@gipsa-lab.grenoble-inp.fr

McGurk effect (McGurk & MacDonald, 1976).

Word segmentation (Sell & Kaschak, 2009).

Lexical processing in French (Strauß et al., 2015).

Speech detection in noise (Sumby & Pollack, 1954; Erber, 1969; Grant & Seitz, 2000; Bernstein et al., 2004; Kim & Davis, 2004).

Note: two different onsets of the target speech sound were used to prevent participants from anticipating it.

[Figure: audio-somatosensory detection advantage. Panels: correct response rate; mean probability of correct response rate.]

Statistical results.

− Mean probability for /pa/ was significantly different from zero (t(13) = 2.33, p = 0.036, one-sample t-test).

− No significant difference was found for /py/ (**) (t(7) = 1.77, p > 0.12, one-sample t-test).
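For readers who want to check the reported statistics, here is a minimal sketch of a one-sample t-test against zero, using only the Python standard library. It assumes a two-sided test (the poster does not say) and that each participant contributes one score, for instance a SKIN minus CTL difference in mean correct rate; the example scores are invented and are not the poster's data. The p-value is obtained by numerically integrating the Student's t density rather than through a statistics package.

```python
import math
from statistics import mean, stdev

def one_sample_t(values, mu0=0.0):
    """Return (t statistic, degrees of freedom) for H0: population mean == mu0."""
    n = len(values)
    t = (mean(values) - mu0) / (stdev(values) / math.sqrt(n))
    return t, n - 1

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=20000):
    """Two-sided p-value: 1 minus the density integrated over [-|t|, |t|] (Simpson's rule)."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    inner = 2 * (total * h / 3)  # the density is symmetric around zero
    return 1 - inner

# Hypothetical per-participant scores (assumed SKIN minus CTL differences).
example_scores = [0.05, 0.01, 0.04, -0.02, 0.06, 0.03, 0.00, 0.02]
t_example, df_example = one_sample_t(example_scores)

# t values and degrees of freedom as reported on the poster.
p_pa = two_sided_p(2.33, 13)
p_py = two_sided_p(1.77, 7)
print(f"/pa/: p = {p_pa:.3f}   /py/: p = {p_py:.3f}")
```

Under the two-sided assumption, both p-values come out close to the figures reported above (about 0.036 for /pa/ and about 0.12 for /py/).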

A relationship between somatosensory stimulation & articulatory gesture in auditory stimulation.

− Speech intelligibility increased only when the speech target was /pa/.

[Diagram: somatosensory information (this experiment) and visual information (Bernstein et al., 2004; Schwartz et al., 2004; Sumby and Pollack, 1954) both contribute to speech intelligibility in noise.]

Additional sensory inputs associated with auditory information increase the intelligibility of speech sounds in noisy environments.

Vowel perception (Ito et al., 2009; Trudeau-Fisette et al., 2017).

Lexical perception (Ogane et al., 2019; 2020).


[Figure labels: upward somatosensory stimulation; articulatory gesture to produce /pa/; articulatory gesture to produce /py/]


(**) To be confirmed with an increased number of participants for /py/ (delayed by COVID).
