
Word Recognition and Frequency Selectivity in Cochlear Implant Simulation: Effect of Channel Interaction


HAL Id: hal-03163116

https://hal.archives-ouvertes.fr/hal-03163116

Submitted on 9 Mar 2021



Article

Word Recognition and Frequency Selectivity in Cochlear Implant Simulation: Effect of Channel Interaction

Pierre-Antoine Cucis 1,2,3,*, Christian Berger-Vachon 2,4,5, Hung Thaï-Van 2,6,7, Ruben Hermann 1,2,3, Stéphane Gallego 2,8 and Eric Truy 1,2,3





Citation: Cucis, P.-A.; Berger-Vachon, C.; Thaï-Van, H.; Hermann, R.; Gallego, S.; Truy, E. Word Recognition and Frequency Selectivity in Cochlear Implant Simulation: Effect of Channel Interaction. J. Clin. Med. 2021, 10, 679. https://doi.org/10.3390/jcm10040679

Academic Editor: Michael Setzen. Received: 17 December 2020; Accepted: 5 February 2021; Published: 10 February 2021.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1 Integrative, Multisensory, Perception, Action and Cognition Team (IMPACT), Lyon Neuroscience Research Center, CRNL Inserm U1028, CNRS UMR5292, 69675 Bron, France; ruben.hermann@chu-lyon.fr (R.H.); eric.truy@chu-lyon.fr (E.T.)

2 Claude Bernard Lyon 1 University, 69100 Villeurbanne, France; christian.berger-vachon@univ-lyon1.fr (C.B.-V.); hthaivan@gmail.com (H.T.-V.); stephane.gallego@univ-lyon1.fr (S.G.)

3 ENT and Cervico-Facial Surgery Department, Edouard Herriot Hospital, Hospices Civils de Lyon, 69003 Lyon, France

4 Brain Dynamics and Cognition Team (DYCOG), Lyon Neuroscience Research Center, CRNL Inserm U1028, CNRS UMR5292, 69675 Bron, France

5 Biomechanics and Impact Mechanics Laboratory (LBMC), French Institute of Science and Technology for Transport, Development and Networks (IFSTTAR), Gustave Eiffel University, 69675 Bron, France

6 Paris Hearing Institute, Institut Pasteur, Inserm U1120, 75015 Paris, France

7 Department of Audiology and Otoneurological Evaluation, Edouard Herriot Hospital, Hospices Civils de Lyon, 69003 Lyon, France

8 Neuronal Dynamics and Audition Team (DNA), Laboratory of Cognitive Neuroscience (LNSC), CNRS UMR 7291, Aix-Marseille University, CEDEX 3, 13331 Marseille, France

* Correspondence: cucis.pa@gmail.com; Tel.: +33-472-110-0518

Abstract: In cochlear implants (CIs), spread of neural excitation may produce channel interaction. Channel interaction disturbs spectral resolution and, among other factors, seems to impair speech recognition, especially in noise. In this study, two tests were performed with 20 adult normal-hearing (NH) subjects under different vocoded simulations. First, word recognition in noise was measured while varying the number of selected channels (4, 8, 12, or 16 maxima out of 20) and the degree of simulated channel interaction (“Low”, “Medium”, and “High”). Then, spectral resolution was evaluated as a function of the degree of simulated channel interaction, reflected by the sharpness (Q10dB) of psychophysical tuning curves (PTCs). The results showed a significant effect of simulated channel interaction on word recognition but no effect of the number of selected channels. Intelligibility decreased significantly for the highest degree of channel interaction. Similarly, the highest degree of simulated channel interaction significantly impaired the Q10dB. Additionally, a strong intra-individual correlation between frequency selectivity and word recognition in noise was observed. Lastly, individual changes in frequency selectivity were positively correlated with changes in word recognition when the degree of interaction went from “Low” to “High”. To conclude, the degradation seen for the highest degree of channel interaction suggests a threshold effect on frequency selectivity and word recognition. The correlation between frequency selectivity and intelligibility in noise supports the hypothesis that the PTC Q10dB can account for word recognition in certain conditions. Moreover, the individual variation in performance observed among subjects suggests that channel interaction does not affect each individual in the same way. Finally, these results highlight the importance of taking subjects’ individuality into account and of evaluating channel interaction through the speech processor.

Keywords: vocoder simulation; normal-hearing; spread of excitation; cochlear implant; speech in noise


1. Introduction

Modern cochlear implants (CIs) provide unique results in the rehabilitation of severe and profound deafness [1]. Electrode arrays currently comprise 12 to 22 electrodes depending on the manufacturer [2]. Thanks to multi-electrode technology, the speech perception and quality of life of CI users have been considerably enhanced [3,4]. Nevertheless, an inherent consequence of multiplying the number of channels or electrodes is that it may lead to channel interaction. Indeed, the overlap of electrical fields stimulates a large number of nerve fibers and can create an overlap among the “neural channels”. Depending on the degree of overlap, signals, information, neural integration, and neural processing can be degraded [5,6].

A broader spread of excitation increases channel interaction and therefore induces spectral degradation. It is one of the factors leading to poor spectral resolution, which impairs speech perception, especially in noise. As a consequence, some CI users cannot benefit from the full electrode array [7–10]. If channel interaction were detected and quantified in one key area or along the electrode array, it could help to establish new fitting processes and refine countermeasures like current focusing, channel deactivation, or channel picking [11–13]. However, systematic clinical evaluation of channel interaction seems to be very rare, and it is therefore not objectively taken into account in the fitting process of CIs. Channel interaction caused by a broad spread of excitation can be evaluated by psychophysical or electrophysiological techniques [14–16]. Psychophysical tuning curves (PTCs) have been widely used to quantify frequency selectivity, or channel interaction, in CI users [17,18].

In general, PTCs are considered time-consuming when measured with traditional forced-choice adaptive methods. To reduce testing time, a fast method based on the Békésy-tracking procedure was developed by Sek et al. (2005, 2011) and evaluated with normal-hearing (NH) and hearing-impaired subjects [19,20]. Kreft et al. (2019) adapted this procedure to CI users and reported that it was about three times faster (around 20 min versus 60 min), although its repeatability was lower than that of the traditional forced-choice method [21]. Additionally, some authors suggested that other methods, such as forward-masking electrically evoked compound action potentials (ECAPs) or spectral ripple discrimination, were quicker than PTCs, but without comparing testing times [22,23].

In recent experiments, psychophysical methods such as forward-masking PTCs did not always appear to be strong predictors of speech perception outcomes in CI users [18,24]. However, some studies have reported encouraging correlations with speech perception scores. For example, Anderson et al. (2011) described a positive correlation between the inverse of the PTC bandwidth and sentence recognition. Additionally, Boëx et al. (2003) found a negative correlation between the level of forward masking between the different intracochlear electrodes and consonant identification performance [22,25].

Nevertheless, some authors have underlined that PTCs are generally measured with direct electrical stimulation: they do not take into account the constraints introduced by the speech processor and may not reflect frequency selectivity under usual listening conditions. This could hinder the comparison with speech recognition [17,18,26].


controls. In the absence of confounding factors, this enables efficient management of the tested factors and leads to powerful statistical analyses.

This study aims to determine whether word recognition in noise is correlated with frequency selectivity when channel interaction is simulated with a cochlear implant simulator in NH subjects. To do so, we used a 20-channel noise-band vocoder to mimic the signal processing of a common CI and added an algorithm that simulated spread of excitation. We measured word recognition in noise as a function of the number of maxima and the level of simulated spread of excitation. We also assessed the tuning curve sharpness as a function of the spread of excitation and evaluated the statistical association between word recognition in noise and tuning curve sharpness.

2. Materials and Methods

2.1. Subjects

A total of twenty native French speakers (10 females, 10 males) aged 19 to 40 years (mean = 26.3 years, SD = 6.4 years) took part in the study. They received financial compensation for their participation. Pure-tone audiometry was performed on all participants to verify that their hearing was normal (average hearing loss below 20 dB HL in each ear at 500, 1000, 2000, and 4000 Hz), following the recommendations of the International Bureau for Audiophonology [32]. Subjects reported no significant history of audiological or otological problems such as ear surgery or repeated otitis media during childhood. Moreover, they had no self-reported history of neurological impairments or psychiatric disorders.

Written informed consent was obtained from the subjects before their inclusion. The study was conducted following the guidelines of French Good Clinical Practice, the latest version of the Declaration of Helsinki, and the recommendations of the ICH (International Council for Harmonisation). The ethics committee CPP East-IV issued a favorable opinion on the conduct of this study.

2.2. Hardware

Testing took place in a double-walled sound-treated room. All stimuli were generated by a standard PC connected to an external M-Track MkII sound card (M-Audio, Cumberland, RI, USA) and were presented to the subjects through TDH39 headphones (Telephonics Corporation, Farmingdale, NY, USA). Sound levels were controlled by a MADSEN Orbiter 922 clinical audiometer (GN Otometrics A/S, Taastrup, Denmark). The audiometer is used routinely in clinical practice and is calibrated yearly.

2.3. Vocoder Signal Processing

All signal processing was implemented in MATLAB (MathWorks, Natick, MA, USA). A 20-channel noise-band vocoder was used to mimic the signal processing of a Saphyr® SP sound processor (Oticon Medical, Vallauris, France). The simulated fitting strategy was Crystalis, a sound coding strategy commonly used in Oticon Medical devices (Figure 1).

First, the audio signals (recorded at 44.1 kHz) were down-sampled to 16.7 kHz. A high-pass pre-emphasis filter (infinite impulse response) was applied to the input signal (fc = 1200 Hz). The signal was windowed by a 128-sample Hamming window (~7.7 ms) with a temporal overlap of 75%, resulting in an inter-frame interval of approximately 1.9 ms.
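The framing step above can be sketched as follows. The pre-emphasis coefficient mapping is an illustrative choice on our part, since the text only specifies an IIR high-pass filter with fc = 1200 Hz:

```python
import numpy as np

def pre_emphasis(x, fs=16700, fc=1200.0):
    """First-order high-pass pre-emphasis (illustrative coefficient choice;
    the paper only states an IIR high-pass with fc = 1200 Hz)."""
    a = np.exp(-2.0 * np.pi * fc / fs)
    return np.concatenate(([x[0]], x[1:] - a * x[:-1]))

def frame_signal(x, frame_len=128, overlap=0.75):
    """Slice into Hamming-windowed frames: 128 samples (~7.7 ms at 16.7 kHz)
    with 75% overlap, i.e. a hop of 32 samples (~1.9 ms)."""
    hop = int(frame_len * (1.0 - overlap))          # 32-sample hop
    win = np.hamming(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop:i * hop + frame_len] * win
                     for i in range(n_frames)])
```

With one second of audio at 16.7 kHz, `frame_signal` yields 518 overlapping frames of 128 samples each.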


Figure 1. Block diagram of the CI simulation/vocoder.


A Fast Fourier Transform (FFT) was computed on each windowed part of the signal, resulting in a 64-bin spectrum. The first two and the last two bins were rejected; the remaining 60 bins (FFT coefficients) represented the frequencies between 195 and 8008 Hz (130.2-Hz step). They were then distributed into 20 non-overlapping analysis channels (Table 1).

Table 1. Centre and cutoff frequencies of the vocoder, and number of bins (FFT coefficients) per channel.

Channel  Lower Cutoff (Hz)  Higher Cutoff (Hz)  Center Frequency (Hz)  Bins per Channel  Filter Bandwidth (Hz)  Equivalent Rectangular Bandwidth (Hz)
20       195                326                 261                    1                 131                    53
19       326                456                 391                    1                 130                    67
18       456                586                 521                    1                 130                    81
17       586                716                 651                    1                 130                    95
16       716                846                 781                    1                 130                    109
15       846                977                 912                    1                 131                    123
14       977                1107                1042                   1                 130                    137
13       1107               1237                1172                   1                 130                    151
12       1237               1367                1302                   1                 130                    165
11       1367               1497                1432                   1                 130                    179
10       1497               1758                1628                   2                 261                    200
9        1758               2018                1888                   2                 260                    228
8        2018               2409                2214                   3                 391                    264
7        2409               2799                2604                   3                 390                    306
6        2799               3451                3125                   5                 652                    362
5        3451               4102                3777                   5                 651                    432
4        4102               4883                4493                   6                 781                    510
3        4883               5794                5339                   7                 911                    601
2        5794               6836                6315                   8                 1042                   706
1        6836               8008                7422                   9                 1172                   826
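The bin-to-channel grouping can be expressed compactly. Here `BINS_PER_CHANNEL` transcribes the "Bins per channel" column of Table 1 (a sketch, assuming channels are ordered from the lowest-frequency channel 20 down to channel 1):

```python
import numpy as np

# "Bins per channel" column of Table 1, from channel 20 (lowest frequencies)
# to channel 1 (highest); the 60 values cover bins 3-62 of the 64-bin FFT
# (the first two and last two bins are discarded).
BINS_PER_CHANNEL = [1] * 10 + [2, 2, 3, 3, 5, 5, 6, 7, 8, 9]   # sums to 60

def channel_rms(spectrum64):
    """Group the 60 retained FFT bins into 20 channels and return one RMS
    magnitude per channel (the vocoder's analysis stage, sketched)."""
    bins = np.abs(np.asarray(spectrum64))[2:62]
    edges = np.cumsum([0] + BINS_PER_CHANNEL)
    return np.array([np.sqrt(np.mean(bins[a:b] ** 2))
                     for a, b in zip(edges[:-1], edges[1:])])
```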


For each channel, a root-mean-square (RMS) value was computed from the FFT components representing the frequencies within that channel. In each temporal frame, the Crystalis (n-of-m) selection rules were applied: only the “n” channels with the highest RMS amplitude were kept (the others were set to zero). The remaining “n” channels were then compared to the highest RMS, and channels with an RMS amplitude lower than RMSmax minus 45 dB were set to zero.
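A minimal sketch of this n-of-m selection rule (the function name and interface are ours, not Oticon Medical’s):

```python
import numpy as np

def select_channels(rms, n):
    """n-of-m selection sketch: keep the n channels with the highest RMS,
    then zero any survivor more than 45 dB below the maximum."""
    out = np.zeros_like(rms, dtype=float)
    keep = np.argsort(rms)[-n:]                   # indices of the n largest
    out[keep] = rms[keep]
    floor = out.max() * 10 ** (-45 / 20)          # RMSmax minus 45 dB
    out[out < floor] = 0.0
    return out
```

For example, `select_channels(np.array([1.0, 0.5, 0.001, 0.2]), 4)` keeps all four maxima but still zeroes the 0.001 channel, because it lies more than 45 dB below the maximum.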


The resulting modulated narrowband noises were summed, the output signal energy was leveled to the input signal energy and, if necessary, the signal was normalized between −1 and 1 to avoid peak clipping. When the process was complete, the signal was resampled to 44.1 kHz and stored in a “wav” file.

2.4. Speech Audiometry in Noise

The speech material consisted of French dissyllabic words uttered by a male speaker, extracted from Fournier’s lists [36]. Speech and noise were summed at the required signal-to-noise ratio (SNR) before being processed by the vocoder. The noise was a cocktail-party noise (a mixture of chatter and tableware sounds).

Sounds (words + noise) were presented to the subjects’ right ear through headphones connected to a clinical audiometer that calibrated the sound level at 65 dB SPL. Subjects were instructed to repeat each word after it was presented. Each word list contains 10 dissyllabic words; the scoring unit was the syllable, yielding raw scores between 0 and 20 that were then converted to percentages.

There was a short training session before the actual test to accustom the subject to the vocoded sounds and to make sure that he or she understood the instructions. Training words were extracted from the first list of Lafon’s dissyllabic words [37] and were presented after being processed by the vocoder with the following parameters: +18 dB SNR, 16 maxima (16 out of 20), and “Low” spread (−72 dB/octave filter slope). This training session was not part of the experiment.

For the actual testing session, a combination of three conditions was randomly attributed to each Fournier word list:

• SNR: −3, 3, and 9 dB (speech and noise mixed before vocoding, as is the case with CIs)

• Number of maxima: 4, 8, 12, and 16 (out of 20)

• Spread of excitation: “Low” (−72 dB/octave filter slope), “Medium” (−48 dB/oct), and “High” (−24 dB/oct)

The combinations led to 36 different conditions, so 36 lists were presented to each subject.
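The three spread conditions correspond to noise-carrier filters of different orders (the Figure 1 block diagram specifies 4th-, 8th-, or 12th-order filters). Assuming Butterworth-like filters, each order contributes about 6 dB/octave of roll-off, which reproduces the −24, −48, and −72 dB/octave slopes:

```python
import numpy as np

def butter_mag_db(f, fc, order):
    """Magnitude (dB) of an ideal Butterworth low-pass prototype; well above
    fc it falls by roughly 6 dB/octave per filter order."""
    return -10.0 * np.log10(1.0 + (f / fc) ** (2 * order))

# Level drop per octave far above the cutoff, for the three filter orders
for order, label in [(12, "Low"), (8, "Medium"), (4, "High")]:
    drop = butter_mag_db(64.0, 1.0, order) - butter_mag_db(32.0, 1.0, order)
    print(f"{label} spread: {drop:.1f} dB/octave")
```

This prints roughly −72.2, −48.2, and −24.1 dB/octave for the “Low”, “Medium”, and “High” conditions, respectively.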

2.5. Psychophysical Tuning Curves

2.5.1. Stimuli

We chose the stimuli so that this experiment could be reproduced with CI users fitted with a Digisonic SP cochlear implant. We determined the frequencies for which only one electrode was activated (see Table 2).

Pure tones that activated only one electrode were identified by sweeping the frequencies from 190 Hz up to 8000 Hz in 1-Hz steps. Sine waves were sent to a Saphyr® SP sound processor via an auxiliary cable, and the activated electrodes were recorded using a Digispy interface provided by Oticon Medical. Three-second sine waves were generated by a MATLAB script with the PC sound card set to 100% volume. Levels were adjusted to 50% of the stimulation dynamic using the volume wheel on the auxiliary cable. The sound processor settings are given in Table 3.

Then, to establish the PTCs, the sounds were presented to the right ear. A MATLAB script generated the stimuli, and the sound levels were adapted according to the answers given by the subject (see Section 2.5.2).

The probe was set to fp = 2226 Hz, the center “frequency for single-channel activation” of the 8th channel. The maskers matched channels 11 to 5: fm = 1440.5, 1637, 1898.5, 2226, 2619, 3143, and 3798 Hz.

The 110-ms masker was followed by the 20-ms probe with no delay. Both stimuli were gated with 4-ms raised-cosine-squared ramps before entering the vocoder and gated again after signal processing to avoid temporal artifacts.
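The masker–probe stimulus construction can be sketched as follows (pure-tone carriers are used here only for illustration; in the experiment the signals passed through the vocoder):

```python
import numpy as np

FS = 44100  # playback rate after the vocoder

def gate(sig, fs=FS, ramp_ms=4.0):
    """Apply raised-cosine-squared onset/offset ramps (4 ms by default)."""
    n = int(fs * ramp_ms / 1000.0)
    ramp = np.sin(0.5 * np.pi * np.arange(n) / n) ** 2   # squared-cosine rise
    out = np.asarray(sig, dtype=float).copy()
    out[:n] *= ramp
    out[-n:] *= ramp[::-1]
    return out

def tone(freq, dur_s, fs=FS):
    t = np.arange(int(dur_s * fs)) / fs
    return np.sin(2.0 * np.pi * freq * t)

masker = gate(tone(1637.0, 0.110))          # 110-ms masker, e.g. fm = 1637 Hz
probe = gate(tone(2226.0, 0.020))           # 20-ms probe at fp = 2226 Hz
stimulus = np.concatenate([masker, probe])  # probe follows masker with no delay
```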


presenting pure tones at the input of the vocoder. This is equivalent to measuring PTCs with narrowband noises that have different slopes.

Table 2. Frequencies for single-electrode activation.

Channel  Lowest Activation Frequency (Hz)  Highest Activation Frequency (Hz)  Center Frequency (Hz)
20       195    265    230
19       390    396    393
18       521    527    524
17       652    658    655
16       783    789    786
15       914    920    917
14       1045   1051   1048
13       1176   1182   1179
12       1307   1312   1310
11       1438   1443   1441
10       1569   1705   1637
9        1830   1967   1899
8        2092   2360   2226
7        2485   2753   2619
6        2878   3408   3143
5        3533   4063   3798
4        4188   4848   4518
3        4973   5765   5369
2        5890   6813   6352
1        6938   8115   7527

Table 3. Sound processor settings while measuring the “activation bandwidths”.

Parameter      Setting
Min. Stim      9 ns
Max. Stim      52 ns
Strategy       Crystalis XDP
Stimulation    500 Hz
Maxima         16
Compression    Linear (personalized)
Dynamic range  26–105 dB SPL
Audio input    Auxiliary only (0 dB gain)

2.5.2. Procedure

A three-interval forced-choice (3IFC) [38], two-up one-down forward-masking paradigm was used to determine the masked thresholds. The masker level was increased when the subject correctly identified the position of the “masker–probe” sequence twice in a row, and decreased after one wrong answer. Each PTC took approximately one hour to complete, and a break was offered between sessions.

For each listener, a hearing threshold and a maximum acceptable level were measured for the maskers and the probe. This was repeated before each PTC run as the stimuli changed with the simulated spread of excitation.

A short training period was performed before the actual test to be sure that the subject understood the instructions and could hear and identify the probe frequency at the beginning of each run.


The level of the probe was fixed at 20% of the dynamic range. Starting at 10 dB SL, the masker level was adaptively varied with a 4-dB step for the first three reversals, a 2-dB step for reversals three to six, and a 1-dB step for the last six. A run contained 12 reversals, and the masked threshold was defined in dB SPL as the average masker level over the last six reversals.
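The adaptive track can be sketched as follows, with a simulated listener standing in for the subject (the step schedule follows the text; the bookkeeping details are our own):

```python
import numpy as np

def staircase_threshold(trial, start_db, n_reversals=12):
    """Two-up one-down track (sketch): the masker level rises after two
    consecutive correct answers and falls after any error. Step sizes are
    4 dB for the first three reversals, 2 dB up to the sixth, then 1 dB.
    Threshold = mean masker level over the last six reversals.
    `trial(level)` returns True if the listener answered correctly."""
    level, streak, direction = float(start_db), 0, 0
    reversals = []
    while len(reversals) < n_reversals:
        step = 4 if len(reversals) < 3 else (2 if len(reversals) < 6 else 1)
        if trial(level):
            streak += 1
            if streak < 2:
                continue               # need two correct in a row to move up
            streak, new_dir = 0, +1
        else:
            streak, new_dir = 0, -1
        if direction and new_dir != direction:
            reversals.append(level)    # direction change -> record a reversal
        direction = new_dir
        level += new_dir * step
    return float(np.mean(reversals[-6:]))
```

With a deterministic simulated listener who is correct below 60 dB and wrong above, the track converges just under that boundary.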

2.6. Tuning Curve Fitting and Q10dB

Each PTC was fitted with two quadratic functions, one on the low side and one on the high side of the probe frequency (R²: mean = 0.980; SD = 0.037; min = 0.778; max = 1.00). Slopes on both sides were assumed monotonic, so any masked threshold deviating from this rule by more than 10 dB was excluded from the regression. Following this rule, the typical fitted function included all seven masking thresholds, except for three subjects: S04 (6 points for the “Low” spread curve and 5 points for the “Medium” curve), S07 (6 points, “Medium” spread), and S19 (6 points, “Low” spread). Moreover, S16 did not manage to perform the test and the PTCs could not be established, so the results were analyzed for 19 subjects (out of 20). From the fitted PTCs, we then characterized channel interaction using the Q10dB as a sharpness factor. Q10dB was calculated by dividing the probe frequency by the PTC bandwidth (BW) at 10 dB above the tip level (Q10dB = 2226/BW10dB).
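The Q10dB computation can be sketched as follows, assuming the two quadratic fits are evaluated where they cross the level 10 dB above the tip:

```python
import numpy as np

def q10db(fm, thr, fp=2226.0):
    """Sketch of the Q10dB estimate: fit one quadratic per side of the PTC,
    find where each fit crosses tip + 10 dB, and return fp / bandwidth."""
    fm, thr = np.asarray(fm, float), np.asarray(thr, float)
    target = thr.min() + 10.0                      # 10 dB above the tip level

    def crossing(mask, low_side):
        a, b, c = np.polyfit(fm[mask], thr[mask], 2)
        r = np.roots([a, b, c - target])
        r = r[np.isreal(r)].real
        r = r[r <= fp] if low_side else r[r >= fp]
        return r.max() if low_side else r.min()

    f_lo = crossing(fm <= fp, True)                # low-frequency edge
    f_hi = crossing(fm >= fp, False)               # high-frequency edge
    return fp / (f_hi - f_lo)
```

For a symmetric curve whose 10-dB bandwidth is 200 Hz around 2226 Hz, this returns 2226/200 = 11.13.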

2.7. Statistical Analyses

Statistical analyses were performed using Addinsoft XLSTAT 2020 (Addinsoft Inc., New York, NY, USA) and RStudio Version 1.1.456 (RStudio Inc., Boston, MA, USA).

Before analysis, word recognition scores were transformed into rationalized arcsine units (RAUs) with a correction for the small number of items [39]. Converting word recognition proportion scores to RAUs allows more appropriate statistical analyses and attempts to minimize floor and ceiling effects [40].
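The RAU transform with the small-N correction (Studebaker’s rationalized arcsine transform) maps x correct responses out of n scored items as below; the extreme scores for a 20-syllable list reproduce the −12.78 to 112.78 RAU range quoted in the Results section:

```python
import numpy as np

def rau(x, n):
    """Rationalized arcsine transform with the correction for a small number
    of items: x correct responses out of n scored items."""
    theta = (np.arcsin(np.sqrt(x / (n + 1.0)))
             + np.arcsin(np.sqrt((x + 1.0) / (n + 1.0))))
    return (146.0 / np.pi) * theta - 23.0

# With 20 scored syllables per list, 0/20 and 20/20 give the score extremes
print(round(rau(0, 20), 2), round(rau(20, 20), 2))   # -12.78 112.78
```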

Word recognition scores were evaluated by a repeated-measures ANOVA using linear mixed models with three factors of interest: SNR (9, 3, and −3 dB), number of maxima (4, 8, 12, and 16 out of 20), and level of spread of excitation (“Low”: −72 dB/octave, “Medium”: −48 dB/octave, “High”: −24 dB/octave), with subject as the random factor. Then, 2-by-2 comparisons were made with bilateral paired-samples t-tests, and significance levels were adjusted according to the Bonferroni correction.

For Q10dB, a repeated-measures ANOVA using linear mixed models was performed to determine whether there were significant differences in Q10dB between the levels of spread of excitation. Then, 2-by-2 comparisons were made with paired-samples t-tests, and significance levels were adjusted according to the Bonferroni correction.

A linear correlation was measured between mean intelligibility scores (in RAUs, calculated across the SNRs) and Q10dB (3 points per subject, one for each level of spread of excitation). To account for repeated measurements within the same subjects, we performed repeated-measures correlations using the rmcorr package in R. This technique takes into account the non-independence between measurements and uses an analysis of covariance to account for inter-individual variability. rmcorr therefore fits parallel regression lines (same slope, varying intercepts) to each participant [41].
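The rmcorr point estimate can be approximated with plain NumPy by removing each subject’s mean from both variables and correlating the pooled within-subject deviations (a sketch of the idea only; the rmcorr package additionally provides ANCOVA-based inference, with roughly N(k − 1) − 1 degrees of freedom for N subjects with k measurements each):

```python
import numpy as np

def rm_corr(subject, x, y):
    """Approximate repeated-measures correlation: center x and y on each
    subject's own means, then correlate the pooled deviations (common slope,
    per-subject intercepts)."""
    subject, x, y = map(np.asarray, (subject, x, y))
    xc, yc = x.astype(float), y.astype(float)
    for s in np.unique(subject):
        m = subject == s
        xc[m] -= xc[m].mean()
        yc[m] -= yc[m].mean()
    return np.corrcoef(xc, yc)[0, 1]
```

For example, if every subject’s y rises linearly with x but subjects differ in overall level, the ordinary Pearson correlation can be weak while `rm_corr` is close to 1.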


Finally, for each subject, we compared the change in average word recognition in noise as a function of the change in Q10dB, by calculating the difference between the scores for the “Low” and the “High” spread of excitation. A Spearman correlation was performed between those variables.

3. Results

3.1. Speech Audiometry in Noise

Word recognition scores are displayed as percentages for clarity. However, statistical analyses were performed on RAU scores as described above. Scores ranged from −12.78 to 112.78 RAU, corresponding to recognition scores of 0% and 100%, respectively. Figure 2 shows an overview of the data. The results are split into three graphs, one for each SNR, and organized to ease visualization and interpretation.


Figure 2. Average syllable recognition scores (in percent correct) as a function of the spread of excitation and number of maxima. Each number of maxima is represented by a different grey bar, and average performances for a single level of spread of excitation (across the number of maxima) are represented by a grey line with diamonds. The left scale is for the bars; the right scale is for the averages across maxima. (A) −3 dB signal-to-noise ratio, (B) 3 dB SNR, and (C) 9 dB SNR. Error bars represent ±1 standard error of the mean.




The repeated-measures ANOVA (mixed models) revealed:

• a significant main effect of spread of excitation: F(2, 677) = 23.80, p < 0.0001,

• a significant main effect of SNR: F(2, 677) = 999.32, p < 0.0001,

• no effect of the number of maxima: F(3, 677) = 0.60, p = 0.61.

Additionally, the two-way interactions between the factors were not significant:

• Spread of excitation × Number of maxima: F(6, 677) = 0.75, p = 0.61,

• Spread of excitation × SNR: F(6, 677) = 0.18, p = 0.95,

• Number of maxima × SNR: F(6, 677) = 0.75, p = 0.61.

Average recognition scores for each factor (in percent correct) are presented in Figure 3 and Table 4. The average scores across the number of maxima remained around 50%. Average scores across SNRs ranged from around 11% at −3 dB SNR to 83% at 9 dB SNR, and all three 2-by-2 comparisons were significant (t-tests, p < 0.0001). The 2-by-2 comparisons also revealed a significant decrease in average scores for the “High” level of spread of excitation compared with the “Low” and “Medium” levels (from around 50% to 43%). T-tests showed that the average score for the “High” level of spread of excitation differed from the two others (“Low” vs. “High”: p < 0.0001; “Medium” vs. “High”: p < 0.0001; “Low” vs. “Medium”: p = 0.43).


Figure 3. Results of the 3-way repeated measures ANOVA and results of the 2-by-2 comparisons (Student’s t-tests) made on scores transformed in rationalized units (RAU). (A) Average syllable recognition scores (in percent correct) across the signal-to-noise ratios. (B) Average across the number of maxima. (C) Average across the levels of spread of excitation. Error bars represent ±1 standard error of the mean.

Table 4. Average syllable recognition scores for each factor and variation.

Factor                 Variation   Mean (%)   SD (%)   Mean (rau)   SD (rau)
SNR                    SNR-3         10.6      10.8        6.5        16.1
                       SNR3          51.5      18.6       51.3        17.9
                       SNR9          83.4      12.4       84.9        16.2
Number of Maxima       4-of-20       47.7      33.4       47.1        36.7
                       8-of-20       48.8      33.7       47.9        36.9
                       12-of-20      49.0      32.5       48.1        35.2
                       16-of-20      48.4      32.6       47.1        36.1
Spread of excitation   Low           50.6      33.3       50.1        36.2
                       Medium        51.5      33.0       51.1        36.4
                       High          43.3      32.2       41.4        35.4


3.2. Psychophysical Tuning Curves

Individual and average PTCs are displayed in Figure 4 (masking threshold in dB SPL as a function of masker frequency in Hz). Table 5 gives an overview of the results. First, looking at the shape of the PTCs, there is noticeable heterogeneity between subjects. This heterogeneity seems to be larger on the high-frequency side of the curves, with standard deviations of approximately 18, 12, and 10 dB SPL versus approximately 7, 9, and 8 dB SPL on the low-frequency side. Finally, the “High” level of spread of excitation seems to flatten the curve, while the “Low” and “Medium” levels give very similar shapes.




Figure 4. Measured psychophysical tuning curves (PTC). Masking thresholds (in dB SPL) as a function of the masker frequency (fm = 1440.5, 1637, 1898.5, 2226, 2619, 3143, and 3798 Hz). Each subject is represented by a different grey curve and the average curve is in black with white dots. (A) PTCs for “Low” spread of excitation. (B) PTCs for “Medium” spread. (C) PTCs for “High” spread.

These changes in shape had an impact on the Q10dB values, as can be seen in Figure 5. The average Q10dB was approximately 8 for the “Low” spread and around 7 for the “Medium” level, with a noticeable decrease for the “High” level to a Q10dB of 3.
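Q10dB summarizes PTC sharpness as the probe frequency divided by the curve’s bandwidth measured 10 dB above its tip. A minimal sketch of that computation, applied to the “Low spread” mean thresholds of Table 5 with the 2214 Hz probe (linear interpolation between measured points is an assumption; the authors’ exact fitting procedure is not detailed here):

```python
import numpy as np

def q10db(masker_freqs_hz, masker_levels_db, probe_freq_hz):
    """Sharpness of a psychophysical tuning curve: probe frequency
    divided by the bandwidth at 10 dB above the tip (level minimum)."""
    f = np.asarray(masker_freqs_hz, dtype=float)
    lvl = np.asarray(masker_levels_db, dtype=float)
    i_tip = int(np.argmin(lvl))
    target = lvl[i_tip] + 10.0
    # Low-frequency flank: levels decrease toward the tip, so reverse
    # the arrays to give np.interp an increasing abscissa.
    f_lo = np.interp(target, lvl[:i_tip + 1][::-1], f[:i_tip + 1][::-1])
    # High-frequency flank: levels increase away from the tip.
    f_hi = np.interp(target, lvl[i_tip:], f[i_tip:])
    return probe_freq_hz / (f_hi - f_lo)

freqs = [1441, 1637, 1898.5, 2226, 2619, 3143, 3798]
low_spread_means = [64.4, 58.4, 46.5, 20.3, 44.6, 67.0, 77.2]  # Table 5, "Low"
print(round(q10db(freqs, low_spread_means, 2214), 1))  # -> 7.7, close to the ~8 reported
```

A flatter curve, as in the “High” condition, widens the 10 dB bandwidth and therefore lowers Q10dB.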

Table 5. Average masking thresholds (in dB SPL) as a function of the simulated spread of excitation.

Masker frequency (Hz)         1441    1637    1898.5   2226    2619    3143    3798
Low spread    Mean            64.4    58.4    46.5     20.3    44.6    67.0    77.2
              Standard error   6.9     9.6     8.8      4.3    13.9    13.9    18.2
              Min             52.1    36.8    16.6     12.8    23.8    45.2    12.3
              Max             79.0    73.1    59.8     28.2    77.0    84.2    88.1
Medium        Mean            66.2    58.4    45.9     23.1    45.7    67.3    80.8
              Standard error   9.2    13.9     7.9      7.3    13.8    12.6    12.1
              Min             41.5    11.2    22.8     12.4    23.4    46.0    43.1
              Max             82.5    77.7    65.6     36.7    70.2    86.3    90.0
High          Mean            61.8    48.4    38.4     22.6    35.4    45.7    59.3
              Standard error   7.5     9.9     5.8      5.5    11.8    10.9    10.4
              Min             48.2    19.8    24.4     13.8    13.9    24.3    42.7
              Max             76.1    63.6    48.1     35.3    60.8    62.2    74.8


The repeated-measures ANOVA (mixed models) revealed a significant main effect of the level of spread of excitation (F(2, 36) = 38.49, p < 0.0001). The 2-by-2 tests showed a significant difference between the Q10dB at the “High” level of spread of excitation and the two others (“Low” vs. “High”: p < 0.0001; “Medium” vs. “High”: p < 0.0001; “Low” vs. “Medium”: p = 0.16).

Figure 5. Comparison of the sharpness (Q10dB) of the average psychophysical tuning curves (PTC) as a function of the level of spread of excitation. Results of the repeated-measures ANOVA and results of the 2-by-2 comparisons (Student’s t-tests). (A) Boxplots showing Q10dB: the horizontal line within the box indicates the median; means are indicated by a plus sign; edges are the 25th and 75th percentiles, whiskers the most extreme data points. Each dot represents one subject. (B) Average tuning curves for the three levels of spread of excitation.




3.3. Correlation between Word Recognition and PTC Sharpness

We assessed the relationship between PTC sharpness (Q10dB) and word recognition (in RAU) (Figure 6). First, repeated measures correlations indicated a strong positive association (rmcorr = 0.72, p < 0.001). Then, we compared the evolution of performance between the “Low” and “High” levels of simulated spread of excitation: the difference in word recognition and the difference in Q10dB were positively correlated (average values measured across all the SNRs) (Spearman’s r = 0.55, p = 0.017).
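The between-subject analysis in panel B of Figure 6 is an ordinary Spearman rank correlation computed on one (ΔQ10dB, Δscore) pair per subject. A minimal sketch with synthetic data (the individual per-subject differences are not reproduced in the paper, so the numbers below are illustrative only):

```python
import numpy as np

def spearman_r(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks
    (no tie correction, which is fine for continuous measurements)."""
    rx = np.argsort(np.argsort(x))
    ry = np.argsort(np.argsort(y))
    return float(np.corrcoef(rx, ry)[0, 1])

rng = np.random.default_rng(0)
n_subjects = 20
# Hypothetical per-subject drops between the "Low" and "High" spread
# conditions; a positive trend mimics the association reported above.
dq10 = rng.uniform(0, 10, n_subjects)               # Q10dB difference
dscore = 1.5 * dq10 + rng.normal(0, 3, n_subjects)  # score difference (RAU)
print(round(spearman_r(dq10, dscore), 2))  # strongly positive for these data
```

Rank-based correlation is a sensible choice here: with only one pair per subject and no guarantee of normality, it is robust to outliers and to any monotonic nonlinearity in the relationship.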


Figure 6. Average syllable recognition scores (in RAU) as a function of Q10dB. (A) Repeated measures correlation (rmcorr). Measures from the same participant are given the same color, with corresponding lines to show the rmcorr. There are three dots per participant, one for each level of spread of excitation. (B) Average syllable recognition difference as a function of the Q10dB difference between the “Low” and “High” spread of excitation simulations. Each dot represents one subject.

4. Discussion

In this study, we investigated the impact of simulated channel interaction on the word recognition in noise and on the frequency selectivity of 20 NH subjects. We used a 20-channel vocoder with a simulation of spread of excitation.

Word recognition in cocktail-party noise was evaluated on disyllabic words by varying: the SNR (−3, 3, and 9 dB SNR in front of the vocoder); the number of selected maxima (4, 8, 12, and 16 out of 20); and the spread of excitation (synthesis filter slopes of −24, −48, and −72 dB/octave). Frequency selectivity was characterized by the Q10dB of forward-masked PTCs, measured with sounds processed by the vocoder and the simulation of the spread of excitation.

This experiment is a simulation and should be completed by results from a transposed experiment with CI users. The NH subjects were relatively young and some age categories are underrepresented (between 30 and 50 years old), which could influence the results. The choice of Fournier’s disyllabic words appeared to be adequate. Nevertheless, it should be noted that the recognition of Fournier’s words can be influenced by the lexical knowledge of the subject.

The main finding of this study is that, within individuals, simulated channel interaction correlates with PTC selectivity (Q10dB) and correlates with speech recognition in noise.

The motivation to choose a cocktail-party noise for speech audiometry in noise was its complexity and the fact that it recreates conditions of realistic listening environments, such as schools, restaurants, and other social gatherings. Cocktail-party noise is a broadband fluctuating noise and it is very close to the long-term speech spectrum. Therefore, it induces interference due to amplitude modulation and informational masking, which play an important role in CI users’ speech recognition. The similarities between the target and the masker, exacerbated by an impaired spectral resolution, increase the attentional resources needed to differentiate the speech signal from the cocktail-party noise [42–44].



A speech recognition plateau was observed with around four channels in quiet [45,46]. Around 10 channels are necessary to reach a plateau under more adverse conditions (e.g., noise, difficult speech material, etc.) [7,27]. Recently, it was observed that intelligibility could continue to improve with a larger number of channels [47,48]. In vocoder studies, even if a plateau was also observed, the listening effort seems to be reduced when more channels are used [49,50]. In our experiment, word recognition in noise was not significantly changed by varying the number of selected maxima. This result may be due to the characteristics of our vocoder. The smoothing performed by the “overlap-and-add” reconstruction and the 65 Hz low-pass filtering partially filled the temporal gaps created by the channel picking. These steps do not entirely suppress the effect of channel picking but they may have an impact on the final results. In these conditions, subjects would already have reached a plateau of performance and, in our experiment, increasing the number of selected channels beyond four would make no difference.

In general, channel-picking vocoders use smoothing filters that follow the analysis rate. For example, Dorman et al. (2002) used a 400 Hz cutoff frequency and found a significant effect of the number of maxima on speech recognition. In quiet, performances reached a plateau for 6-of-20 maxima. In noise, 9-of-20 maxima were needed. In this study, the effect of channel interaction was not investigated. There was a constant overlap between the analysis channels in the experiment [51] (sixth order Butterworth filters,

−36 dB/oct). Then, Bingabr et al. (2008) investigated the effect of varying the degree of simulated channel interaction with a fixed-channel vocoder (4, 8, and 16 channels). The data showed a significant interaction between the number of channels and the spread of excitation. They concluded that recognition of sentences in noise is likely to be improved by reducing channel interaction and by improving the spectral resolution to 8–16 channels [52]. Therefore, simulated channel interaction may not suppress the effect of changing the amount of spectral information on speech recognition. On the contrary, Verschuur (2009) investigated the effect of 4-of-7 and 12-of-20 strategies with simulated channel interaction and found no substantial changes in consonant recognition performance [13]. They also stated that reduced consonant recognition in better performing cochlear implant users was mainly due to cochlear implant processing and not to channel interaction.

These studies give some clues, but it should be noted that varying the total number of channels is not directly comparable to varying the number of maxima with a channel-picking coding strategy. Channel-picking strategies modulate the relative importance of each channel, so they have an effect on spectral contrast and spectral resolution, depending on the proportion of selected channels. Several investigations found a significant decrease in speech-in-noise recognition and supported the idea that channel interaction can affect speech perception outcomes [53,54].


−60 dB/octave conditions [34]. This threshold effect may be due to the smaller difference of slope between−48 and−72 dB/octave filters than between−24 and−48 dB/octave filters.
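The slope values being compared can be related to filter order: an n-th order Butterworth filter rolls off at roughly 6n dB/octave, so −24, −48, and −72 dB/octave correspond to orders 4, 8, and 12. A quick numerical check (the 1 kHz cutoff is arbitrary, chosen only for illustration):

```python
import numpy as np

def butter_mag_db(f_hz, fc_hz, order):
    """Magnitude response (dB) of an ideal n-th order Butterworth
    low-pass: |H(f)|^2 = 1 / (1 + (f/fc)^(2n))."""
    return -10.0 * np.log10(1.0 + (f_hz / fc_hz) ** (2 * order))

fc = 1000.0  # arbitrary cutoff, for illustration only
for order in (4, 8, 12):
    # Well above the cutoff, one octave (8 kHz -> 16 kHz) drops ~6*order dB.
    drop = butter_mag_db(16000.0, fc, order) - butter_mag_db(8000.0, fc, order)
    print(f"order {order:2d}: {drop:6.1f} dB/octave")
```

In absolute terms, the step from order 8 to order 12 changes the attenuation one octave above the passband far less, relative to what is already attenuated, than the step from order 4 to order 8, which is consistent with the threshold effect described above.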

This confirms that poor spectral resolution due to channel interaction negatively impacts speech perception in quiet and in noisy conditions.

In our study, frequency selectivity (reflected by the Q10dB) was significantly decreased by spatial spread and the results showed, as for speech perception, a threshold effect. Again, the spread of excitation at the highest level (−24 dB/octave) was different from the two other situations (−48 and −72 dB/octave).

A small number of studies have investigated the effect of simulated spatial spread of excitation on the shape of PTCs. One of them, Langner and Jürgens (2016), showed an improvement of the PTC Q10dB by using a dynamic compression algorithm to restore frequency selectivity in CI users and NH subjects listening to a vocoder [31]. This kind of simulation seems equivalent to measuring PTCs while varying the frequency range of the maskers.

Concerning the maskers’ frequency range, Kluk and Moore (2004) measured Q10dB with NH subjects using simultaneous masking for 1 and 4 kHz probe frequencies (pure tones). They tested noise maskers 80, 160, and 320 Hz wide and found a decrease in Q10dB when increasing the noise maskers’ bandwidth. They suggested that only a part of the masking noise passed through the auditory filter. Indeed, in their experiment, for the two wider noise maskers (160 and 320 Hz), the bandwidths were greater than the equivalent rectangular bandwidth (ERB) at the reference frequency [56].

Using wide maskers with different power decay seems to have the same effect. In our case, the probe was a narrow-band noise with a center frequency of 2214 Hz and the ERB for this frequency is approximately 264 Hz. The probe sound and the “high-frequency” maskers were wider than 264 Hz (Table1). It may explain the broadening of the PTCs on the “high-frequency” side.
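The ~264 Hz figure is consistent with the Glasberg and Moore (1990) ERB formula for normal hearing (an assumption; the text does not name the formula used):

```python
def erb_hz(f_hz: float) -> float:
    """Equivalent rectangular bandwidth of the normal auditory filter
    centered at f_hz (Glasberg & Moore, 1990)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

print(round(erb_hz(2214)))  # -> 264 Hz, the value quoted in the text
```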

Because PTC measures have shown only small correlations with speech perception of CI users [18,22,26], other methods like spectral ripple discrimination tests have been used [35,57]. In our experiment, the repeated measures correlation showed a strong within-subject correlation between Q10dB and word recognition in noise when changing the simulated spread of excitation (Figure 6). It means that for each subject there is a strong correlation between word recognition and frequency selectivity as reflected by the Q10dB parameter. The within-subject correlation obtained with a vocoder also suggests that measuring PTCs through the speech processor is worth exploring, because processors induce constraints not taken into account in experiments measuring PTCs by direct electrical stimulation. Q10dB measured through the speech processor would be closer to the channel interaction experienced by each CI user. The same protocol as the one presented here could be transposed to CI users.

Finally, our subjects were considered as NH people, but the results showed a variable resilience between individuals. The improvement of word recognition in noise was correlated with the improvement of Q10dB in NH listeners using a vocoder CI simulator (Figure 6). Although the simulated channel interaction was the same for all, the effects were heterogeneous between subjects. It seems that individuals are not equally affected by spread of excitation, and the same may be true for CI users. Despite the documented variability of speech-in-noise performance among people with normal hearing, this experiment shows that there is also a variable resistance to spectral smearing. In some cases, when spectral smearing was increased, speech recognition and the Q10dB barely changed.

5. Conclusions


the evolution of the frequency selectivity and the average speech recognition in noise across the “Low” and the “High” degrees of simulated spread of excitation. Then, between the two lightest simulated channel interactions, frequency selectivity and word recognition in noise did not significantly change, while the strongest interaction significantly impaired the scores. This result shows a threshold effect: the channel interaction has to be sufficiently large to induce a modification of performance. Additionally, as the vocoder mimics a CI speech processor, these findings highlight the importance of measuring PTCs through the speech processor to take the signal processing into account. Finally, it seems useful to investigate with CI users the individual impact of channel interaction on PTC frequency selectivity and speech recognition in noise. As vocoder simulations are good predictors of CI users’ performances, we could expect a correlation between frequency selectivity and word recognition within individual CI users.

Author Contributions:Conceptualization, P.-A.C., C.B.-V., S.G. and E.T.; methodology, P.-A.C. and

S.G.; software, P.-A.C.; validation, P.-A.C., R.H. and S.G.; formal analysis, P.-A.C.; investigation, P.-A.C.; resources, H.T.-V. and E.T.; data curation, P.-A.C.; writing—original draft preparation, P.-A.C. and C.B.-V.; writing—review and editing, P.-A.C., C.B.-V., H.T.-V., R.H., S.G. and E.T.; visualization, P.-A.C.; supervision, C.B.-V., S.G. and E.T.; project administration, P.-A.C., H.T.-V. and E.T.; funding acquisition, P.-A.C. and E.T. All authors have read and agreed to the published version of the manuscript.

Funding:This work was supported by a research partnership between Neurelec/Oticon Medical,

Vallauris, France, and the Civils Hospitals of Lyon, Lyon, France. The subjects received financial compensation for their participation from Lyon Neuroscience Research Center, INSERM U1028, CNRS U5292.

Institutional Review Board Statement:The study was conducted according to the guidelines of the

Declaration of Helsinki, and approved by the Ethics Committee CPP EST IV, 67091 Strasbourg Cedex, France (ID-RCB: 2019-A00088-49, date of approval 12/02/2019).

Informed Consent Statement:Informed consent was obtained from all subjects involved in the study.

Data Availability Statement: The data presented in this study are available on request from the

corresponding author. The data are not publicly available due to privacy protection.

Acknowledgments: The authors would like to thank the people and institutions that allowed us to

carry out this work: the subjects who participated in the experiments, Charles Alexandre Joly and the entire staff of the ORL and the Audiology Department at Edouard Herriot University Hospital, Civil Hospitals of Lyon. We also thank Dan Gnansia, Pierre Stahl, and the company Neurelec/Oticon Medical, for providing the Digispy interface and for the assistance with developing the vocoder.

Conflicts of Interest:The authors declare no conflict of interest. The funders had no role in the design

of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

References

1. Clark, G. Cochlear Implants: Fundamentals and Applications; Springer Science & Business Media: New York, NY, USA, 2006; ISBN 978-0-387-21550-1.

2. Dhanasingh, A.; Jolly, C. An Overview of Cochlear Implant Electrode Array Designs. Hear. Res. 2017, 356, 93–103. [CrossRef]

3. McRackan, T.R.; Bauschard, M.; Hatch, J.L.; Franko-Tobin, E.; Droghini, H.R.; Nguyen, S.A.; Dubno, J.R. Meta-Analysis of

Quality-of-Life Improvement after Cochlear Implantation and Associations with Speech Recognition Abilities. Laryngoscope 2018, 128, 982–990. [CrossRef]

4. Mo, B.; Lindbaek, M.; Harris, S. Cochlear Implants and Quality of Life: A Prospective Study. Ear Hear. 2005, 26, 186–194.

[CrossRef]

5. Berger-Vachon, C.; Collet, L.; Djedou, B.; Morgon, A. Model for Understanding the Influence of Some Parameters in Cochlear Implantation. Ann. Otol. Rhinol. Laryngol. 1992, 101, 42–45. [CrossRef]

6. Shannon, R.V. Multichannel Electrical Stimulation of the Auditory Nerve in Man. I. Basic Psychophysics. Hear. Res. 1983, 11, 157–189. [CrossRef]


8. Garnham, C.; O’Driscoll, M.; Ramsden And, R.; Saeed, S. Speech Understanding in Noise with a Med-El COMBI 40+ Cochlear

Implant Using Reduced Channel Sets. Ear Hear. 2002, 23, 540–552. [CrossRef]

9. Snel-Bongers, J.; Briaire, J.J.; Vanpoucke, F.J.; Frijns, J.H.M. Spread of Excitation and Channel Interaction in Single- and Dual-Electrode Cochlear Implant Stimulation. Ear Hear. 2012, 33, 367–376. [CrossRef]

10. Zeng, F.-G.; Rebscher, S.; Harrison, W.; Sun, X.; Feng, H. Cochlear Implants: System Design, Integration, and Evaluation. IEEE Rev. Biomed. Eng. 2008, 1, 115–142. [CrossRef]

11. de Jong, M.A.M.; Briaire, J.J.; Frijns, J.H.M. Dynamic Current Focusing: A Novel Approach to Loudness Coding in Cochlear

Implants. Ear Hear. 2019, 40, 34–44. [CrossRef] [PubMed]

12. DeVries, L.; Arenberg, J.G. Current Focusing to Reduce Channel Interaction for Distant Electrodes in Cochlear Implant Programs. Trends Hear. 2018, 22. [CrossRef]

13. Verschuur, C. Modeling the Effect of Channel Number and Interaction on Consonant Recognition in a Cochlear Implant

Peak-Picking Strategy. J. Acoust. Soc. Am. 2009, 125, 1723–1736. [CrossRef]

14. Cohen, L.T.; Richardson, L.M.; Saunders, E.; Cowan, R.S.C. Spatial Spread of Neural Excitation in Cochlear Implant Recipients: Comparison of Improved ECAP Method and Psychophysical Forward Masking. Hear. Res. 2003, 179, 72–87. [CrossRef]

15. Guevara, N.; Hoen, M.; Truy, E.; Gallego, S. A Cochlear Implant Performance Prognostic Test Based on Electrical Field Interactions Evaluated by EABR (Electrical Auditory Brainstem Responses). PLoS ONE 2016, 11, e0155008. [CrossRef]

16. Spitzer, E.R.; Choi, S.; Hughes, M.L. The Effect of Stimulus Polarity on the Relation Between Pitch Ranking and ECAP Spread of Excitation in Cochlear Implant Users. J. Assoc. Res. Otolaryngol. 2019, 20, 279–290. [CrossRef] [PubMed]

17. Nelson, D.A.; Donaldson, G.S.; Kreft, H. Forward-Masked Spatial Tuning Curves in Cochlear Implant Users. J. Acoust. Soc. Am. 2008, 123, 1522–1543. [CrossRef]

18. DeVries, L.; Arenberg, J.G. Psychophysical Tuning Curves as a Correlate of Electrode Position in Cochlear Implant Listeners. J. Assoc. Res. Otolaryngol. 2018, 19, 571–587. [CrossRef]

19. Sek, A.; Alcántara, J.; Moore, B.C.J.; Kluk, K.; Wicher, A. Development of a Fast Method for Determining Psychophysical Tuning Curves. Int. J. Audiol. 2005, 44, 408–420. [CrossRef]

20. Sęk, A.; Moore, B.C.J. Implementation of a Fast Method for Measuring Psychophysical Tuning Curves. Int. J. Audiol. 2011, 50, 237–242. [CrossRef]

21. Kreft, H.A.; DeVries, L.A.; Arenberg, J.G.; Oxenham, A.J. Comparing Rapid and Traditional Forward-Masked Spatial Tuning

Curves in Cochlear-Implant Users. Trends Hear. 2019, 23. [CrossRef]

22. Anderson, E.S.; Nelson, D.A.; Kreft, H.; Nelson, P.B.; Oxenham, A.J. Comparing Spatial Tuning Curves, Spectral Ripple Resolution, and Speech Perception in Cochlear Implant Users. J. Acoust. Soc. Am. 2011, 130, 364–375. [CrossRef]

23. Hughes, M.L.; Stille, L.J. Psychophysical versus Physiological Spatial Forward Masking and the Relation to Speech Perception in Cochlear Implants. Ear Hear. 2008, 29, 435–452. [CrossRef]

24. DeVries, L.; Scheperle, R.; Bierer, J.A. Assessing the Electrode-Neuron Interface with the Electrically Evoked Compound Action Potential, Electrode Position, and Behavioral Thresholds. J. Assoc. Res. Otolaryngol. 2016, 17, 237–252. [CrossRef]

25. Boëx, C.; Kós, M.-I.; Pelizzone, M. Forward Masking in Different Cochlear Implant Systems. J. Acoust. Soc. Am. 2003, 114,

2058–2065. [CrossRef]

26. Nelson, D.A.; Kreft, H.A.; Anderson, E.S.; Donaldson, G.S. Spatial Tuning Curves from Apical, Middle, and Basal Electrodes in Cochlear Implant Users. J. Acoust Soc. Am. 2011, 129, 3916–3933. [CrossRef]

27. Shannon, R.V.; Fu, Q.-J.; Galvin, J. The Number of Spectral Channels Required for Speech Recognition Depends on the Difficulty of the Listening Situation. Acta Otolaryngol. 2004, 124, 50–54. [CrossRef]

28. Dorman, M.F.; Loizou, P.C.; Fitzke, J.; Tu, Z. Recognition of Monosyllabic Words by Cochlear Implant Patients and by Normal-Hearing Subjects Listening to Words Processed through Cochlear Implant Signal Processing Strategies. Ann. Otol. Rhinol. Laryngol. 2000, 185, 64–66. [CrossRef]

29. Gnansia, D.; Péan, V.; Meyer, B.; Lorenzi, C. Effects of Spectral Smearing and Temporal Fine Structure Degradation on Speech Masking Release. J. Acoust. Soc. Am. 2009, 125, 4023–4033. [CrossRef]

30. Hopkins, K.; Moore, B.C.J. The Contribution of Temporal Fine Structure to the Intelligibility of Speech in Steady and Modulated Noise. J. Acoust. Soc. Am. 2009, 125, 442–446. [CrossRef] [PubMed]

31. Langner, F.; Jürgens, T. Forward-Masked Frequency Selectivity Improvements in Simulated and Actual Cochlear Implant Users Using a Preprocessing Algorithm. Trends Hear. 2016, 20. [CrossRef]

32. International Bureau for Audiophonology Recommendations 02/1 Audiometric Classification of Hearing Impairments. Available

online:https://www.biap.org/en/recommandations/recommendations/tc-02-classification(accessed on 20 May 2020).

33. DiNino, M.; Wright, R.A.; Winn, M.B.; Bierer, J.A. Vowel and Consonant Confusions from Spectrally Manipulated Stimuli

Designed to Simulate Poor Cochlear Implant Electrode-Neuron Interfaces. J. Acoust. Soc. Am. 2016, 140, 4404–4418. [CrossRef] [PubMed]

34. Jahn, K.N.; DiNino, M.; Arenberg, J.G. Reducing Simulated Channel Interaction Reveals Differences in Phoneme Identification Between Children and Adults with Normal Hearing. Ear Hear. 2019, 40, 295–311. [CrossRef]


36. Fournier, J.-E. Audiométrie Vocale: Les Epreuves D’intelligibilité et Leurs Applications au Diagnostic, à L’expertise et à la Correction Prothétique des Surdités; Maloine: Paris, France, 1951.

37. Lafon, J.C. Phonetic test, phonation, audition. JFORL J. Fr. Otorhinolaryngol. Audiophonol. Chir. Maxillofac. 1972, 21, 223–229.

38. Levitt, H. Transformed Up-down Methods in Psychoacoustics. J. Acoust. Soc. Am. 1971, 49, 467–477. [CrossRef]

39. Studebaker, G.A. A “Rationalized” Arcsine Transform. J. Speech Hear. Res. 1985, 28, 455–462. [CrossRef]

40. Sherbecoe, R.L.; Studebaker, G.A. Supplementary Formulas and Tables for Calculating and Interconverting Speech Recognition Scores in Transformed Arcsine Units. Int. J. Audiol. 2004, 43, 442–448. [CrossRef] [PubMed]

41. Bakdash, J.Z.; Marusich, L.R. Repeated Measures Correlation. Front. Psychol. 2017, 8, 456. [CrossRef] [PubMed]

42. Qin, M.K.; Oxenham, A.J. Effects of Simulated Cochlear-Implant Processing on Speech Reception in Fluctuating Maskers. J. Acoust. Soc. Am. 2003, 114, 446–454. [CrossRef]

43. Rosen, S.; Souza, P.; Ekelund, C.; Majeed, A.A. Listening to Speech in a Background of Other Talkers: Effects of Talker Number and Noise Vocoding. J. Acoust. Soc. Am. 2013, 133, 2431–2443. [CrossRef]

44. Stickney, G.S.; Zeng, F.-G.; Litovsky, R.; Assmann, P. Cochlear Implant Speech Recognition with Speech Maskers. J. Acoust. Soc. Am. 2004, 116, 1081–1091. [CrossRef]

45. Dorman, M.; Loizou, P. Speech Intelligibility as a Function of the Number of Channels of Stimulation for Normal-Hearing

Listeners and Patients with Cochlear Implants. Am. J. Otol. 1997, 18, S113–S114. [PubMed]

46. Loizou, P.C.; Dorman, M.; Tu, Z. On the Number of Channels Needed to Understand Speech. J. Acoust. Soc. Am. 1999, 106, 2097–2103. [CrossRef]

47. Berg, K.A.; Noble, J.H.; Dawant, B.M.; Dwyer, R.T.; Labadie, R.F.; Gifford, R.H. Speech Recognition as a Function of the Number of Channels in Perimodiolar Electrode Recipients. J. Acoust. Soc. Am. 2019, 145, 1556–1564. [CrossRef]

48. Croghan, N.B.H.; Duran, S.I.; Smith, Z.M. Re-Examining the Relationship between Number of Cochlear Implant Channels and Maximal Speech Intelligibility. J. Acoust. Soc. Am. 2017, 142, EL537–EL543. [CrossRef]

49. Pals, C.; Sarampalis, A.; Baskent, D. Listening Effort with Cochlear Implant Simulations. J. Speech Lang. Hear. Res. 2013, 56, 1075–1084. [CrossRef]

50. Winn, M.B.; Edwards, J.R.; Litovsky, R.Y. The Impact of Auditory Spectral Resolution on Listening Effort Revealed by Pupil Dilation. Ear Hear. 2015, 36, e153–e165. [CrossRef]

51. Dorman, M.F.; Loizou, P.C.; Spahr, A.J.; Maloff, E. A Comparison of the Speech Understanding Provided by Acoustic Models of Fixed-Channel and Channel-Picking Signal Processors for Cochlear Implants. J. Speech Lang. Hear. Res. 2002, 45, 783–788. [CrossRef]

52. Bingabr, M.; Espinoza-Varas, B.; Loizou, P.C. Simulating the Effect of Spread of Excitation in Cochlear Implants. Hear. Res. 2008, 241, 73–79. [CrossRef] [PubMed]

53. Bierer, J.A.; Litvak, L. Reducing Channel Interaction Through Cochlear Implant Programming May Improve Speech Perception: Current Focusing and Channel Deactivation. Trends Hear. 2016, 20. [CrossRef] [PubMed]

54. Fu, Q.-J.; Nogaki, G. Noise Susceptibility of Cochlear Implant Users: The Role of Spectral Resolution and Smearing. J. Assoc. Res. Otolaryngol. 2005, 6, 19–27. [CrossRef] [PubMed]

55. Gaudrain, E.; Başkent, D. Discrimination of Voice Pitch and Vocal-Tract Length in Cochlear Implant Users. Ear Hear. 2018, 39, 226–237. [CrossRef]

56. Kluk, K.; Moore, B.C.J. Factors Affecting Psychophysical Tuning Curves for Normally Hearing Subjects. Hear. Res. 2004, 194, 118–134. [CrossRef] [PubMed]
