HAL Id: hal-03235914
https://hal.archives-ouvertes.fr/hal-03235914
Submitted on 26 May 2021
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
Psychoacoustic optimization of spatial sound quality
with hearing devices
Bernhard Seeber, Gabriel Gomez
Bernhard U. Seeber
Gabriel Gomez
Audio Information Processing, Dept. of Electrical and Computer Engineering, Technical University of Munich, Germany
seeber@tum.de
ABSTRACT
Devices like hearing aids, hearables or noise cancellation headphones with hear-through option pick up the sound with microphones in the ear canal or behind the ear and present it to the listener after processing. Noise reduction and spatial beamforming processing will alter the intensity ratio between sources, the direct-to-reverberant ratio, interaural coherence and binaural cues, thus potentially affecting spatial perception of the sound scene. We investigated the effect of microphone location in behind-the-ear (BTE) satellites and in custom-made in-the-ear (ITE) shells, and the effect of hearing device processing aimed at maintaining spatial perception while gaining the benefits of beamforming. The new Jackrabbit algorithm uses three microphones: one ITE-microphone to pick up ear signals optimal for spatial perception and two BTE-microphones to achieve delay-and-subtract beamforming. The two signals are combined and, using real-time processing, the approach was compared back-to-back with a range of standard approaches. Sound localization, externalization, distance perception, source width and speech understanding in noisy and speaker-alternating situations were investigated in virtual reverberant room conditions auralized with the Simulated Open Field Environment [SOFE v3; 1]. Results demonstrate that spatial perception and sound quality with the Jackrabbit algorithm are comparable to ITEs across a range of measures, while speech intelligibility in noise is substantially improved due to beamforming. This short article highlights how the psychoacoustic approach and measures of spatial sound quality were used to inspire the new signal processing method.
1. INTRODUCTION
The present article highlights findings of the PhD thesis of Gabriel Gomez [2], which assessed a range of aspects of spatial perception in situations with deprived spatial cues. The main application is hearing aids: there, sound is often picked up with microphones located behind the ear, which do not capture pinna-related monaural and binaural cues (Figure 1) but exhibit the more general interaural time (ITD) and level (ILD) differences that stem from the head shape and shoulder reflection. The lack of fine spectral detail, along with the altered phase with BTE-devices, is thought to cause issues with externalization, i.e. the perception of the spatial sound scene outside the head. Reduced externalization can occur with highly correlated ear signals, e.g. those of a frontal sound source with little reverberation, or when monaural and binaural cues deviate from their expected combination [3]. BTE-devices possess two or more microphones which can be configured to pick up sound from one direction while suppressing sound from other directions. In doing so, they affect binaural cues such as the interaural correlation or ILD fluctuations, resulting in reduced externalization for static head positions [4]. The BTE-microphone position and the beamforming affect not only externalization but also ILDs and the levels of different sources and of reverberation. This can have various impacts on spatial perception regarding source location, distance and source width. With time-variant processing these can change dynamically [5]. The aim of this project was not only to investigate these aspects, but also to develop approaches for compensating the loss of spatial fidelity due to beamforming while maintaining the benefit for speech perception.
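The binaural cues named above can be quantified directly from a pair of ear signals. The following is a minimal sketch, assuming NumPy and two equally long arrays of left and right ear samples. The function names are hypothetical, and interaural coherence is taken here as the maximum absolute normalized cross-correlation within ±1 ms of lag, which is one common convention and not necessarily the measure used in [2] or [4].

```python
import numpy as np


def ild_db(left, right, eps=1e-12):
    """Broadband interaural level difference in dB (positive: left is louder)."""
    return 10.0 * np.log10((np.mean(left ** 2) + eps) / (np.mean(right ** 2) + eps))


def interaural_coherence(left, right, fs, max_lag_ms=1.0, eps=1e-12):
    """Maximum absolute normalized cross-correlation within +/- max_lag_ms."""
    max_lag = int(round(fs * max_lag_ms / 1000.0))
    l = left - left.mean()
    r = right - right.mean()
    norm = np.sqrt(np.sum(l ** 2) * np.sum(r ** 2)) + eps
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            c = np.dot(l[lag:], r[:len(r) - lag])
        else:
            c = np.dot(l[:lag], r[-lag:])
        best = max(best, abs(c) / norm)
    return best


# Illustrative ear signals (synthetic noise, not measured data)
rng = np.random.default_rng(0)
left_ear = rng.standard_normal(8000)
right_ear = 0.5 * left_ear  # identical waveform, lower level at the right ear
level_difference = ild_db(left_ear, right_ear)
coherence = interaural_coherence(left_ear, right_ear, fs=16000)
```

In this toy case the two ear signals are scaled copies of each other, so the coherence is close to one, the kind of highly correlated ear signals that the text associates with reduced externalization.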
2. METHODS
2.1 Jackrabbit Algorithm
Figure 1: Sketch of Behind-The-Ear (BTE) and In-The-Ear (ITE) acoustic conditions, where sound is picked up by microphones either above the pinna or in the concha and reproduced by the receivers in the device.
The Jackrabbit algorithm aims to combine the benefits for spatial hearing obtained with ITE devices with the benefits for speech perception due to beamforming with BTE devices. It uses three microphones: one ITE microphone and two BTE microphones. The BTE microphones are configured according to standard delay-and-subtract beamforming (Figure 2). The target signal is estimated using a forward-facing beamformer while the interfering sound is estimated, e.g., with a backward-facing beamformer. The energy ratio of the outputs of both beamformers is used to generate a time-frequency mask. The ITE-microphone signal is attenuated with this time-frequency mask, i.e. time-frequency bins in which the interferer dominates are attenuated while those with 0 dB SNR or better are passed unchanged [2, 6]. Since the ITE signals are used as the basis, they carry the spectral information of the pinnae for each source plus their interaural cues in optimal fidelity.
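The signal flow described in this section can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the implementation from [2, 6]: the function names, the integer-sample microphone delay, the STFT parameters, the square-root gain rule and the -12 dB attenuation floor are all hypothetical choices made for this example.

```python
import numpy as np


def delay_signal(x, delay):
    """Delay x by an integer number of samples, padding the start with zeros."""
    return np.concatenate([np.zeros(delay), x[:len(x) - delay]])


def delay_and_subtract(near, far, delay):
    """Differential beamformer: 'near' minus the delayed 'far' signal.

    Sound that reaches 'far' first and 'near' `delay` samples later is cancelled.
    """
    return near - delay_signal(far, delay)


def stft(x, n_fft=256, hop=128):
    """Hann-windowed short-time Fourier transform, shape (frames, bins)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def time_frequency_mask(front_mic, rear_mic, delay=1, floor_db=-12.0, eps=1e-12):
    """Gain per time-frequency bin from the energy ratio of two beamformers."""
    # Forward-facing beamformer (null toward the rear): target estimate
    target = stft(delay_and_subtract(front_mic, rear_mic, delay))
    # Backward-facing beamformer (null toward the front): interferer estimate
    interferer = stft(delay_and_subtract(rear_mic, front_mic, delay))
    ratio = (np.abs(target) ** 2 + eps) / (np.abs(interferer) ** 2 + eps)
    floor = 10.0 ** (floor_db / 20.0)  # assumed maximum attenuation
    # Bins at 0 dB ratio or better get gain 1; others are attenuated
    return np.clip(np.sqrt(ratio), floor, 1.0)


def mask_ite_signal(ite_mic, mask):
    """Apply the BTE-derived mask to the STFT of the ITE-microphone signal."""
    return stft(ite_mic) * mask


# Illustrative frontal source: the rear BTE microphone receives it one sample later
rng = np.random.default_rng(0)
source = rng.standard_normal(4000)
mask = time_frequency_mask(front_mic=source, rear_mic=delay_signal(source, 1))
masked_spectrum = mask_ite_signal(source, mask)
```

The point of this arrangement is that the BTE microphones only steer the gains, while the signal that is actually filtered comes from the ITE microphone, so its pinna cues are retained. Resynthesis by overlap-add is omitted from the sketch.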
2.2 Evaluation Experiments
A range of experiments was conducted to verify the functioning and perceptual outcome of the algorithm. We report on one experiment that assessed externalization with various microphone and beamforming conditions. Participants reported externalization ratings in a MUSHRA-like test by adjusting sliders in a GUI (Figure 3). Stimuli were male and female speech, modulated noise, noise bursts and various noises with level emphasis or decrement in certain spectral bands, plus a reference stimulus with partial binaural mirroring of the ear signals to yield an internalized percept. Stimuli were presented in a sound-attenuated room via the 15° loudspeaker of the SOFE v3 [1], located at a distance of 1.3 m. The participants wore ITE and BTE shells linked to real-time processing of the microphone ear signals. Processed signals were played back to the participants with short latency via the receivers in the ITE. Head turns were only allowed during reference playback, so that binaural ear signals congruent with the head turn could anchor the scale at full externalization.
Figure 3: GUI for the MUSHRA-like procedure to obtain externalization ratings.
3. RESULTS AND CONCLUSIONS
Figure 4 presents externalization ratings as medians across participants and stimuli. First, with mostly unprocessed ITE-microphone signals the sound was rated as located at the loudspeaker, suggesting that normal distance perception can be obtained with ITE signals. BTE microphone signals yielded less perceived distance, but the percept was still well externalized. Perceived distance was reduced when beamforming was employed, with stronger beamforming leading to a percept near the head. The new Jackrabbit algorithm yielded externalization as good as that with ITE signals, showing that the benefits of beamforming and of spatial perception can be combined.
4. ACKNOWLEDGMENTS
This study was funded by Phonak (Sonova AG). BS and the SOFE were funded by the Bernstein Center for Computational Neuroscience Munich, BMBF 01 GQ 1004B. We thank Philipp von Unold who contributed to part of this work in his bachelor’s thesis.
5. REFERENCES
[1] B. U. Seeber, S. Kerber, and E. R. Hafter, “A System to Simulate and Reproduce Audio-Visual Environments for Spatial Hearing Research,” Hearing Research, vol. 260, no. 1-2, pp. 1-10, 2010.
[2] G. Gomez, “Consolidating natural spatial perception and improved SNR in hearing aids: Jackrabbit, a new method,” Ph.D. thesis, Dept. of Electrical and Computer Engineering, Technical University of Munich, Munich, 2019.
[3] W. M. Hartmann and A. Wittenberg, “On the externalization of sound images,” J. Acoust. Soc. Am., vol. 99, no. 6, pp. 3678-3688, 1996.
[4] J. Catic, S. Santurette, and T. Dau, “The role of reverberation-related binaural cues in the externalization of speech,” J. Acoust. Soc. Am., vol. 138, no. 2, pp. 1154-1167, Aug. 2015.
[5] I. M. Wiggins and B. U. Seeber, “Effects of dynamic-range compression on the spatial attributes of sounds in normal-hearing listeners,” Ear Hear., vol. 33, no. 3, pp. 399-410, May-Jun. 2012.
[6] G. Gomez, B. Seeber, and P. Derleth, “Reducing noise in the sound signal of a hearing device,” 17209252.0-1210; 21.12.2017, EPO, 2017.
Figure 4: Externalization ratings median-averaged across a variety of stimuli with different spectral content and across participants, provided for different microphone (ITE, BTE omni) and beamforming conditions.
Figure 2: Basic beamformer configurations achievable with BTE microphones and used in the Jackrabbit method to estimate time-frequency masks.