HAL Id: hal-03234201
https://hal.archives-ouvertes.fr/hal-03234201
Submitted on 26 May 2021
Models predicting the perceptual externalization of sound sources
Robert Baumgartner, Piotr Majdak
Acoustics Research Institute, Austrian Academy of Sciences, Vienna
Correspondence: robert.baumgartner@oeaw.ac.at
ABSTRACT
Perceptual externalization denotes the ability to associate sensations with external objects. In the auditory modality, sound sources are usually perceived as externalized in natural listening situations, whereas headphone playback often results in unrealistic sound images localized inside the listener’s head. Binaural synthesis, as commonly used in virtual and augmented reality systems, aims to reproduce sound sources as realistically as possible. Extensive evaluation of such systems often becomes infeasible because it requires time-consuming listening tests. At this stage, prediction models can greatly support the development process and further deepen our understanding of the mechanisms underlying auditory externalization. We present functional models for predicting the perceived externalization of static sound sources. The predictive performance of these models has been assessed against externalization ratings from several previous psychoacoustic experiments. The models were also used to evaluate the perceptual contribution of different acoustic features, which enables a discussion of various strategies the auditory system may apply to combine these features when inferring spatial attributes of the environment.
1. INTRODUCTION
State-of-the-art audio reproduction systems all pursue the goal of making sound sources appear as realistic as possible. Spatial aspects play a central role here [1]. In natural listening situations, sound sources are usually perceptually well externalized, meaning that the listener can associate a distance with the auditory sensation and place it within the surrounding auditory environment. In contrast, headphone reproduction often results in in-head localization. Meeting the listener’s expectations about the acoustic circumstances appears to be essential for externalization perception [2]. Here, we outline functional models for predicting externalization deficits [3], [4]. These models evaluate expectation errors via a template-matching procedure under the assumption of a static, and thus particularly critical, listening situation.
Previous psychoacoustic studies show that externalization perception is robust to spectral changes in interaural time differences (ITDs) but can be strongly affected by spectral modifications of interaural level differences (ILDs) and of monaural spectral cues in the direct sound [5], [6]. Reverberation that does not match the listener’s expectations can also degrade externalization, a phenomenon referred to as the room divergence effect [7].
2. MODELS
The structure of existing externalization models follows the principle of template matching: Characteristics of the incoming signal are compared with templates for the corresponding characteristics, resulting in an expectation error for every acoustic feature (see Figure 1). For the direct sound, spectral ILDs are evaluated as interaural cues and positive spectral gradient profiles as monaural cues [3]. Monaural cues are calculated separately for each ear and then added together with a binaural weighting that increases the contribution of the ipsilateral ear with increasing lateral angle [8]. Reverberation is evaluated by assessing the amount of temporal ITD fluctuations [4], [9]. The resulting expectation error is then mapped to the externalization measure with an exponentially decreasing function. The steepness of the mapping function is determined by a sensitivity parameter that can be individually adjusted for each feature.
Figure 1. Template-based model structure with weighted combination of expectation errors before final mapping to externalization estimate.
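The per-feature processing described above (template comparison, binaural weighting of monaural cues, exponential mapping) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the positive-gradient computation, the mean-absolute-difference error metric, the sigmoid binaural weighting with the angle constant `phi_deg = 13.0`, and the form `exp(-error / s)` of the mapping are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import math
from typing import Sequence


def positive_gradients(spectrum_db: Sequence[float]) -> list[float]:
    """Positive spectral gradient profile (assumed form): differences between
    adjacent frequency bands in dB, with negative slopes set to zero."""
    return [max(nxt - prev, 0.0) for prev, nxt in zip(spectrum_db, spectrum_db[1:])]


def profile_error(target: Sequence[float], template: Sequence[float]) -> float:
    """Expectation error as mean absolute difference between two profiles (assumed metric)."""
    if len(target) != len(template) or not target:
        raise ValueError("profiles must be non-empty and of equal length")
    return sum(abs(t - r) for t, r in zip(target, template)) / len(target)


def left_ear_weight(lateral_deg: float, phi_deg: float = 13.0) -> float:
    """Hypothetical sigmoid binaural weighting: 0.5 in the median plane,
    approaching 1 for sources on the left (positive lateral angles)."""
    return 1.0 / (1.0 + math.exp(-lateral_deg / phi_deg))


def monaural_error(left: Sequence[float], left_tpl: Sequence[float],
                   right: Sequence[float], right_tpl: Sequence[float],
                   lateral_deg: float) -> float:
    """Compute gradient errors per ear, then combine them with the
    lateral-angle-dependent weight so the ipsilateral ear dominates."""
    w_left = left_ear_weight(lateral_deg)
    e_left = profile_error(positive_gradients(left), positive_gradients(left_tpl))
    e_right = profile_error(positive_gradients(right), positive_gradients(right_tpl))
    return w_left * e_left + (1.0 - w_left) * e_right


def map_to_externalization(error: float, s: float) -> float:
    """Exponentially decreasing mapping: 1 at zero error, decaying towards 0.
    In this sketch s acts as a decay constant, so a smaller s is steeper."""
    return math.exp(-error / s)
```

For a source in the median plane (`lateral_deg=0.0`) both ears contribute equally, so an error present only in the right ear is halved; for a source far to the left, the right-ear error would barely contribute.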
For the adjustment of the sensitivity parameters, a prediction error is computed as the root-mean-square (RMS) difference between the actual and predicted externalization measures. The optimal sensitivity parameters are obtained by minimizing this quadratic prediction error.
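The fitting step can be sketched as follows, assuming the same hypothetical mapping `exp(-error / s)` and a simple grid search over candidate values of `s`. The paper does not state which optimizer was used; the grid search here is a swapped-in, easy-to-verify choice, and all names are illustrative.

```python
import math
from typing import Iterable, Sequence


def rms_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root-mean-square difference between actual and predicted ratings."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def predict(errors: Sequence[float], s: float) -> list[float]:
    """Map expectation errors to externalization with the assumed exponential form."""
    return [math.exp(-e / s) for e in errors]


def fit_sensitivity(errors: Sequence[float], ratings: Sequence[float],
                    candidates: Iterable[float]) -> tuple[float, float]:
    """Grid search: return the candidate s with the smallest RMS prediction error."""
    best_s, best_rms = None, math.inf
    for s in candidates:
        rms = rms_error(ratings, predict(errors, s))
        if rms < best_rms:
            best_s, best_rms = s, rms
    if best_s is None:
        raise ValueError("no candidate values given")
    return best_s, best_rms
```

Because the square root is monotonic, minimizing the RMS error and minimizing the summed squared error select the same parameter. A gradient-free bounded scalar optimizer could replace the grid search when finer resolution is needed.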
The externalization measures resulting from the three acoustic features are then weighted and added together. Previous evaluations showed that combining expectation errors by a fixed weighted sum resembled the results of many experiments better than a dynamic selective procedure [3]. The weights are also chosen to minimize the quadratic prediction error, and they can only be fitted well to behavioral data if all features are taken into account. For the direct-sound cues, prediction errors were smallest when the expectation errors of monaural features were weighted with about 60% and those of interaural features with about 40%.
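A minimal sketch of the fixed weighted combination for the two direct-sound cues, using the approximate 60%/40% weights stated above. Following Figure 1, it combines the expectation errors before the final mapping; the reverberation cue is omitted, the mapping `exp(-error / s)` is an assumed form, and `select_largest_error` is a hypothetical example of a selective strategy included only for contrast.

```python
import math

# Approximate weights for the direct-sound cues (from the text above)
W_MONAURAL = 0.6
W_INTERAURAL = 0.4


def combine_errors(e_monaural: float, e_interaural: float,
                   w_monaural: float = W_MONAURAL,
                   w_interaural: float = W_INTERAURAL) -> float:
    """Fixed weighted sum of expectation errors, normalized so weights sum to 1."""
    total = w_monaural + w_interaural
    return (w_monaural * e_monaural + w_interaural * e_interaural) / total


def select_largest_error(e_monaural: float, e_interaural: float) -> float:
    """Hypothetical selective alternative: only the cue with the largest error counts."""
    return max(e_monaural, e_interaural)


def externalization(combined_error: float, s: float = 1.0) -> float:
    """Final exponentially decreasing mapping (assumed form, s as decay constant)."""
    return math.exp(-combined_error / s)
```

With a monaural error of 1.0 and no interaural error, the weighted sum yields a combined error of 0.6, whereas the selective rule passes the full error of 1.0 to the mapping and thus predicts weaker externalization.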
3. CONCLUSIONS
The degree to which auditory sensations are perceived as externalized depends on how well acoustic features meet internal expectations. In particular, expectation errors regarding interaural as well as monaural spectral cues and the amount of interaural temporal fluctuations seem to be crucial for externalization perception.
Template-based modeling approaches that evaluate all these expectation errors and combine them with a largely fixed weighting (e.g., 60% monaural and 40% interaural features in the case of free-field sounds) demonstrated high predictive performance. Extended to general audio signals, such an approach may support the future development of hearing devices that enable more realistic spatial sound reproduction.
4. ACKNOWLEDGMENTS
This work was supported by the Austrian Science Fund (FWF): project J 3803-N30.
5. REFERENCES
[1] P. Majdak, R. Baumgartner, and C. Jenny, “Formation of Three-Dimensional Auditory Space,” in The Technology of Binaural Understanding, J. Blauert and J. Braasch, Eds. Cham: Springer International Publishing, 2020, pp. 115–149.
[2] V. Best, R. Baumgartner, M. Lavandier, P. Majdak, and N. Kopčo, “Sound Externalization: A Review of Recent Research,” Trends Hear., vol. 24, p. 2331216520948390, Jan. 2020, doi: 10.1177/2331216520948390.
[3] R. Baumgartner and P. Majdak, “Decision making in auditory externalization perception,” bioRxiv, p. 2020.04.30.068817, May 2020, doi: 10.1101/2020.04.30.068817.
[4] S. Li, R. Baumgartner, and J. Peissig, “Modeling perceived externalization of a static, lateral sound image,” Acta Acust., vol. 4, no. 5, Art. no. 5, 2020, doi: 10.1051/aacus/2020020.
[5] W. M. Hartmann and A. Wittenberg, “On the externalization of sound images,” J. Acoust. Soc. Am., vol. 99, no. 6, pp. 3678–3688, Jun. 1996.
[6] H. G. Hassager, F. Gran, and T. Dau, “The role of spectral detail in the binaural transfer function on perceived externalization in a reverberant environment,” J. Acoust. Soc. Am., vol. 139, no. 5, pp. 2992–3000, May 2016, doi: 10.1121/1.4950847.
[7] F. Klein, S. Werner, and T. Mayenfels, “Influences of Training on Externalization of Binaural Synthesis in Situations of Room Divergence,” J. Audio Eng. Soc., vol. 65, no. 3, pp. 178–187, Mar. 2017.
[8] P. Hofman and A. J. Van Opstal, “Binaural weighting of pinna cues in human sound localization,” Exp. Brain Res., vol. 148, no. 4, pp. 458–470, Feb. 2003, doi: 10.1007/s00221-002-1320-5.
[9] J. Catic, S. Santurette, J. M. Buchholz, F. Gran, and T. Dau, “The effect of interaural-level-difference fluctuations on the externalization of sound,” J. Acoust. Soc. Am., vol. 134, no. 2, pp. 1232–1241, Aug. 2013, doi: 10.1121/1.4812264.