Database of binaural room impulse responses of an apartment-like environment

(1)

HAL Id: hal-01969317

https://hal.laas.fr/hal-01969317

Submitted on 4 Jan 2019

HAL is a multi-disciplinary open access

archive for the deposit and dissemination of

sci-entific research documents, whether they are

pub-lished or not. The documents may come from

teaching and research institutions in France or

abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est

destinée au dépôt et à la diffusion de documents

scientifiques de niveau recherche, publiés ou non,

émanant des établissements d’enseignement et de

recherche français ou étrangers, des laboratoires

publics ou privés.

Database of binaural room impulse responses of an

apartment-like environment

Fiete Winter, Hagen Wierstorf, Ariel Podlubne, Thomas Forgue, Jérôme

Manhes, Matthieu Herrb, Sascha Spors, Alexander Raake, Patrick Danès

To cite this version:

Fiete Winter, Hagen Wierstorf, Ariel Podlubne, Thomas Forgue, Jérôme Manhes, et al.. Database of

binaural room impulse responses of an apartment-like environment. Convention e-Brief of the Audio

Engineering Society, Jun 2016, Paris, France. 4p. �hal-01969317�

(2)

Convention e-Brief

Presented at the 140

th

Convention

2016 June 4–7, Paris, France

This Engineering Brief was selected on the basis of a submitted synopsis. The author is solely responsible for its presentation, and the AES takes no responsibility for the contents. All rights reserved. Reproduction of this paper, or any portion thereof, is not permitted without direct permission from the Audio Engineering Society.

Database of binaural room impulse responses of an

apartment-like environment

Fiete Winter1, Hagen Wierstorf2, Ariel Podlubne3, Thomas Forgue3, Jérôme Manhès3, Matthieu Herrb3, Sascha Spors1, Alexander Raake2, and Patrick Danès4

1_{Institute of Communications Engineering, University of Rostock, Rostock, D-18119, Germany} 2_{Audiovisual Technology Group, Technische Universität Ilmenau, Ilmenau, D-98693, Germany} 3_{LAAS-CNRS, Université de Toulouse, CNRS, Toulouse, France}

4_{LAAS-CNRS, Université de Toulouse, CNRS, UPS, Toulouse, France}

Correspondence should be addressed to Fiete Winter (fiete.winter@uni-rostock.de) ABSTRACT

We present a database of binaural room impulse responses (BRIRs) measured in an apartment-like environment. The BRIRs were captured at four different sound source positions, each combined with four listener positions. A head and torso simulator (HATS) with varying head-orientation in the range of ±78◦with 2◦resolution was used. Additionally, BRIRs of 20 listener positions along a trajectory connecting two of the four positions were measured, each with a fixed head-orientation. The data is provided in the Spatially Oriented Format for Acoustics (SOFA) and it is freely available under the Creative Commons (CC-BY-4.0) license. It can be used to simulate complex acoustic scenes in order to study the process of auditory scene analysis for humans and machines.

1 Introduction

One of the outstanding capabilities of the human audi-tory system is to recover information on single audiaudi-tory objects out of a mixture of sounds [1]. The EU FET-OPEN project TWO!EARSaims at developing a com-putational model that mimics this behavior. Listeners are regarded as multi-modal agents that develop their concept of the world by active, exploratory sensing. In the course of this process, they interpret percepts, applying existing and collecting new knowledge and concepts accordingly. Consequently, the TWO!EARS model involves bottom-up (signal-driven) as well as top-down (hypothesis-driven) processes.

A first open-source release of the model software is available for download [2]. It will be complemented

until the end of the project so as to provide a com-prehensive set of functions enabling new perspectives, including the neighbouring field of robot audition. The model is deployed on a binaural robot, in order to assess its suitability in real dynamic auditory scene analysis (DASA) scenarios. In addition, a software simulation stage of the robot is provided to foster the development process.

The synthesis of ear signals is thus an important basis for the development and evaluation of the TWO!EARS model. It allows to generate reproducible conditions in contrast to slightly varying signals from real-world scenarios. Binaural synthesis using Head-Related Im-pulse Responses (HRIRs) for anechoic environments and Binaural Room Impulse Responses (BRIRs) for reverberant conditions is a versatile tool to generate

(3)

Winter et al. BRIR database of an apartment-like environment

Fig. 1: ADREAM Laboratory

ear signals. For the latter, a-priori measured BRIRs de-scribing the acoustic sound propagation from a sound source to the ears of a human listener are needed. This engineering brief provides detailed information about the measurement process for the BRIRs of a complex, apartment-like environment. It is organised as follows: The details of the measurement process are described in Sec. 2. A brief data analysis is presented in Sec. 3 followed by information about accessibility of measured data given in Sec. 4.

2 Measurement Details

The impulse response measurements were conducted in the G. Giralt building of the Laboratory for Analysis and Architecture of Systems (LAAS-CNRS), Toulouse, France, which hosts the Reconfigurable Dynamic Ar-chitectures for Embedded Autonomous Mobile Sys-tems (ADREAM) project. As depicted in Fig. 1, this environment constitutes a drywall installation built into in a big hall, representing an apartment with two closed rooms (L × W × H = 4 × 4 × 2.5 m and 3 × 4 × 2.5 m) and one half-open area (4 × 4 × 2.5 m). The whole apartment has no ceiling.

2.1 Measurement Setup

The involved hardware and the data connections be-tween the devices are depicted in Fig. 2. The BRIRs were measured using logarithmic sweeps of 219 sam-ples length at a sampling rate of 44.1 kHz. The measurement signals are D/A-converted by a RME FIREFACE UC USB-soundcard and emitted by one

PC Robotics Platform (Jido) Sound Card Optical Tracking System HATS (KEMAR) Loudspeakers USB audio E T H E R N E T contr ol commands E T H E R N E T location data SERIAL head azimuth X L R / B N C audio X L R audio

Fig. 2: Involved hardware for the measurement includ-ing data flows between the components. The type of the respective physical data connections is written in capital letters, while the data con-tent is marked in bold font.

out of four Genelec 8020A two-way active loudspeak-ers. Simultaneously, the emitted sound is measured using a Head and Torso Simulator (HATS), namely the G.R.A.S. Knowles Electronics Manikin for Acous-tic Research (KEMAR) 45BB-4 [3] with Large Pin-nae (KB0091 + KB0090). The length of the BRIRs was limited to 2 seconds and the delay of the mea-surement chain was compensated a-priori. The HATS was mounted on a mobile robotics platform, named Jido (see Fig. 3a). The manikin is furthermore en-dowed with a servo motor to enable azimuthal head rotations. The motor is associated to a gearhead (ra-tio of 9) and a quadrature incremental encoder with 2048 steps (4 pulses by step), so that the 73728 pulses per revolution ensure an accuracy of 4.88 × 10−3 de-grees. It is interfaced via a serial connection through an ELMO controller that manages the sensors (Hall effect sensors, encoders and limit sensors) and controls the motor in position or velocity mode. The setpoints are sent through a standard CAN serial link.

AES 140th_{Convention, Paris, France, 2016 June 4–7} Page 2 of 4

(4)

(a) (b)

Fig. 3: (a) Jido mobile robot with the KEMAR mounted on its top. (b) Genelec 8020A Loud-speaker (×4). Both devices were endowed with optical tracking markers (small gray balls). The location1_{of the robotics platform and loudspeakers} are captured by an optical tracking system composed of 28 infrared cameras. It uses small optical markers made with a retroreflective material (cf. Fig. 3) mounted on the respective devices. Its accuracy is in the millimeter range.

2.2 Measurement Positions

In the first part of the measurement, BRIRs were cap-tured for four different loudspeaker locations, each combined with four listener locations with a head-above-torso-orientation varying in the range of ±78◦ with 2◦resolution (see Fig. 5a). Hereby, a negative an-gle corresponds to turning the head to the right above the torso. While the loudspeakers remained steady over the whole measurement, the HATS had to be moved to the next location by operating the robot manually. As shown in Fig. 5c, the first loudspeaker was oriented towards a wall in order to create a case, where the direct sound has a lower amplitude than the first reflection, due to the directivity of the loudspeaker.

Within the second part, the listener was gradually moved along a trajectory connecting the measurement locations 1 and 2 from the first part (see Fig. 5b). BRIRs for each loudspeaker were measured for 0◦ 1_{location subsumes position and orientation within this treatise}

−60◦ −40◦ −20◦ 0◦ 20◦ 40◦ 60◦ 5 10 15 20 25 30 head-orientation time / ms 5 10 15 20 25 30 time / ms Left Ear 0 dB −60 −45 −30 −15 Right Ear

(a) Loudspeaker 1, Listener Position 1

5 10 15 20 10 15 20 25 30 position on trajectory time / ms 10 15 20 25 30 time / ms Left Ear _{0 dB} −60 −45 −30 −15 Right Ear (b) Loudspeaker 2, Trajectory Fig. 4: Magnitude of measured BRIRs head-orientation every ≈ 25 cm on the trajectory. The loudspeaker positions remained the same compared to part one. During the whole measurement, that is part one and two, the individual sound pressure levels of all four loudspeakers were kept constant.

3 Data Analysis

A small excerpt of the measured impulse responses is shown in Fig. 4 in order to illustrate how the complex environment is affecting the structure of the BRIRs. The directivity of the loudspeakers and the orientation of the first loudspeaker towards the wall leads to the expected strong reflection, which is considerably higher than the direct sound (cf. 4a). The influence of the listener’s head is also clearly visible, as the amplitude of the reflection is reduced for the contra-lateral ear. As the listener "moves" along the trajectory and passes the doorframe connecting both closed rooms (cf. location 11 to 16 in Fig. 5b), loudspeaker 2 is no longer occluded by the walls and the direct sound for the right ear (cf. Fig. 4b) significantly increases.

(5)

Winter et al. BRIR database of an apartment-like environment x y ϕ 1.0 m 1.0 m 1 2 3 4 1 2 3 4 1: (2.55, 7.47) m, 4◦ 2: (−3.66, 7.15) m, 0◦ 3: (−3.30, 5.09) m, -5◦ 4: (−1.44, 2.98) m, 43◦ 1: (2.37, 9.96) m, -106◦ 2: (0.88, 4.70) m, -166◦ 3: (−1.17, 4.57) m, -157◦ 4: (−0.14, 6.91) m, 139◦

(a) 4 listener positions with varying head-above-torso-orientation x y ϕ 1.0 m 1.0 m 1 2 3 4 1: (2.55, 7.47) m, 4◦ 2: (−3.66, 7.15) m, 0◦ 3: (−3.30, 5.09) m, -5◦ 4: (−1.44, 2.98) m, 43◦ 1: (2.54, 9.66) m, -105◦ 2: (2.49, 9.56) m, -105◦ 3: (2.32, 9.23) m, -104◦ 4: (2.10, 8.82) m, -103◦ 5: (1.97, 8.55) m, -103◦ 6: (1.87, 8.34) m, -103◦ 7: (1.76, 8.14) m, -102◦ 8: (1.64, 7.91) m, -102◦ 9: (1.51, 7.68) m, -101◦ 10: (1.39, 7.45) m, -101◦ 11: (1.31, 7.25) m, -100◦ 12: (1.20, 7.01) m, -100◦ 13: (1.10, 6.80) m, -99◦ 14: (1.00, 6.58) m, -99◦ 15: (0.88, 6.32) m, -98◦ 16: (0.81, 6.10) m, -98◦ 17: (0.78, 5.85) m, -82◦ 18: (0.89, 5.33) m, -80◦ 19: (1.05, 4.85) m, -78◦ 20: (1.12, 4.63) m, -76◦ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

(b) Listener trajectory with 0◦ head-above-torso-orientation

(c) Loudspeaker 1 and HATS at listener posi-tion 1

(d) Loudspeaker 2, 3, and 4 and HATS at lis-tener position 3

Fig. 5: Fig. (a) and (b) show the xy-position and horizontal orientation of the listener’s torso drawn in black. Respective quantities for the four loudspeakers are drawn in red. The legend shows the respective position followed by the orientation’s azimuth angle ϕ. In addition, an arc around the listener symbol is used in (a) to illustrate the varying head-above-torso-orientation in the range of ±78◦. Fig. (c) and (d) show exemplary combinations of the listener and loudspeaker locations from (a).

4 BRIR Database

All measured BRIRs are stored in the Spatially Ori-ented Format for Acoustics (SOFA), which is the stan-dard format for spatial acoustic data of the Audio En-gineering Society [4]. The data accompanied by pho-tos from the setup is freely available at [5] and is li-censed under Creative Commons Attribution 4.0 (CC BY 4.0)2.

5 Acknowledgements

This research has been supported by EU FET grant TWO!EARS, ICT-618075.

References

[1] Cherry, E. C., “Some experiments on the recogni-tion of speech, with one and with 2 ears,” J. Acoust. Soc. Am., 25, pp. 975–979, 1953.

2_{http://creativecommons.org/licenses/by/4.0/}

[2] Two!Ears Project, “Two!Ears Auditory Model 1.2,” 2016, doi:10.5281/zenodo.47487.

[3] G.R.A.S. Sound & Vibration, “Instruction Manual - G.R.A.S. 45BB KEMAR Head and Torso / 45BC KEMAR Head and Torso with Mouth Simulator,” 2016.

[4] Audio Engineering Society, Inc., “AES692015 -AES standard for file exchange - Spatial acoustic data file format,” 2015.

[5] Winter, F., Hagen, W., Podlubne, A., Forgue, T., Manhès, J., Herrb, M., Spors, S., Raake, A., and Danès, P., “Binaural room impulse responses of an apartment- like environment,” 2016, doi:10.5281/ zenodo.49357.

AES 140th_{Convention, Paris, France, 2016 June 4–7} Page 4 of 4