UNIVERSITÉ DE SHERBROOKE
Faculté de génie
Département de génie mécanique
Localisation tridimensionnelle et estimation de la directivité de sources sonores avec l'imagerie acoustique par l'ajustement de faisceaux
Tridimensional localization and directivity reconstruction of sound sources using the acoustic imaging structure from motion
Doctoral thesis
Specialty: mechanical engineering
Lucas Henrique TEIXEIRA CARNEIRO
Jury: Alain BERRY (supervisor)
Jean-François BLAIS (Bombardier)
Patrice MASSON (GAUS)
Philippe-Aubert GAUTHIER (GAUS)
Quentin LECLÈRE (INSA - Lyon)
Thomas PADOIS (ÉTS)
RÉSUMÉ
Acoustic imaging systems based on microphone arrays have wide applications in the localization and diagnosis of sound sources. However, although sound radiation is tridimensional, most diagnosis systems based on acoustic images are spatially stationary, that is, they cannot fuse information obtained from several measurement positions. The final diagnosis may be unreliable or incomplete. In this thesis, a numerical and measurement method for the tridimensional localization and reconstruction of multiple stationary sound sources, called acoustic imaging by bundle adjustment (AISFM, for acoustic imaging structure from motion), is proposed, developed and validated. The method relies on minimizing the re-projection error of the final tridimensional localization of the sound sources on acoustic images captured from several positions, with or without knowledge of those positions, in order to provide the best tridimensional reconstruction of artifacts. An acoustic image correspondence problem must be solved beforehand. Using the source localization and other information extracted from the acoustic images, the method also provides the amplitude and directivity of the sound sources. The method back-propagates the localization uncertainties of the sound sources on each image so that a standard deviation of the tridimensional localization of each source can be computed and correlated with its extension and directivity. The AISFM method is tested numerically and experimentally. An unsupervised sound source localization method based on the image segmentation principle, coupled with an automatic computation of the acoustic image metrics, is developed to provide the AISFM with the necessary information. Two segmentation methods are used on the acoustic images: Otsu's and Bradley's methods. Optimal parameters are defined for the best localization performance, and the results are compared numerically and experimentally. The metrics are computed by interpolating the acoustic images onto a polar grid in order to eliminate any directional bias in their evaluation. Acoustic images are generated with the GCC-PHAT beamforming algorithm from acoustic signals captured with a spherical microphone array. The position of the microphones on this array is optimized using two geometric criteria and a genetic algorithm. The resulting images present larger main-lobe widths, lower side-lobe levels and better performance with the proposed image segmentation methods.
Keywords: Acoustic imaging, beamforming, data fusion, distributed sensor networks, computer vision, acoustic image segmentation, acoustic image correspondence, bundle adjustment
ABSTRACT
Acoustic imaging systems based on microphone arrays have wide applications in the localization and diagnosis of sound sources. However, while sound radiation is tridimensional and path-dependent, most diagnosis systems based on acoustic images are spatially stationary, i.e., they are unable to fuse information obtained from multiple capture positions. The final diagnosis may be unreliable or incomplete. This thesis proposes, develops and validates a numerical and measurement method for the tridimensional localization and reconstruction of multiple stationary sound sources, called acoustic imaging structure from motion (AISFM). It relies on the minimization of the re-projection error of the final tridimensional localization of the sound sources on acoustic images captured from multiple positions, with or without knowledge of those positions, in order to deliver the best tridimensional reconstruction of artifacts in a least-squares sense. An acoustic image correspondence problem needs to be solved beforehand. Using the source localizations and other information extracted from the acoustic images, the method also delivers the amplitude and directivity pattern of the sound sources. It back-propagates the uncertainties of the sound source localizations on each image so that a standard deviation for the tridimensional localization of each source may be computed and correlated to its extension and directivity. The AISFM method is tested and benchmarked numerically and experimentally. An unsupervised sound source localization method, based on the image segmentation principle and coupled to an automatic computation of image metrics, is developed in order to feed the AISFM with the necessary information through a trouble-free workflow. Two segmentation methods are used on the acoustic images: Otsu's and Bradley's methods. Optimal parameters are defined for the best localization performance and the results are benchmarked numerically and experimentally. The metrics are computed by interpolating the acoustic images onto a polar grid in order to eliminate any directional bias in their evaluation. The acoustic images are generated using the GCC-PHAT beamforming algorithm with acoustic signals captured with a spherical microphone array. The position of the microphones on this array is optimized using two geometric criteria and a genetic algorithm. The obtained images present larger main-lobe widths and smaller side-lobe levels, and perform better with the proposed image segmentation methods.
Keywords: Acoustic imaging, beamforming, sensor fusion, distributed sensing network, computer vision, acoustic image segmentation, acoustic image correspondence, SFM
TABLE OF CONTENTS
1 INTRODUCTION
1.1 General context
1.2 Project definition
1.3 Objectives
1.4 Original contributions
1.5 Methodology and document organization
2 STATE-OF-THE-ART AND METHODS
2.1 Fast-forward theory of sound propagation
2.1.1 Harmonic solution of the acoustic wave equation
2.1.2 Fundamental sources
2.1.3 Multi-polar expansion of a generic acoustic source
2.1.4 Radiation of extended sources
2.1.5 Vibroacoustic model of a panel
2.1.6 Sound propagation in real room conditions
2.1.7 Statistical properties of stochastic signals
2.2 Beamforming
2.2.1 Delay-and-sum beamforming (DASB)
2.2.2 Steered response power (SRP) and adaptive beamforming
2.2.3 Generalized cross-correlation (GCC)
2.2.4 Spatial likelihood function (SLF)
2.2.5 MVDR/GCC optimality as a maximum likelihood estimator (MLE)
2.3 Acoustic imaging
2.3.1 Image projection
2.3.2 Acoustic image generation
2.3.3 Source localization
2.3.4 Metrics
2.4 Structure from motion (SFM) for tridimensional sound source localization and reconstruction
2.4.1 The acoustic triplet, the triangularization principle and principles in multiple view geometry
2.4.2 SFM cost function: minimizing the image re-projection error with a non-linear least-squares function
2.4.3 Statistical interpretation of the least-squares minimization
2.4.4 Non-linear minimization using Levenberg-Marquardt
2.4.5 Generalization of the SFM matrix formulation and Jacobians definition
2.4.6 Backward propagation of observed projections' covariance
2.4.7 Parametrization of motion orientation
2.4.8 Image correspondence problem
2.5 Sound source directivity reconstruction
2.6 Conclusion
3 MICROPHONE ARRAY OPTIMIZATION AND ACOUSTIC IMAGE ANALYSIS
3.1 Geometric criteria for microphone arrays using GCC-PHAT
3.1.1 Geometric hypothesis and criteria validation
3.2 Optimization using genetic algorithms (spherical arrays)
3.2.1 Overview of the algorithm
3.2.2 Implementation of the optimization problem
3.2.3 Repeatability and convergence analysis of the optimization schemes
3.3 Analysis of the acoustic image
3.4 Segmentation and source localization on the acoustic image
3.4.1 Implementation and validation of segmentation and source localization methods on acoustic images
3.4.2 Benchmark for localization of multiple sources
3.4.3 Benchmark for source localization with sources of different levels
3.4.4 Benchmark for source localization with limited-band sources
3.5 Evaluation of the acoustic image
3.5.1 Monopole
3.5.2 Effect of limited-band monopole sources on the acoustic image metrics
3.5.3 Dipole
3.5.4 Extended source
3.6 Experimental validation
3.6.1 Calibration
3.6.2 Metrics evaluation for one monopole
3.6.3 Metrics evaluation for two monopoles
3.6.4 Segmentation and localization for two monopoles
3.7 Conclusions
4 TRIDIMENSIONAL MULTIPLE SOUND SOURCE LOCALIZATION AND DIRECTIVITY RECONSTRUCTION USING THE AISFM
4.1 Definition of the numerical and experimental test rigs
4.2 Numerical validation
4.2.1 Automatic sound source detection, features extraction and acoustic image correspondence with microphone array motion information: four monopoles + dipole
4.2.2 Automatic sound source detection, features extraction and acoustic image correspondence with microphone array motion information: four monopoles + panel
4.2.3 AISFM with microphone array motion information: four monopoles + dipole
4.2.4 AISFM without microphone array motion information: four monopoles + dipole
4.2.5 AISFM with microphone array motion information: four monopoles + panel
4.2.6 AISFM without microphone array motion information: four monopoles + one panel
4.2.7 Numerical directivity reconstruction: monopoles + dipole
4.3 Experimental validation
4.3.1 Automatic sound source detection, features extraction and acoustic image correspondence with microphone array motion information: four monopoles + dipole
4.3.2 AISFM with microphone array motion information: four monopoles + dipole
4.3.3 AISFM without microphone array motion information: four monopoles + dipole
4.3.4 Experimental directivity reconstruction: four monopoles + dipole
4.4 Conclusions
5 CONCLUSION
5.1 Originality and main results
5.2 Future works
6 CONCLUSION - VERSION EN FRANÇAIS
6.1 Originalité et résultats principaux
6.2 Travaux futurs
A EQUATIONS FOR MULTI-POLAR EXPANSION OF A GENERIC ACOUSTIC SOURCE
B THE MAXIMUM LIKELIHOOD ESTIMATION METHOD
C AUTOMATIC THRESHOLDING METHODS FOR SEGMENTATION
D COMPUTATION OF ACOUSTIC IMAGE METRICS FOR PLANAR PROJECTION
E JACOBIANS FOR THE SFM PRIMARY STRUCTURE USING SPHERICAL AND PLANAR PROJECTION MODELS
F ROBUST HOMOGRAPHY AND FUNDAMENTAL MATRIX ESTIMATION FOR PAIRWISE IMAGE CORRESPONDENCE
G FUNCTIONS FOR DISCRETE GA MULTI-OBJECTIVE OPTIMIZATION
H MICROPHONES DISTRIBUTIONS ON SPHERICAL MICROPHONE ARRAYS
I EFFECT OF ACOUSTICAL PARAMETERS IN THE LOCALIZATION OF SOUND SOURCES ON AN ACOUSTIC IMAGE
I.0.1 Effect of correlated environmental noise and microphone noise on the localization method
I.0.2 Effect of room reverberation on the localization method performance
J EFFECT OF ACOUSTICAL PARAMETERS IN THE METRICS OF AN ACOUSTIC IMAGE
J.0.1 Effect of correlated environmental noise on the localization method
J.0.2 Effect of room reverberation on the acoustic image of a monopole
K SPHERE SCATTERING AND COMPENSATION
LIST OF FIGURES
1.1 Intensity probes
1.2 Commercial microphone array systems with optical camera
1.3 A B&K microphone array system
1.4 Distributed sound source localization using a network of circular microphone arrays
2.1 Spherical and planar wave models
2.2 Fundamental sources
2.3 Scheme for the Rayleigh integral and a baffled panel
2.4 Image source model (ISM)
2.5 Room Impulse Response (RIR)
2.6 Delay-and-sum beamforming scheme
2.7 Time-delay computation
2.8 Comparison between MPDR and SRP beamformers
2.9 Conceptual representation of a spatial likelihood function
2.10 Image projection models for acoustic cameras
2.11 Acoustic image generation
2.12 Arithmetic, geometric and harmonic means for acoustic image generation using the GCC-PHAT
2.13 H-minima filter
2.14 Peak-finding pseudo-algorithm
2.15 Definition for main-lobe width (MLW) and maximum side-lobe level (SLL)
2.16 Grid interpolation for a spherical projection
2.17 Fundamental/conceptual comparison between different image metrics
2.18 The triangularization principle in 2D
2.19 The epipolar geometry of a stereo setup using planar projections
2.20 The re-projection error
2.21 SFM network graph, parameters connectivity, Jacobian and Hessian structures
2.22 Image correspondence using motion information
2.23 Cubic rectification of a spherical image projection
2.24 Sound source directivity reconstruction workflow
3.1 Design of a circular microphone array
3.2 3D printed spherical support for microphone array
3.3 Concepts in multi-objective optimization
3.4 Flowchart of the microphone array optimization routine
3.5 Histograms for the repeatability and convergence analysis of a mono-objective microphone array optimization
3.6 Pareto front of a multi-objective microphone array optimization
3.7 Geometric properties of selected sets of microphone distributions according to D and Γ criteria
3.8 Flowchart of the acoustic image analysis
3.9 Sound source localization using different segmentation strategies with GCC-PHAT-GM images
3.10 Sound source localization using different segmentation strategies with GCC-PHAT-GM images
3.11 Pixel histogram of GCC-PHAT AM/GM gray scale acoustic images
3.12 Multiple sound source localization with Otsu's and Bradley's segmentation methods
3.13 Setup of localization benchmark for sound sources with different levels
3.14 Setup of localization benchmark for sound sources with different frequency content
3.15 Metrics of a GCC-PHAT-AM acoustic image with a regular microphone array distribution and a monopole source
3.16 Metrics of a GCC-PHAT-AM acoustic image with a multi-objective microphone array distribution and a monopole source
3.17 Metrics of a GCC-PHAT-GM acoustic image with a multi-objective microphone array distribution and a monopole source
3.18 GCC-PHAT-GM acoustic images with a microphone array Set 4a (D − Γ) and two monopoles
3.19 SLL polar images with microphone array Set 4a and two monopoles with different frequency contents obtained with GCC-PHAT-GM
3.20 Metrics of an acoustic image for a dipole source oriented towards the microphone array
3.21 Metrics of an acoustic image for a dipole source with the dipole pointing parallel to the microphone array
3.22 Mean quadratic velocity of vibrating panels
3.23 Radiated acoustic power and radiation efficiency of vibrating panels
3.24 Directivity patterns for several vibrating panels
3.25 GCC-AM images of vibrating panels
3.26 GCC-PHAT-AM images of vibrating panels
3.27 Experimental setup in an anechoic chamber
3.28 Influence of the microphones amplitude mismatch on the acoustic image of a regular microphone array Set 1a (regular) obtained with GCC-PHAT-AM
3.29 Influence of the microphone phase mismatch on the acoustic image of a regular microphone array Set 1a (regular) obtained with GCC-PHAT-AM
3.30 Influence of the microphone position mismatch on the acoustic image of a
3.31 Measurement setup of a spherical microphone array using a Linearis3D photogrammetry system
3.32 Sound pressure amplitude error on a spherical support due to scattering obtained analytically
3.33 Sound pressure phase error on a spherical support due to scattering obtained analytically
3.34 Scattering influence of a spherical support on the numerical acoustic image (GCC-PHAT-AM) of a microphone array
3.35 Numerical and experimental GCC-PHAT-AM acoustic images for discrete microphone array configurations
3.36 Numerical and experimental GCC-PHAT-AM acoustic images metrics for several discrete microphone configurations
3.37 GCC-PHAT-GM images for two monopoles with similar amplitudes
3.38 Metrics for GCC-PHAT-AM images with two sources of unequal levels
3.39 Metrics for GCC-PHAT-GM images with two sources of unequal levels
3.40 Acoustic images, segmentation and localization for some discrete microphone array distributions
4.1 User cases for sound source localization and directivity reconstruction
4.2 Reconstruction setup in an anechoic chamber with five sound sources and 15 microphone array positions
4.3 Artifacts positioning measurement and tracking
4.4 Panoramic distribution of sound sources and microphone arrays
4.5 Experimental polar pattern of sound sources
4.6 Acoustic image correspondence pseudo-algorithm
4.7 38 antenna positions simulation rays map and correspondence results
4.8 Localization and levels extraction on 38 antenna positions from a numerical simulation
4.9 Correspondence and metrics extraction on 38 antenna positions from a numerical simulation
4.10 16 antenna positions simulation rays map and correspondence results
4.11 Rays map and correspondence results for four monopoles and a vibrating panel
4.12 Localization and levels extraction of an extended sound source reconstruction setup
4.13 Correspondence and metrics extraction of an extended sound source reconstruction setup
4.14 38 antenna positions structure reconstruction - source localization convergence
4.15 38 antenna positions structure reconstitution - standard deviation convergence
4.16 38 antenna positions convergence statistics
4.17 16 antenna positions structure reconstruction - source localization convergence
4.18 16 antenna positions motion reconstruction - microphone array localization convergence
4.19 16 antenna positions motion reconstruction - microphone array orientation convergence
4.20 16 antenna positions structure reconstitution - standard deviation convergence
4.21 16 antenna positions convergence statistics
4.22 Extended sound source structure reconstruction - source localization convergence
4.23 Extended sound source structure reconstruction - standard deviation convergence
4.24 Extended sound source convergence statistics
4.25 Extended sound source motion/structure reconstruction - source localization convergence
4.26 Extended sound source motion/structure reconstruction - microphone array localization convergence
4.27 Extended sound source motion/structure reconstruction - microphone array orientation convergence
4.28 Extended sound source motion/structure reconstruction - standard deviation convergence
4.29 Extended sound source motion/structure reconstruction convergence statistics
4.30 Sound sources frequential amplitude estimation
4.31 Sound source polar patterns with the distance equalization method
4.32 Numerical and experimental ray maps
4.33 Numerical and experimental correspondence results
4.34 Localization and levels extraction of numerical and experimental setups
4.35 Correspondence and metrics extraction for numerical and experimental setups
4.36 Structure reconstruction for numerical and experimental setups - source localization convergence
4.37 Structure reconstruction for numerical and experimental setups - standard deviation convergence
4.38 Convergence statistics for numerical and experimental setups
4.39 Motion/structure reconstruction for numerical and experimental setups - source localization convergence
4.40 Motion/structure reconstruction for numerical and experimental setups - standard deviation convergence
4.41 Motion/structure reconstruction for numerical and experimental setups - microphone array localization convergence
4.42 Motion/structure reconstruction for numerical and experimental setups - microphone array orientation convergence
4.43 Convergence statistics for numerical and experimental setups under motion/structure reconstruction
4.44 Experimental sound sources frequential amplitude estimation
4.45 Experimental sound source polar patterns with the `distance equalization
5.1 High-level hardware specification for an AISFM measurement unit
C.1 Idealistic gray-level histogram of an image presenting bimodal distribution
C.2 Deployment of Bradley's method on planar and spherical images
D.1 Grid interpolation for a planar projection
E.1 Kinematic scheme of spherical and planar projections
E.2 MATLAB symbolic calculation for the computation of the SFM primary structure using spherical and planar projections
G.1 Initial population for GA integer multi-objective optimization
G.2 Crossover for GA integer multi-objective optimization
G.3 Mutation for GA integer multi-objective optimization
H.1 Continuous distribution of microphones on spherical microphone arrays
H.2 Discrete distribution of microphones on spherical microphone arrays
I.1 Influence of SNR on SVL loss to false-negative detection versus SNR
I.2 Effect of room reverberation on the localization method performance
J.1 Effect of additive environmental noise on metrics estimation of a GCC-PHAT acoustic image
J.2 Effect of room reverberation on metrics estimation of a GCC-PHAT acoustic image
J.3 SLL on the background of an acoustic image with image sources
K.1 Scattering scheme around a rigid sphere
K.2 Polar pattern for a harmonic impinging wave of 60 dB across the largest cross-section of a sphere
K.3 Amplitude error in microphones of a regular microphone array due to scattering
K.4 Phase error in microphones of a regular microphone array due to scattering
LIST OF TABLES
2.1 Generalized means for acoustic image generation
2.2 Generalization of the tridimensional reconstruction problem and minimum conditions for over-determination
3.1 Geometric criteria for a circular microphone array design
3.2 Selected sets of microphone array distributions
3.3 Localization performance of various microphone arrays and segmentation methods from a collection of 4 randomly distributed sound sources of equal amplitude
3.4 Numerical results for the localization performance of sound sources with different levels
3.5 Localization performance of limited-band sources for GCC-PHAT-GM log-scale acoustic images and Bradley's segmentation method
3.6 Computed metrics for a monopole source
3.7 Computed metrics of GCC-PHAT-GM acoustic images with a multi-objective microphone array distribution and two monopoles with different frequency contents
3.8 Computed metrics for a dipole source
3.9 Physical parameters for the simulation of an extended source
3.10 Computed metrics of acoustic images of different vibrating panels with microphone array Set 4a (D − Γ)
3.11 Experimental conditions with two sources
3.12 Experimental/numerical comparison of acoustic image metrics for a monopole
3.13 Experimental results for localization performance of sources with different levels
4.1 Microphone arrays and sound sources distribution
G.1 MATLAB's initial population, crossover and mutation functions for GA optimization
GLOSSARY
This is a glossary of terms.
Technical term: Definition
Acoustic triplet: Arrangement of three microphones for sound localization.
A-format: Four signals from the microphone capsules of a sound field microphone array.
Aliasing: Misidentification of a signal due to under-sampling.
B-format: WXYZ signals corresponding to the first-order spherical harmonics decomposition of the sound field, obtained by processing the A-format of a sound field microphone array.
Epipolar line: The line joining the epipole and the structure's projection on each one of the images of the stereo setup.
Epipolar plane: The plane obtained from the sensor centers and the structure point on stereo images.
Epipole (epipole line): Equivalent to the baseline of a pair of stereo images, the line joining the epipole points. The epipole point is the point of intersection of the baseline with the images of the stereo setup.
Far-field: Far enough that the sound waves may be considered planar.
Free acoustic field: Physical condition where no reflection occurs.
Intensity probe: Sensor allowing measurement of the local sound intensity I(r, t).
Microphone array: A sensor composed of N ≥ 2 microphones for sound source localization or sound field reconstruction.
Motion: In 3D reconstruction theory, the motion is the sensor localization, usually a six-DOF variable.
Near-field: Close enough that the sound waves may be considered spherical.
Nyquist frequency: Half the sampling frequency. To sample without introducing aliasing, the sampling frequency must be at least twice the highest frequency of the signal.
Pareto front: Tradeoff surface of dominant solutions for a certain group of criteria.
Photogrammetry: Extraction of 3D information from 2D information using photography.
Pose: Sensor position information (three DOF). Same as the first three DOFs of the motion information.
Segmentation: Division of an image into segments with the same attributes.
Spatial aliasing: Spatial multiplication of the main lobe and side-lobes due to signal aliasing at high frequency.
Structure: In 3D reconstruction theory, the structure is the reconstructed object localization, usually a three-DOF variable.
Outlier: Observations much bigger or much smaller than other
LIST OF SYMBOLS
This is a list of representation conventions, variables, constants, mathematical operators and specific functions. Definitions given directly in the text take precedence over this list.
Representation conventions
x A scalar variable.
x A vector variable.
X A row or column matrix.
X A multidimensional matrix.
Variables
α Absorption coefficient or attenuation factor [non-dimensional].
β Compression (bulk) modulus [Pa] or reflection coefficient [non-dimensional].
δ Increment.
Δ Numeric divergence [non-dimensional].
Δt Propagation time, time-of-flight or time-delay [s].
Δz Prediction error [m × m × m].
ε Error.
γ Empirical parameter for amount of reverberation [non-dimensional], an element of matrix Γ or aperture angle [rad].
Γ Mean orientation geometric criterion [rad], de-correlation busting matrix or parameter vector.
κ Spatial noise power [W].
λ Wave length [m] or reverberation power [W].
μ Poisson's ratio, mean (will be represented as M in the case of a matrix) or relaxation factor.
ρ Spatial independent term of a joint probability density function or material density [kg m−3].
η Damping coefficient [non-dimensional] or number of parameters [non-dimensional].
ω Angular frequency [rad s−1].
φ Phase [rad].
θ Elevation angle [rad].
ϕ Azimuthal angle or angular resolution [rad].
Π Power [W].
σ Scale parameter of a distribution, usually the standard deviation.
τ Time-lag between microphones' signals [s].
Δθ, Δϕ Spherical grid size [rad].
r, θ, ϕ Triplet for spherical coordinates radius/elevation/azimuth [m × rad × rad].
a Radius [m].
a Acceleration [m s−2].
A Amplitude parameter of a fundamental source.
C Segmentation class.
d Euclidean distance [m].
D List of Euclidean distances, a dataset, directional parameter of a fundamental source or flexural rigidity [Pa × m4].
e Epipolar line.
e Steering vector for a beamformer.
E List of directional vectors or Young's modulus [Pa].
E Normalized beamformer response matrix (normalized acoustic image).
E Directional variance [rad].
f Frequency [s−1].
f Focus vector [m × m × m].
g Green function.
h Thickness [m], deepness threshold for reconstruction by erosion, room impulse response (RIR) or adaptive filter.
H Transfer function.
I Intensity [W m−2].
J Exponential term of a joint probability density function.
k Wave number [m−1], number of carrier frequencies (samples), Otsu's optimal threshold, number of iterations or cardinality [non-dimensional].
k Propagation or directional vector (unitary) [m × m × m].
K Number of reflections or radial elements on a polar grid [non-dimensional].
l Grid size [m].
L Dimension of a panel, of a microphone array (microphone array aperture/diameter) or of a planar projection image plane [m]. Number of levels on a gray-scale image.
n Noise or pixels [non-dimensional].
N Number of point correspondences [non-dimensional].
p Pressure [Pa] or normalized probability distribution [non-dimensional].
P Number of pairs of microphones [non-dimensional].
P Sensor intrinsic matrix.
q Volumetric flow rate or source strength [m3 s−1].
q Structures' predicted projection [m × m × m].
r Coordinate vector: Cartesian with coordinates (x, y, z) [m × m × m] or spherical with the coordinates (r, θ, ϕ) [m × rad × rad].
R Efficiency [non-dimensional].
S Surface [m2] or consensus set.
t Time [s] or sensitivity of Bradley's segmentation method [non-dimensional].
u Velocity [m s−1] or quaternion.
V Volume [m3].
w Masking coefficient [non-dimensional].
w Displacement [m] or weighting matrix for a beamformer.
x A signal.
x Observed projection.
y A beamformer.
y Fitting parameters (aa, bs).
Z Modal impedance or specific acoustic impedance.
aa, bs Motion [m × m × m × rad × rad × rad] and structure information [m × m × m].
D, Γ Mean separation [m] and orientation difference [rad] criteria.
D, E A set of linear separation [m] or orientation vectors [m × m × m] of microphone pairs.
H, F Homography and fundamental matrix.
i, ii Index counters [non-dimensional].
m, n Index counters for the modal or multi-pole order [non-dimensional].
M, N Modal or multi-pole order [non-dimensional].
M, S Number of microphones/sensors and sources [non-dimensional].
o, s Index counters for observations and structure points [non-dimensional].
O, S Number of observations and structure points [non-dimensional].
R, T Rotation and translation matrix.
Ω, H Euclidean spaces.
x, y, z Triplet for Cartesian coordinates [m × m × m].
Constants
ρ0 Air density = 1.20 [kg m−3] at 20 °C.
c0 Speed of sound in the air, usually equal to c0 = 344 m s−1 at 20 °C. Relation
between c0 and temperature T in °C: c0 = (331.5 + 0.607T) [m s−1].
I0 Reference intensity = 10−12 [W m−2].
j Imaginary unit, j = √(−1).
p0 Reference pressure = 2 × 10−5 [Pa].
v0 Reference velocity = 5 × 10−8 [m s−1].
Z0 Specific acoustic impedance of the air = 412.8 [Pa s m−1] at 20 °C.
Mathematical operators
∀ For all ...
∈ Belongs to ...
∗ Complex conjugate.
(·) Normalized value.
(·) Estimated value.
· Mean quadratic value.
| · | Absolute value.
· Euclidean distance.
· Mean quadratic property.
[· ]−1 Inverse of a matrix.
[· ]T, ( · )T Transpose of a matrix or a vector.
[·]H, (·)H Hermitian transpose.
H Hessian matrix.
I Identity matrix.
J Jacobian matrix.
Δ(·) Laplace operator Δ = ∇ · ∇.
∇(·) Gradient.
hn(· ) Second kind spherical Hankel function.
jn(· ) Spherical Bessel function.
E(· ) Expectation.
IFT(·) Inverse Fourier Transform (IFT): $IFT(f(\omega)) = \int_{-\infty}^{\infty} f(\omega)\, e^{j\omega t}\, d\omega$.
N( · ) Gaussian distribution.
P (· ) Probability.
P (· | · ) Conditional probability.
Pn(· ) Legendre polynomial of order n.
SSE(· ) Sum of squared errors.
PDF(·) Probability density function.
FT(·) Fourier Transform (FT): $FT(f(t)) = \int_{-\infty}^{\infty} f(t)\, e^{-j\omega t}\, dt$.
Specific functions
Φ A scalar function or normal modes shape function.
Ψ Weighting function in the frequency domain.
Σ Covariance matrix.
C Power cross-spectrum matrix.
O Computational complexity.
R Correlation matrix (or auto-correlation or cross-correlation) or matrix dimension.
R Erosion transformation.
W Precision matrix.
LIST OF ACRONYMS
This is a list of acronyms.
Acronym Definition
2D/3D Two dimensions/three dimensions
ABS Acrylonitrile Butadiene Styrene
AISFM Acoustic imaging structure from motion
AM Arithmetic mean
AOV Angle-of-view
ASFM Acoustic structure from motion
BA Bundle adjustment
CAS Computer algebra system
CRB Cramér-Rao lower bound
CLT Central limit theorem
Clean-SC Clean - source coherence
Clean-SC-MS Clean - source coherence - moving sources
Clean-MS Clean - moving sources
CNN Convolutional neural network
CS Consensus set
DNN Deep neural networks
CRC Canada Research Chair
DAMAS Deconvolution approach for the mapping of acoustic sources
DAMAS-MS Deconvolution approach for the mapping of acoustic sources - moving sources
DAMAS-SC Deconvolution approach for the mapping of acoustic sources - source coherence
DASB Delay-and-sum beamforming
dB Decibel
DLT Direct linear transform
DMA Differential microphone array
DOA Direction of arrival
DOF Degree of freedom
DSP Digital signal processing
FEM Finite element method
FFT Fast Fourier transform
FIFO First-in-first-out
FPGA Field-programmable gate array
GA Genetic algorithm
GAUS Groupe d'acoustique de l'Université de Sherbrooke
GCC Generalized cross-correlation
GM Geometric mean
GPU Graphics processing unit
GSC Geometrically search grid
HD High-definition
HM Harmonic mean
HOA Higher order ambisonics
I2C Inter-integrated circuit
ICTD Inter channel time difference
ICLD Inter channel level difference
ICP Iterative closest point
IFFT Inverse fast Fourier transform
IIR Infinite impulse response
IMU Inertial measurement unit
ISM Image source model
LTI Linear time invariant (system)
LM Levenberg-Marquardt method
MEMS Microelectromechanical systems
MLA Main-lobe area
MLE Maximum likelihood estimator
MLW Main-lobe width
MPDR Minimum power distortionless response
MSS Minimal sample set
MVDR Minimum variance distortionless response
NAH Near-field acoustic holography
NGGA Non-dominated sorting genetic algorithm
NVH Noise, vibration & harshness
OMP-DAMAS Orthogonal matching pursuit - Deconvolution approach for the mapping of acoustic sources
PDF Probability density function
PDM Pulse density modulation
PHAT Phase alignment transformation
PSD Power spectral density
PSF Point spread function
PSO Particle swarm optimization
P-SRP Polar steered response power
RANSAC Random sample consensus
RMS Root mean square
SC-DAMAS Sparsity constraints - deconvolution approach for the mapping of acoustic sources
SBA Sparse bundle adjustment
SFM Structure from motion
SHB Spherical harmonics beamforming
SLF Spatial likelihood function
SLL Side-lobe level
SNR Signal-to-noise ratio
SPD Symmetric positive semi-definite
SPL Sound pressure level
SRP Steered response power
SSE Sum of squared errors
STFT Short-time Fourier transform
SVD Singular value decomposition
SVL Source velocity level
TDOA Time difference of arrival
THD Total harmonic distortion
TOF Time of flight
ULA Uniform linear array
VPA Virtual phased arrays
CHAPTER 1
INTRODUCTION
1.1 General context
Sound source localization, classification and separation are important signal processing techniques in acoustics that have recently led to important innovations in the civil, transportation and consumer goods industries.
For instance, sound source localization is a necessary prior step to noise control and abatement in the design process of quieter and more human-friendly machines, buildings, vehicles and other industrial applications. Abatement and control are only possible with a hierarchical classification of sources. In consumer goods, many noise reduction and speech enhancement systems rely on localization to improve the overall intelligibility of hearing aid prostheses, videoconferencing systems and cellphones using differential microphone arrays (DMAs) [Teutsch and Elko, 2001, 2004]. Separation plays an important role in the technology industry and has allowed the development of smart voice recognition devices based on natural language processing such as Google Home and Amazon Alexa.
These techniques have been academically studied as hierarchically dependent processes. However, thanks to the rise of machine learning and advanced psychoacoustic studies of human hearing, new algorithms have been developed in the last decades to accomplish localization, classification, separation and enhancement in a more sophisticated and efficient manner, at the expense of particular hypotheses on the sound field and source properties.
For instance, speech or music recordings from single or multi-channel acquisition systems may be fed into a convolutional neural network (CNN) (machine learning approach) [Abdel-Hamid et al., 2014] capable of extracting patterns from the information and performing classification and separation with no need for localization. More recently, deep neural networks (DNNs) have been used for source localization [Adavanne, 2017; Chakrabarty and Habets, 2017; He et al., 2019; Takeda and Komatani, 2016].
On the other hand, a pipeline of processing techniques based on human perception studies (machine hearing approach) [Faller, 2004; Lyon, 2017; Wang and Brown, 2006] can be used to emulate in a machine the way humans locate, classify and separate sound.
While the machine learning method is based on an iterative black-box optimization model with weights adjusted from a large set of reliable training data with little pre-processing (usually the recordings are transformed to a time-frequency domain representation using the short-time Fourier transform (STFT)), the machine hearing approach is built from filter-bank operators [Lyon, 1998] (which usually model the action of the human cochlea in separating sounds) and feature extractors, and may or may not include machine learning.
Sound source localization, classification and separation techniques can work in synergy with high fidelity sound reproduction for reverse engineering and virtual reality applications: for instance, assuming that sound will be reproduced using techniques such as higher order ambisonics (HOA) and wave field synthesis (WFS), it needs to be captured, processed and reconstructed in three dimensions with methods from sound field synthesis analysis [Lecomte, 2016]. Much work has been carried out in the field of sound capturing for high fidelity sound reproduction, although one of the simplest solutions was developed by Gerzon in the 1970s [Gerzon, 1975] and consists of a sound field microphone array with four cardioid microphones converting the sound from A to B-format, a standard format in ambisonics corresponding to the first order spherical harmonics decomposition of the sound field [Faller and Kolundzija, 2009].
In industrial applications (particularly in the civil and mechanical engineering industries), which are the main focus of this thesis, more general hypotheses on the sound field and source properties are necessary to achieve localization and classification of the so-called noise, while separation is usually not required. Localization and classification are hence the tools necessary to diagnose the generation, propagation and acoustic impact of noise and to plan the necessary actions, either by modifying the source internally or by modifying the propagation medium, in order to enhance the acoustic quality of the product. This science is commonly known in the automotive industry, for instance, as noise, vibration & harshness (NVH). The most common diagnosis methods are [Padois et al., 2018]:
Noise assessment with sound level meters: the device is composed of a single pressure microphone measuring the sound pressure level (SPL)¹ at a given point but does not provide the localization of the source;
Source localization with sound intensity probes (or acoustic vector sensors): following this method, both acoustic pressure and particle velocity are measured from a probe device (Figures 1.1(a) and (b))². From these quantities, an acoustic intensity vector can be computed, the sound propagation direction obtained and a source map constructed from the measurement, including the total acoustic power of the source, obtained by integrating the intensity vector on a surface enclosing the source. The technique requires multiple measurements in the near-field, which in many cases are only feasible for small machines (Figure 1.1(c));
1. The SPL is computed on a logarithmic decibel scale (dB) according to the following formula: $SPL = 20 \log_{10}(p_{RMS}/p_0)$, where $p_{RMS}$ is the measured pressure root mean square (RMS) value and $p_0$ is a reference pressure (2 × 10−5 Pa).
Source localization using a microphone array and near-field acoustic holography (NAH): this method consists in reconstructing the acoustic far-field or the sources' velocity distribution (or any other surface) by solving, respectively, a propagation or a back-propagation problem from the measurement of the acoustic near-field in the time-frequency domain. While the propagation is a straightforward computational operation, the back-propagation is an ill-conditioned problem requiring regularization (to avoid noise amplification) and the inversion of a matrix as large as the number of microphones in the microphone array [Maynard et al., 1985]. Traditionally, regular³, planar pressure microphone arrays have been deployed in NAH. This requires a large number of finely distributed microphones (at distances smaller than the wavelength equivalent to the sought Nyquist frequency) and complex measurement setups to achieve the necessary frequency resolution. However, some recent progress has been achieved to reduce these limitations by exploiting the sparsity of the mapping between the pressure and velocity fields, sub-Nyquist sampling and randomly distributed microphones [Chardon et al., 2012], spherical harmonics decomposition of the acquisition prior to back-propagation using spherical microphone arrays [Jacobsen et al., 2011] or the deployment of particle velocity sensors [Jacobsen and Liu, 2005].
2. Intensity probes are systematically classified into two categories: the p − p type and the p − u type. The p − p type consists of a pair of phase-matched pressure microphones installed face-to-face a certain distance apart. This pair measures the scalar local sound pressure p(r, t) and the particle velocity. Three orthogonal pairs may be deployed in order to represent the full tridimensional vector. These variables directly give the user the local instantaneous sound intensity vector I(r, t) = pavg(r, t) · u(r, t), parallel to the probe, with pavg being the average pressure between the pair of microphones. The p − u type is composed of a microphone and a particle velocity sensor. The only velocity sensor available on the market to this day belongs to Microflown Technologies. The sensor is composed of two heated filaments which sense the particle velocity by taking advantage of the anemometer principle [Reinder, 2008]. Other techniques are currently under development, such as the Soundskrit technology, which captures the velocity from the motion of a fiber in the same way small insects sense sound [Göpfert and Robert, 2000; Miles and Zhou, 2017; Zhou and Miles, 2017, 2018].
3. Regular arrays are used because the propagation and back-propagation operators are defined in the wavenumber space, and the space-to-wavenumber transformation is usually done with regular microphone arrays.
Source localization using a microphone array and beamforming⁴: this method consists of the set of algorithms used for the computation of the sound direction from the time-delay⁵ between the signals of two or more microphones. The delay may be computed both in the time and frequency domains. Usually a beamformer function is built from the summation of the delayed signals at each microphone in the microphone array, considering that the delay is associated with a certain direction of arrival (DOA). The DOA for which the beamformer reaches a maximum is generally associated with the position of the source. This method presents major advantages compared to the previous techniques: first, the acquisition may be performed either in the near or in the far-field, depending on the choice of mapping between time-delay and DOA⁶. Second, it requires fewer assumptions on the sound source (most algorithms may be adapted to capture point or distributed, omnidirectional or directional, stationary or dynamic and tonal or wide-band sources) and is suitable for sound source tracking. Moreover, advanced time-delay estimation methods may be deployed to compensate for difficult radiation conditions such as uncorrelated/additive and correlated noise (or reverberation). Finally, because the beamformer function is generally computed for all possible DOAs, this method allows the characterization of the complete acoustic environment with one single capture. This generates a complete image of the scene, known as an acoustic image or source map. Just like the NAH methods, beamforming techniques may be deployed with planar or spherical microphone arrays with many types of microphone distribution, but also with linear microphone arrays. Their resolution and performance are generally directly related to the number of microphones, the microphone sampling and the microphone array size [Chiariotti et al., 2019]. The computational performance decreases with the quantity of sensors deployed. The main drawback of traditional beamforming methods is generally the lack of 3D information (or range information between the source and the microphone array). As discussed above, one single capture is capable of generating an acoustic image of the complete scene. Due to the versatility and rich information obtained from this feature, beamforming methods were selected as the first sound source localization technique to be exploited in this work.
4. Some microphone array techniques for source localization rely on other spatial cues such as the sound loudness difference, also known as inter channel level difference (ICLD). Beamforming techniques may take into account both cues, although time-delay based information has been shown to be more robust and reliable for NVH and speech processing.
5. Homologous terms found in the literature for the time-delay between signals are inter channel time difference (ICTD), time-of-flight (TOF) and time difference of arrival (TDOA).
6. In the near-field the mapping is done using a spherical wave model, while in the far-field the model used is planar.
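The delay-and-sum principle described in the beamforming item can be illustrated with a minimal frequency-domain sketch. It assumes a far-field (plane wave) model, a uniform linear array and a single noise-free tone; the array geometry, the frequency, the scan grid and the function names are hypothetical choices made for illustration and are not taken from the thesis.

```python
import numpy as np

C0 = 344.0  # speed of sound in air [m/s] at 20 °C (value from the Constants list)

def steering_vector(mic_x, freq, theta):
    """Far-field (plane wave) phase model of a linear array for a DOA theta [rad]."""
    delays = mic_x * np.sin(theta) / C0            # time-delay of each microphone [s]
    return np.exp(-2j * np.pi * freq * delays)

def das_power(spectra, mic_x, freq, thetas):
    """Normalized delay-and-sum output power for each candidate DOA."""
    n_mics = len(mic_x)
    # np.vdot conjugates its first argument, which undoes the assumed delays before summing
    return np.array([abs(np.vdot(steering_vector(mic_x, freq, th), spectra) / n_mics) ** 2
                     for th in thetas])

# Hypothetical set-up: 16 microphones, 5 cm apart, one 2 kHz tone arriving from 20 degrees
mic_x = 0.05 * np.arange(16)
freq = 2000.0
true_doa = np.deg2rad(20.0)
spectra = steering_vector(mic_x, freq, true_doa)    # noise-free complex amplitudes

thetas = np.deg2rad(np.linspace(-90.0, 90.0, 361))  # 0.5 degree scan grid
power = das_power(spectra, mic_x, freq, thetas)
est_deg = np.rad2deg(thetas[np.argmax(power)])
print(f"estimated DOA: {est_deg:.1f} deg")
```

The DOA that maximizes `power` is taken as the source direction. A near-field variant would replace the plane wave delays with delays computed from a spherical wave model, as noted in footnote 6.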
Figure 1.1 (a) Diagram of a p−p probe. The sound pressure p(r, t) travels with an angle θ to the probe axisᵃ. (b) Picture of a Microflown p−u probe (obtained from [Reinder, 2008]). A pressure microphone still needs to be presentᵇ. (c) The intensity field of a loudspeaker radiating between 1780 Hz and 1875 Hz is assessed using a three-dimensional p − u probe from Microflown Technologies. Warmer colors: higher intensity field. In this case, the streamlines are plotted using the Scan & Paint technique, which tracks the probe position using an external vision system (picture obtained from [Fernandez et al., 2015]).
a. Phase matching of microphones is crucial for particle velocity sensing, which is obtained from the pressure gradient between the two microphone signals.
b. u(r, t) gives an indication of sound directionality but, because particles move symmetrically in both directions from the sources, it is not sufficient to determine whether sound is coming in or out. Multiplication of particle velocity by the average pressure overcomes this issue.
As illustrated in Figure 1.2, major advancements in sensor fusion have been achieved in recent years in order to overlay an acoustic image onto the images obtained from digital cameras [Padois et al., 2018]. Also, major implementation problems for applications requiring quasi real-time processing have been solved: algorithms have been implemented in high-performance, object-oriented programming languages, on field-programmable gate arrays (FPGA) or on GPUs, and integrated into microphone arrays with a large number of digital microelectromechanical systems (MEMS) sensors [Silva et al., 2017, 2018; Vanwynsberghe et al., 2015]. Diverse digital signal processing (DSP) problems [Izquierdo et al., 2018; Perrodin et al., 2012] have been solved in this scenario using specific beamforming algorithms such as the delay-and-sum beamforming (DASB) [Zimmermann and Studer, 2010], spherical harmonics beamforming (SHB) and the generalized cross-correlation (GCC) [Das and Priyanka, 2016; Fréchette-Viens et al., 2020; Grondin and Michaud, 2018].
Figure 1.2 (a) The VisiSonics 5/64 Audio Visual Camera is composed of 5 HD cameras and 64 omnidirectional electret microphones. A panoramic view is generated by stitching the images from each camera and is connected to the acoustic image using a GPU (adapted from O'Donovan [O'Donovan, 2013]). (b) B&K offers a similar solution with 36 microphones. Both solutions run similar spherical harmonics beamforming (SHB) algorithms. The former offers a higher resolution and dynamic range (via the decomposition of the sound field into spherical harmonics of 7th degree). These solutions are deployed for noise assessment in vehicle cabins, buildings and industrial plants. (c) The B&K wheel array is a 90-channel planar array with one digital camera for NVH applications. It runs the traditional delay-and-sum beamforming (DASB) (pictures (b) and (c) obtained from B&K [Kjær, 2016]).
Many resources have been devoted to the geometric design of microphone arrays. Generally speaking, the array shape, size and microphone distribution are extremely important to the system performance: planar arrays are optimized for directional noise detection while spherical ones are suitable for environmental noise assessment. Because the wavelength is inversely proportional to frequency, the system's low-frequency resolution is limited by the array size. The distances between microphones and their orientations also play, in conjunction with the beamforming algorithm used, a major role in the quality of the overall acoustic image: depending on those parameters, the beamformer function may develop local maxima (also known as side-lobes) beside the main maximum (known as the main-lobe) and lead to the detection of false-positive or false-negative sources (Figure 1.3).
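The main-lobe and side-lobe behaviour described above can be made concrete with a short numerical sketch. It assumes a uniform linear array with uniform weighting, steered broadside, under a far-field model; the number of microphones, the spacing, the frequency and the helper names are hypothetical and serve only to show how a main-lobe width and a peak side-lobe level may be read from a beam pattern.

```python
import numpy as np

def array_pattern_db(n_mics, spacing, wavelength, thetas):
    """Normalized far-field beam pattern [dB] of a uniform linear array steered broadside."""
    m = np.arange(n_mics)
    phase = 2j * np.pi * spacing / wavelength * np.outer(np.sin(thetas), m)
    response = np.abs(np.exp(phase).sum(axis=1)) / n_mics
    return 20.0 * np.log10(np.maximum(response, 1e-12))

def peak_side_lobe_db(pattern_db):
    """Highest local maximum outside the main lobe, relative to the main-lobe peak."""
    main = int(np.argmax(pattern_db))
    # walk down both flanks of the main lobe until the pattern starts rising again
    right = main
    while right + 1 < len(pattern_db) and pattern_db[right + 1] < pattern_db[right]:
        right += 1
    left = main
    while left - 1 >= 0 and pattern_db[left - 1] < pattern_db[left]:
        left -= 1
    side = np.concatenate([pattern_db[:left], pattern_db[right + 1:]])
    return side.max() - pattern_db[main]

# Hypothetical array: 16 microphones, 5 cm apart, evaluated at 2 kHz (c0 = 344 m/s)
wavelength = 344.0 / 2000.0
thetas = np.linspace(-np.pi / 2, np.pi / 2, 3601)
pattern_db = array_pattern_db(16, 0.05, wavelength, thetas)

sll_db = peak_side_lobe_db(pattern_db)
above_3db = thetas[pattern_db >= -3.0]              # only the main lobe exceeds -3 dB here
mlw_deg = np.rad2deg(above_3db.max() - above_3db.min())
print(f"peak side-lobe level: {sll_db:.1f} dB, -3 dB main-lobe width: {mlw_deg:.1f} deg")
```

Increasing the aperture narrows the main lobe, while changing the microphone distribution or weighting changes the side-lobe level, which is the trade-off that array design works with.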
Other than the overall source detection accuracy rate, a few metrics are available in order to quantitatively characterize an acoustic image: for instance, the main-lobe width (MLW) and the side-lobe level (SLL) [Christensen and Hald, 2004] and the covariance ellipses related to the attenuation of the main-lobe [Padois et al., 2018]. Some empirical works have suggested that, for planar arrays, multi-arm spirals or randomly distributed microphone arrays have lower SLL (hence better dynamic range) compared to their regularly distributed versions [Hald, 2002; Hald and Christensen, 2002]. These conclusions have also been verified in the case of very large planar arrays used in flyover noise
moving sources with the delay-and-sum beamforming (DASB), directly using image metrics as fitness functions for the optimization.
Post-processing may be used for further improvement of the overall acoustic image quality. For instance, deconvolution methods are suitable to be deployed with microphone array measurements and have been used in the last decades for this purpose. Also, deconvolution has been successfully used with optical images for a long time: several deconvolution methods were used to correct severe spherical aberration on the early images obtained from a flawed mirror installed in the Hubble space telescope [Allen et al., 1991].
Deconvolution methods are iterative algorithms which reverse the effects of the convolution of a true acquired signal with undesired signals (generated from measurement errors, flawed hardware, additive noise, reverberation etc.), which results in distorted acquired information. In the case of microphone arrays, these undesired signals are also related to the finite size of the array and the discrete microphone sampling. The distorted acquired signal can be modeled as a convolution of the source signal with a point spread function (PSF). An iterative numerical method for deconvolution can be devised from those two relationships. In acoustic imaging, the PSF may be estimated from the microphone distribution with assumptions on the nature, distribution and radiation of the source, usually built into a frequency-domain beamforming computation.
More precisely, deconvolution is very popular in complex aeroacoustic imaging measurements: the deconvolution approach for the mapping of acoustic sources (DAMAS) [Brooks and Humphreys, 2006a] is a Gauss-Seidel elimination method with Tikhonov regularization. An adaptation of the traditional Clean algorithm [Hogbom, 1974] used in astronomy, the source coherence Clean-SC algorithm, was devised by Sijtsma [Sijtsma, 2007] to suppress spatially coherent side-lobes generated from coherent noise sources. The same assumption may be used with DAMAS (DAMAS-C) [Brooks and Humphreys, 2006b], while the Doppler effect correction can be introduced into the PSF to fix images obtained from moving sources without (DAMAS-MS, Clean-MS) and with (Clean-SC-MS) the source coherence assumption [Fleury and Bulté, 2011].
Progress was achieved by exploiting solution regularization using sparsity constraints (based on the sparse spatial distribution of sources) on the deconvolution solution of DAMAS (SC-DAMAS) [Yardibi et al., 2008], orthogonal matching pursuit (OMP-DAMAS) [Padois and Berry, 2014] or hybrid methods [Padois et al., 2014]. However, deconvolution techniques remain an optional post-processing solution to obtain limited enhancement of acoustic images at the expense of seconds to days of intensive computation [Padois and Berry, 2017].
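The idea of deconvolving an acoustic image with a known PSF can be sketched in a few lines. The example below is written in the spirit of a DAMAS-like Gauss-Seidel iteration and is not the published algorithm: it assumes a one-dimensional map, a hypothetical shift-invariant Gaussian PSF, noise-free data and a simple non-negativity clip on each update, and it omits any regularization term. All names, sizes and iteration counts are illustrative assumptions.

```python
import numpy as np

def gaussian_psf_matrix(n, sigma):
    """Hypothetical shift-invariant PSF: column j is the image of a unit source in cell j."""
    idx = np.arange(n)
    return np.exp(-0.5 * ((idx[:, None] - idx[None, :]) / sigma) ** 2)

def gauss_seidel_nonneg(A, b, n_sweeps=500):
    """Solve A x = b by Gauss-Seidel sweeps, clipping every update to x >= 0."""
    x = np.zeros_like(b)
    for _ in range(n_sweeps):
        for i in range(len(b)):
            # remove the contribution of all other cells from the measured value b[i]
            residual = b[i] - A[i] @ x + A[i, i] * x[i]
            x[i] = max(residual / A[i, i], 0.0)
    return x

n_cells = 60
true_map = np.zeros(n_cells)
true_map[20], true_map[40] = 1.0, 0.5               # two point sources of different strength

A = gaussian_psf_matrix(n_cells, sigma=1.0)
dirty_map = A @ true_map                            # blurred image, as a beamformer would give
clean_map = gauss_seidel_nonneg(A, dirty_map)
print("strongest cell after deconvolution:", int(np.argmax(clean_map)))
```

In a real measurement the PSF depends on frequency, array geometry and source position, and the data are noisy, which is where the regularization and sparsity constraints cited above come into play.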
From a different perspective, sensor fusion⁷ is an emerging method deployed in source localization and acoustic imaging. As cited above, overlaying an acoustic image with the image obtained from digital cameras is one application example of sensor fusion. The full effectiveness of sensor fusion may be leveraged by deploying distributed sensing networks.
In microphone array processing, several drawbacks may be stated when comparing a single microphone array to a distributed network of several microphone arrays sharing source localization information: limited spatial coverage (related to the attenuation of the acoustic signal over the propagation medium), imprecision (maximum localization precision limited by the microphone array precision) and uncertainty (related to the poor accessibility of the array to hidden sources). Deployment of a distributed network of sensors may bring benefits such as robustness (multiple measurements are performed even in the case of partial failure), extended spatial coverage (related to a better spatial amplification and access to sources), increased confidence and resolution (localization errors average out and accuracy increases with the size of the network) and reduced ambiguity and uncertainty (obtained from cross-checks and correlations between measurements).
The intuitive disadvantage of distributed networks is practical and economic: multiple array acquisition is expensive and hard to deploy. To address these disadvantages, Comesaña [Comesaña et al., 2014] introduced a measurement technique called virtual phased arrays (VPA), which consists of a virtual and spatially scalable microphone array constructed with a single mobile sensor and used for beamforming processing of time-stationary sound fields and noise. Nonetheless, the VPA system may not be considered distributed. Distribution in the field of microphone array processing may be achieved by deploying the principle of triangulation.
7. For definition purposes, Elmenreich [Elmenreich, 2001] classifies sensor fusion as the combination of sensory data or data derived from sensory data such that the resulting information is in some sense better than would be possible when these sources were used individually. In a broader sense, Hall [Hall and McMullen, 2004] defines data fusion as the technology concerned with how to combine data from multiple (and possibly diverse) sensors in order to make inferences about a physical event, activity, or situation.
Using triangulation, Tiete [Tiete et al., 2014] designed a microphone array network composed of several circular microphone arrays equipped with an inertial measurement unit (IMU) for the localization of noise pollution in urban areas: using beamforming in the frequency domain, each network node outputs a 2D polar steered response power (P-SRP) (equivalent to an acoustic image but in polar coordinates) pointing to the directions of the located sources. If each array is considered a node in a 2D scene environment, all P-SRPs can be superimposed in order to form an enhanced 2D acoustic image with the pinpoint localization of the noise sources (Figure 1.4). Similar principles have been used in the development of counter-sniper [Sallai et al., 2011] and wildlife monitoring systems [Blumstein et al., 2011]. For telecommunications applications, Brutti [Brutti et al., 2005, 2006] proposed a smart room equipped with distributed microphone arrays to extract talker localization and head orientation. While the localization is achieved with triangulation of a beamforming technique similar to the GCC, the head orientation is obtained jointly using the localization information and modeling the head radiation directional pattern as a Gaussian distribution. Cobos [Cobos et al., 2017] reviews multiple distributed sensing methods for sound source localization (including the source localization methods and distributed sensing architectures), along with the challenges and frontiers in that field.
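The triangulation principle mentioned above can be illustrated with a minimal 2D sketch in which each node contributes a single bearing (DOA) and the source is placed at the least-squares intersection of the bearing lines. This is a simplified stand-in, not the P-SRP superimposition of Tiete et al.: the node positions, the noise-free bearings and the function name are assumptions made for illustration.

```python
import numpy as np

def triangulate_bearings(node_xy, bearings):
    """Least-squares intersection of 2D bearing lines (one line per network node)."""
    A = np.zeros((2, 2))
    b = np.zeros(2)
    for p, theta in zip(node_xy, bearings):
        d = np.array([np.cos(theta), np.sin(theta)])   # unit vector along the bearing
        P = np.eye(2) - np.outer(d, d)                 # keeps only the offset normal to the line
        A += P
        b += P @ p
    return np.linalg.solve(A, b)

# Hypothetical network: three nodes observing one source located at (4, 3) m
nodes = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 8.0]])
source = np.array([4.0, 3.0])
bearings = np.arctan2(source[1] - nodes[:, 1], source[0] - nodes[:, 0])

estimate = triangulate_bearings(nodes, bearings)
print("estimated source position [m]:", np.round(estimate, 3))
```

With noisy bearings the lines no longer meet in one point, and the same solve returns the position that minimizes the sum of squared perpendicular distances to all lines.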
One problem with distributed arrays is that some of the nodes may have poor accessibility to the sources, resulting in information that may, if present in large amounts, both slow down the network and make it unreliable. To solve this problem, Aarabi [Aarabi, 2003] proposed a fusion method that takes into account a priori information on the source localization through Bayesian inference⁸, reducing the source localization error of a distributed network by a factor of 10. However, such a method requires the estimation of the a priori information, which depends on the propagation path between every potential source and the array. In practice, this estimation is laborious.
8. Bayes' theorem [Bayes, 1763] states that the a posteriori (conditional) probability of hypothesis H given that the hypothesis E has occurred, P(H|E), depends on the likelihood P(E|H) and on the a priori probabilities P(E) and P(H): $P(H|E) = \frac{P(E|H)\,P(H)}{P(E)}$.
Figure 1.4 (a) A normalized P-SRP obtained from one circular array node on a distributed sensing network. (b) The P-SRP may be projected on a 2D plane to generate an acoustic image of the network node. (c) The superimposition of the source maps for each node in the network may improve the estimated position of the real sourceᵃ (picture adapted from Tiete [Tiete et al., 2014]).
a. Probability ranges from 0 to 1, as a result of the normalized P-SRP.
On the other hand, triangulation is a mastered technique in computer vision: 3D reconstruction from multiple images is widely used in the game and movie industries, in geolocalization and in technologies such as 3D scanners⁹, tracking systems and photogrammetry [Förstner and Wrobel, 2016]. In order to obtain 3D positions from 2D images, an optimization technique called structure from motion or bundle adjustment (SFM or BA) is deployed [Hartley and Zisserman, 2004; Triggs et al., 2000]: this requires a least-squares minimization of an objective function constructed from the re-projection errors of a 3D structure. The objective function takes into account the extrinsic (position and orientation, also called motion variables) and intrinsic (calibration variables such as the focus and distortion coefficients in the case of digital cameras) parameters of the sensor, as well as measurement uncertainties, to obtain an optimal reconstruction solution. It also requires the solution of the image correspondence problem between multiple images [Hartley and Zisserman, 2004]: the features extracted from multiple images of the same 3D structure need to be associated with one another. Many open-source SFM libraries are available to the research community (for instance, the sparse bundle adjustment (SBA) package written in C++ by Lourakis [Lourakis and Argyros, 2009]) and may be readily used in the development of new applications.
9. The common techniques used in 3D scanners are: laser based TOF scanners (lidar) and stereoscopic scanners based on the triangulation of laser points/lines or structured light patterns. While lidar systems are capable of medium to long-range scanning (up to kilometers) with a precision on the order of millimeters, stereoscopic systems are used for short-range scanning (up to a meter) at a higher precision (on the order of tens of micrometers).
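The least-squares minimization of re-projection errors described above can be sketched on a deliberately reduced problem. The example assumes an ideal pinhole sensor with a known focal length, known sensor poses (so only the structure, one 3D point, is refined) and noise-free observations, and it uses a plain Gauss-Newton loop with a numerical Jacobian. A full bundle adjustment would also refine the motion and calibration variables; the poses, the point and the function names here are hypothetical and do not reproduce the SBA package.

```python
import numpy as np

def rot_y(angle):
    """Rotation matrix about the y axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def project(point, R, t, focal):
    """Pinhole projection of a 3D point seen by a sensor with pose (R, t)."""
    pc = R @ point + t
    return focal * pc[:2] / pc[2]

def reprojection_residuals(point, poses, observations, focal):
    """Stacked differences between predicted and observed projections."""
    return np.concatenate([project(point, R, t, focal) - obs
                           for (R, t), obs in zip(poses, observations)])

def refine_point(point0, poses, observations, focal, n_iter=20, eps=1e-6):
    """Gauss-Newton minimization of the summed squared re-projection error."""
    point = point0.astype(float).copy()
    for _ in range(n_iter):
        r = reprojection_residuals(point, poses, observations, focal)
        J = np.empty((r.size, 3))
        for k in range(3):                          # forward-difference Jacobian
            step = np.zeros(3)
            step[k] = eps
            J[:, k] = (reprojection_residuals(point + step, poses, observations, focal) - r) / eps
        delta, *_ = np.linalg.lstsq(J, -r, rcond=None)
        point += delta
    return point

# Hypothetical scene: one structure point observed from three known sensor poses
poses = [(np.eye(3), np.zeros(3)),
         (rot_y(0.2), np.array([-1.0, 0.0, 0.0])),
         (rot_y(-0.2), np.array([1.0, 0.5, 0.0]))]
true_point = np.array([0.3, -0.2, 5.0])
focal = 1.0
observations = [project(true_point, R, t, focal) for R, t in poses]

refined = refine_point(np.array([0.0, 0.0, 4.0]), poses, observations, focal)
print("refined 3D point:", np.round(refined, 4))
```

Starting from a rough initial guess, the loop moves the point until its predicted projections agree with the observed ones in every view, which is the core of the objective function described in the text.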
Deployment of 3D vision apparatus in acoustic imaging is not new: the VPA [Comesaña et al., 2014] and Scan & Paint [Fernandez et al., 2015] techniques use tracking systems to monitor sensor positions. B&K [Frazer, 2014] used a similar system to calibrate microphone positions on microphone arrays for beamforming and NAH. Lidar scanners have been used for environmental scanning and the definition of room geometry and acoustics jointly with noise localization using microphone arrays [Geosystems and Construction, 2010; Jaruszewska, 2012].
Because acoustic images present projective image generation characteristics similar to those of images obtained from optical cameras, image overlay is possible [Padois et al., 2018]. Nonetheless, few researchers have proposed 3D reconstruction methods for sound source localization along the lines of computer vision algorithms such as SFM, using acoustic images and their sound localization information.
Similar methods have been used for 3D reconstruction and motion localization using multiple forward-looking sonar images¹⁰ to assist the navigation of autonomous underwater vehicles: Huang [Huang, 2016; Huang and Kaess, 2015, 2016] proposed the acoustic structure from motion (ASFM) method for this purpose. Nonetheless, feature extraction, classification and correspondence between multiple images remain, both in computer vision and in acoustics, the main challenge in deploying the SFM method.
Some researchers have investigated less complex tridimensional methods in acoustic imaging. To obtain range information for monopoles using a single array, Ding [Ding et al., 2018] proposed a frequency-domain beamforming technique with time delays computed using a spherical wave model. For a source located in a given direction, it scans over different projection planes to locate the maximum beamformer output. The position of the projection plane where this maximum lies corresponds to the range of the source. Koutny [Koutny et al., 2017] suggested a similar approach using beamforming with the spherical harmonic decomposition of the sound field and expressing the range as a low-frequency approximation of this decomposition. The accuracy of these methods is, however, limited, among other factors, by the transitional behavior of the source radiation between the near field and the far field. Taking advantage of this phenomenon, several researchers have proposed deconvolution approaches that sweep a beamformer along the near-field zone in order to reconstruct aeroacoustic sources tridimensionally: Xenaki [Xenaki et al., 2012] revealed the precursor approaches using a single planar array while others [Battista et al.,
10. The sonar principle is very similar to beamforming, except that it is an active system: the sonar system is an array of sensors which monitors known ultrasound signals emitted by well-located sources. The delays of the signals, obtained from the reflections caused by obstacles in the medium, are used to beamform the obstacles and extract their distance and azimuth.