
HAL Id: hal-02993300

https://hal-univ-tln.archives-ouvertes.fr/hal-02993300

Submitted on 6 Nov 2020



To cite this version:

Joseph Razik, Hervé Glotin, Maia Hoeberechts, Yann Doh, Sébastien Paris. Sparse coding for efficient bioacoustic data mining: Preliminary application to analysis of whale songs. International Conference on Data Mining Workshops, 2015, Atlantic City, United States. ⟨hal-02993300⟩


Sparse coding for efficient bioacoustic data mining:

Preliminary application to analysis of whale songs

Joseph Razik∗, Hervé Glotin∗†, Maia Hoeberechts‡, Yann Doh∗ and Sébastien Paris∗

Aix-Marseille Université, 13397 Marseille, ENSAM, France
Université de Toulon, 83957 La Garde, France, UMR CNRS LSIS 7296, Equipe DYNI
Institut universitaire de France, 75005 Paris, France
Ocean Networks Canada, University of Victoria, BC, Canada

Abstract—Bioacoustic monitoring, such as surveys of animal populations and migration, needs efficient data mining methods to extract information from large datasets covering multi-year and multi-location recordings. Usually, the study of humpback whale songs is based on the classification of sound units, notably to extract the song theme of the singers, which might signify the geographic origin and the year of the song. Most of these analyses are currently done with expert intervention, but the volume of recordings drives the need for automated methods for sound unit classification.

This paper introduces a method for sparse coding of bioacoustic recordings in order to efficiently compress and automatically extract patterns in data. Moreover, this paper proposes that sparse coding of the song at different time scales supports the distinction of stable song components versus those which evolve year to year. It is shown that shorter codes are more stable, occurring with similar frequency across two consecutive years, while the occurrence of longer units varies across years, as expected based on the prior manual analysis. We conclude by exploring further possibilities of the application of this method to biopopulation analysis.

I. INTRODUCTION

Different kinds of vocalizations (moans and screams) are emitted by Humpback whales [1], reported as songs by Roger and Katy Payne [2], [3]. These songs are predominant in the breeding zone but have also been recorded during migration and occasionally in the feeding area [4], [5].

These sounds are emitted by male individuals [6]. Different hypotheses about these songs are that they possibly play a role in female attraction [7], [8], [9], [10] and/or in strong interactions between males, such as territorial defense or challenges [11], [12]. Noad et al. [13] highlighted song copying between males from the Australian East coast and those from the Australian West coast.

Songs are cyclic and composed of a structured and continuous sequence of sounds that can be repeated several times without interruption. These short continuous sounds between two silences are called sound units [14]. The complex structure of these songs is based on successive specific sound units forming a sequence, several sequences forming a phrase, and several phrases forming a theme-song [14].

Current challenging objectives in the analysis of humpback whale songs include:

1) detection of the different kinds of vocalizations;

2) automatic classification of the sound units;

3) extraction of the phrases of the songs;

4) localization of individuals and characterization of the interactions of the singers.

Objectives 1. and 2. are particularly challenging because:

• there is a large diversity of the sound units (moans, growls, sets of pulses, cries, and trumpet sounds); sound units' features vary both in the time domain and in the frequency domain, the main frequency ranges from 100 Hz up to 20 kHz, and the source levels can be more than 170 dB re 1 µPa at 1 m (see examples in Fig. 1);

• singers emit sounds simultaneously;

• varied underwater ambient noise is present.

For objectives 1. and 2., researchers have proposed new approaches based on temporal and spectral features [15]. Some variations in these features were reported and could contain part of the information needed for detection and classification [16].

Methods used to analyze human speech have been applied to Humpback whale calls since they present several similarities, including the presence of voiced and unvoiced type vocalizations, as defined in Mercado and Kuh [17]. Humpback whale calls have been analyzed using linear prediction coding (LPC) [17], energy content in specific time windows [18], spectrographic analysis [15], Mel-Frequency Cepstral Coefficients (MFCCs) [19], [20], [21], [22], affinity propagation [22], and K-means [23], and classified with self-organizing maps (SOM) [17], [15], Hidden Markov Models (HMM) [24], sliding window match length (SWML) entropy estimation [15], and neural networks. The great variety of methods used by researchers to analyze Humpback whale vocalizations reflects the great diversity of the features of these sounds [25].

The main drawback of most of these methods is that the number of sound units is unknown, and perhaps unlimited, because songs change during the season, from one year to another, and across the different breeding areas [26], [27], [28]. To better analyze these songs and to improve classification performance, we recently introduced the new concept of the subunit [22], [29], [24]. We suggest that one or more subunits are present in each sound unit. The interest of this approach is to show that a number of subunits could be used for characterizing the sound units, meaning that a sound unit could be built from a combination of these subunits.

Fig. 1. Spectrogram of Madagascar song segments (from [24]).

TABLE I
DURATION OF AVAILABLE DATA ACCORDING TO DATASET.

Corpus name     | Sampling info     | Duration
Madagascar 2008 | 16 kHz, 16 bits   | 03:02:19.54
Madagascar 2009 | 44.1 kHz, 16 bits | 04:38:24.27

This paper proposes for the first time a fully automatic sparse coding of humpback whale songs, to determine their stable components versus their evolving ones at different time scales. We also propose a definition of code complexity which separates the song components from the background sea noise. We then explore the method's applicability to analyzing the relative contribution of sound units vs. subunits to song decomposition and evolution.

II. MATERIALS AND METHODS

A. Humpback whale recordings

For this paper we focus on recordings of humpback whales from the Sainte Marie Channel (Madagascar). The recordings vary in length, covering periods from several years to months or days, as they were collected by different collaborators (Megaptera, LAM, Cesigma). The datasets used were collected in 2008 and 2009. The sampling frequency is between 16 kHz and 44.1 kHz, with 16-bit sample encoding. Fig. 1 shows a spectrogram of a typical sample of a Madagascar song from our corpus. The duration of the available data is presented in Tab. I.

The hydrophone used for the recordings is a ColmarItalia GP280 (omni-directional, [5 Hz, 90 kHz], sensitivity -170 dB re 1 V/µPa; see the datasheet at www.colmaritalia.it). The hydrophone was deployed from a motor boat (motor off), positioned ≈100 m in front of the singers at a depth of 20 m (the water column depth was between 40 m and 50 m).

In order to normalize the data with respect to the other recording parameters, we down-sampled all the sound files to a 16 kHz sampling frequency with 16-bit encoding.
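As a minimal sketch of this normalization step (the authors' exact tooling is not specified; soundfile and scipy are assumed, and the file names are hypothetical):

```python
# Sketch: resample each recording to 16 kHz, 16-bit PCM.
import soundfile as sf
from math import gcd
from scipy.signal import resample_poly

def to_16k(in_path, out_path, target_sr=16000):
    x, sr = sf.read(in_path)  # e.g. a 44.1 kHz Madagascar 2009 file
    if sr != target_sr:
        g = gcd(target_sr, sr)  # 44100 -> 16000 uses up=160, down=441
        x = resample_poly(x, target_sr // g, sr // g)
    sf.write(out_path, x, target_sr, subtype="PCM_16")

to_16k("madagascar_2009.wav", "madagascar_2009_16k.wav")  # hypothetical paths
```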

B. Cepstral representation

The first step in the analysis is to characterize the recorded songs by Mel-Frequency Cepstral Coefficients (MFCC) [30], [31], [23]. The use of the cepstral scale is motivated by the fact that mammals perceive frequency on a logarithmic scale along the cochlea [32], [33]. In our approach, we apply a method developed for human speech analysis to humpback whale vocalizations [29]. Rather than directly duplicating the method, we demonstrate its application for analyzing the harmonic parts of each sound unit or subunit of whale songs. We then build codebooks at different time scales and explore their properties in characterizing song evolution.

We compute the first 12 static Mel-Frequency Cepstral Coefficients (MFCC), $M_1, M_2, \ldots, M_{12}$. To these 12 coefficients we add an $M_0$ coefficient that captures the energy of the signal, thus yielding 13-dimensional MFCC vectors. These coefficients are computed with a 512-point Fast Fourier Transform (32 ms), with a window length of 250 ms and a frameshift of 10 ms.

On these resulting vectors, Cepstral Mean Subtraction (CMS) and variance normalization were applied. The extraction of these parameters is done with the SPro toolkit [34].

As song patterns are longer than the 10 ms scale, we form vectors by concatenating MFCC vectors. These super-vectors form the dictionary words in the sparse analysis (see Section II-C). We consider words of length 250 ms, 500 ms, 1 s, 2 s, and 4 s, formed by concatenating 25, 50, 100, 200, and 400 MFCC vectors respectively. In order to be sure to capture sound units, the vectors are concatenated with a 50% overlap; for example, the MFCC vectors we manipulate at the 500 ms scale are 650-dimensional vectors (13 × 50 components), one every 250 ms.
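The following sketch illustrates this feature pipeline. It is an illustration, not the authors' code: librosa is assumed as a stand-in for SPro, its MFCC implementation only approximates the settings above, and the input file name is hypothetical.

```python
# Sketch of Sec. II-B: 13 MFCCs (an M0 energy-like coefficient plus
# M1..M12), 512-point FFT, 10 ms frameshift, CMS and variance
# normalization, then overlapping super-vectors.
import numpy as np
import librosa

y, sr = librosa.load("madagascar_2008_16k.wav", sr=16000)  # hypothetical file
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, n_fft=512,
                            hop_length=int(0.010 * sr))  # shape (13, T)

# Cepstral mean subtraction and variance normalization, per coefficient.
mfcc = (mfcc - mfcc.mean(axis=1, keepdims=True)) \
       / (mfcc.std(axis=1, keepdims=True) + 1e-8)

def super_vectors(feats, n_frames, overlap=0.5):
    """Concatenate n_frames consecutive MFCC frames (10 ms apart) into one
    super-vector, sliding with 50% overlap as in the paper."""
    frames = feats.T                       # (T, 13)
    step = max(1, int(n_frames * (1 - overlap)))
    return np.array([frames[i:i + n_frames].ravel()
                     for i in range(0, len(frames) - n_frames + 1, step)])

X_500ms = super_vectors(mfcc, n_frames=50)  # 650-dim vectors, one every 250 ms
```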

C. Dictionary and sparse coding

In order to support efficient bioacoustic data mining, the large MFCC vectors are encoded by a learned dictionary and a sparse code. The sparse code identifies how dictionary words are recombined to produce a reconstructed representation of the original vectors. This dictionary was learned on the union of the MFCC representations of the original humpback whale song datasets. In this section we explain the details of the sparse coding method.

In order to obtain a globally robust representation of the signal $X = [x_1, \ldots, x_N] \in \mathbb{R}^{n \times N}$, each MFCC vector $x_i$ ($n = 650$ in the case of 500 ms) is first linearly encoded as a vector $c_i \in \mathbb{R}^k$ such that $x_i \approx D c_i$, where $D \triangleq [d_1, \ldots, d_k] \in \mathbb{R}^{n \times k}$ is a preliminarily trained dictionary with the constraint $\|d_j\|_2 = 1$. In a first attempt to solve this linear problem, $c_i$ can be the solution of the Ordinary Least Squares (OLS) problem:

$$l_{OLS}(c_i \mid x_i; D) \triangleq \min_{c_i \in \mathbb{R}^k} \left\{ \frac{1}{2} \|x_i - D c_i\|_2^2 \right\} \quad (1)$$

The OLS formulation can be extended to include a regularization term to avoid overfitting the data. We thus obtain the ridge regression (RID) formulation:


$$l_{RID}(c_i \mid x_i; D) \triangleq \min_{c_i \in \mathbb{R}^k} \left\{ \frac{1}{2} \|x_i - D c_i\|_2^2 + \beta \|c_i\|_2^2 \right\} \quad (2)$$

This problem can be solved analytically, giving $c_i = (D^T D + \beta I_k)^{-1} D^T x_i$. In order to decrease the reconstruction error and to obtain a sparse solution, the problem can then be reformulated as a constrained Quadratic Program (QP):

$$l_{QP}(c_i \mid x_i; D) \triangleq \min_{c_i \in \mathbb{R}^k} \frac{1}{2} \|x_i - D c_i\|_2^2 \quad \text{s.t.} \quad \|c_i\|_1 = 1 \quad (3)$$

Solving this problem with a QP solver involves a high combinatorial computation. Under the RIP assumption [35], a greedy approach can be used to solve (3) efficiently. Finally, the sparse code (SC) is defined by:

$$l_{SC}(c_i \mid x_i; D) \triangleq \min_{c_i \in \mathbb{R}^k} \frac{1}{2} \|x_i - D c_i\|_2^2 + \lambda \|c_i\|_1 \quad (4)$$

where $\lambda$ is a regularization parameter which controls the level of sparsity of the sparse code $c_i$. This problem is also known as the basis pursuit [36] or Lasso [35] problem. To solve it, we can use the popular Least Angle Regression (LARS) algorithm.
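A minimal sketch of this step, using scikit-learn as a stand-in for the authors' implementation (the value of alpha, i.e. $\lambda$, is illustrative, and X_500ms comes from the feature sketch above); it also performs the reconstruction $\hat{x} = Dc$ described next:

```python
# Sketch: learn a K-word dictionary with unit-norm atoms and compute
# LARS-based Lasso sparse codes (Eq. 4).
import numpy as np
from sklearn.decomposition import DictionaryLearning

K = 32  # dictionary size retained for the analyses of Sec. III
learner = DictionaryLearning(n_components=K,
                             fit_algorithm="lars",
                             transform_algorithm="lasso_lars",
                             alpha=1.0,   # lambda: sparsity level (illustrative)
                             random_state=0)
C = learner.fit_transform(X_500ms)  # sparse codes, shape (N, K)
D = learner.components_             # dictionary, shape (K, 650)

# Reconstruction (Eq. 5): each x_hat is a linear combination of atoms.
X_hat = C @ D
rel_err = np.linalg.norm(X_500ms - X_hat) / np.linalg.norm(X_500ms)
```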

The dual part of the training of the dictionary $D$ and the computation of the projection sparse codes $C = [c_1, \ldots, c_N]$ supports the reconstruction of the MFCC vectors. For an MFCC vector $x \in \mathbb{R}^n$ and the associated sparse code vector $c \in \mathbb{R}^k$, the reconstructed MFCC vector $\hat{x}$ is the linear combination of the dictionary codebook vectors $d_i$ weighted by the values $c_i$ of the sparse code $c$. More formally, $\hat{x}$ is given by the following equation:

$$\hat{x} = D c = \sum_{i=1}^{k} d_i c_i \quad (5)$$

D. Dictionary size

An important aspect of the encoding is the choice of an appropriate size for the dictionary. The goal of sparse coding is to create an encoding of the larger dataset which maintains structure in the data to facilitate analysis, but at the same time reduces the size of the encoding to permit efficient computation.

One drawback of sparse coding is that the size of the dictionary has to be fixed manually, and this size should not be overcomplete with respect to the expected number of classes after clustering. In this experiment, we learned three dictionaries with K = 16, K = 32, and K = 1024 words respectively. The K = 32 dictionary is used for the analyses presented in Section III, as it was empirically determined to support the most discriminative representation of the full vectors.

E. Relationship between MFCC signal and sparse code

In this section we present an approach to verifying that a given sparse code is representative of its original set of MFCC vectors. Each MFCC vector corresponds to one sparse code vector. We expect that patterns appearing in the MFCC auto-correlation also appear in the sparse code auto-correlation. In Fig. 2, we present an example for the dictionary with K = 1024 words. The figure shows the auto-correlation of the 2009 recording with MFCC vectors on top and sparse code vectors on the bottom, over a subset of 400 randomly chosen samples. We note that where there is information in the MFCC space, this information also appears in the sparse code space; this is seen in the figure as the corresponding structure in the two auto-correlations.

Fig. 2. Representation of the auto-correlation matrix between MFCC vectors (a) and sparse code vectors (b). Results obtained on the 4 s encoding of the corpus with a dictionary of 1024 vectors.
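A sketch of this check follows (our illustration, reusing X_500ms and C from the earlier sketches; cosine similarity stands in for the auto-correlation of Fig. 2):

```python
# Compare the pairwise correlation structure of raw super-vectors with
# that of their sparse codes on 400 randomly chosen samples.
import numpy as np

rng = np.random.default_rng(0)
idx = rng.choice(len(X_500ms), size=400, replace=False)

def corr_matrix(V):
    """Pairwise cosine similarity between rows."""
    V = V / (np.linalg.norm(V, axis=1, keepdims=True) + 1e-12)
    return V @ V.T

corr_mfcc = corr_matrix(X_500ms[idx])  # cf. Fig. 2 (a)
corr_code = corr_matrix(C[idx])        # cf. Fig. 2 (b)
# Structure visible in corr_mfcc should reappear in corr_code.
```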

F. Codebook complexity estimation

The estimation of the complexity of the time-frequency plane can rely on moment-based measures such as the time-bandwidth product and the Shannon and Rényi entropies [37]. In order to analyze the dictionary we generate, we extend this time-frequency complexity definition to cepstral pattern complexity, based on the principle that a concentration of energy in the time-frequency plane will also generate an energy concentration in the cepstral plane. We investigate a quantitative measure of complexity inspired by existing work [37]. This measure is closely related to the assumption that signals of high complexity (and therefore high information content) must be constructed from a large number of elementary components. We thus define the complexity measure of the sparse vectors $d_i$ of the dictionary $D$ as the Shannon entropy:

$$H(d_i) = -\sum_{t,j} p(d_i(t, M_j)) \log\big(p(d_i(t, M_j))\big) \quad (6)$$

where $p(d_i(t, M_j))$ is the estimate of the energy distribution at time $t$ for the cepstral coefficient $M_j$. The codebook for dictionary size K = 32 at the 250 ms time scale, sorted by this complexity measure, is shown in Fig. 3.

Fig. 3. The codebook composed of 32 codes, sorted by degree of complexity (from top left to bottom right), computed at the time scale of 250 ms, learned from the union of the 2008 and 2009 song sets.
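A sketch of this measure, under our reading of Eq. 6 (normalizing an atom's squared magnitudes into a probability distribution is our assumption; D comes from the dictionary-learning sketch above):

```python
# Entropy-based complexity of each dictionary atom, in bits.
import numpy as np

def atom_complexity(d, n_coeffs=13):
    """Reshape a flattened atom to (time, cepstral coefficient), turn its
    squared magnitudes into a distribution p(t, Mj), return its entropy."""
    energy = d.reshape(-1, n_coeffs) ** 2
    p = energy / energy.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

complexities = np.array([atom_complexity(d) for d in D])
order = np.argsort(complexities)  # from simple (low-entropy) to complex codes
```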

G. Divergence measure of the codes

Information-theoretic methods support the analysis of the structure and organization within a communication system [38]. As our goal is to analyze differences between whale songs (a communication system) over different years, we propose to use an information-theoretic measure to estimate song divergence.

In order to obtain a diachronic analysis, i.e. to determine which code is more or less used from one year to another, we compute the Kullback-Leibler distance [39] over song components as represented in the sparse coding. We interpret a difference in the average Kullback-Leibler distance for a song encoding subset between the 2008 and 2009 recordings as an evolution of the song, assuming that the higher this distance is, the more the songs evolved from one year to the other.

Fig. 4. Complexity values of the 32 (sorted) codes of the codebook illustrated previously (time scale 250 ms, learned on the union of 2008 and 2009). The difference between the highest and lowest complexity codes is significant.

Therefore, the song distance is defined as follows. Let $A_{d_i}$ (resp. $B_{d_i}$) be the discrete probability distribution, over $R$ bins $r = \{1, \ldots, R\}$, of the 2008 (resp. 2009) sparse codes $C$ for the sparse vector $d_i$. Then the distance for the sparse vector $d_i$ is:

$$dist_{KL}(A_{d_i}, B_{d_i}) = \sum_{r=1}^{R} \left(A_{d_i}^r - B_{d_i}^r\right) \log_2\left(A_{d_i}^r / B_{d_i}^r\right) \quad (7)$$

Finally, the song distance is the average of $dist_{KL}$ over the target code subsets.
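A sketch of this divergence follows. Eq. 7 is a symmetrized Kullback-Leibler measure; the histogram binning of the code coefficients into R bins, and the arrays C_2008 and C_2009 of per-year sparse codes (e.g. each year's super-vectors encoded with the shared dictionary), are our assumptions.

```python
# Per-word divergence between the 2008 and 2009 code-usage distributions.
import numpy as np

def dist_kl(codes_a, codes_b, n_bins=20, eps=1e-9):
    lo = min(codes_a.min(), codes_b.min())
    hi = max(codes_a.max(), codes_b.max())
    A, _ = np.histogram(codes_a, bins=n_bins, range=(lo, hi))
    B, _ = np.histogram(codes_b, bins=n_bins, range=(lo, hi))
    A = A / A.sum() + eps  # distribution over R bins for 2008
    B = B / B.sum() + eps  # ... and for 2009
    return float(((A - B) * np.log2(A / B)).sum())  # Eq. 7

# One distance per dictionary word; the song distance averages dist_kl
# over a target subset of words (e.g. the 2 most complex codes).
dists = np.array([dist_kl(C_2008[:, i], C_2009[:, i]) for i in range(K)])
```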

III. RESULTS

The results in this section are computed over a sparse code and a corresponding K = 32 word dictionary. As explained in Section II-B, the MFCC input vectors are concatenations of 13-coefficient vectors forming super-vectors with lengths of 250 ms, 500 ms, 1 s, 2 s, and 4 s.

A. Analysis of song code complexity

Fig. 4 shows a graph of the sorted complexity (as defined in Section II-F) of the quefrency words of the learned dictionary. We hypothesize an interpretation of the complex code words, which exhibit energy variation in time and frequency, as encodings of whale song subunits, and of the less complex words, which exhibit more uniformity, as components of sea noise.

The next section compares the evolution of the songs between 2008 and 2009 with reference to the sparse coding.

B. Analysis of song evolution

Using the formula given in Section II-G, we compute the Kullback-Leibler (KL) distance between the 2008 and 2009 code words, resulting in 32 distances for each set of super-vectors of increasing length (250 ms, 500 ms, 1 s, 2 s, 4 s). The histogram of the distribution of these distances, grouped into bins of average distance, is shown in Fig. 5. It can be seen from the distribution that short-duration (250 ms) representations are more stable across years than longer ones.

Fig. 5. Histogram of the KL distance (2008, 2009) computed over the 32 sparse vectors of the dictionary.

In order to determine whether the evolution can be attributed to changes in the more or the less complex code words, we compute the KL distance between 2008 and 2009 for the 2 most complex code words versus the 2 least complex code words, at all lengths. The divergence analysis (Fig. 6) illustrates that the simplest code words show far less variation across years than the most complex code words. Furthermore, the complex code words are similar for short durations (the KL distance is low for 250 ms) but differ at longer durations, with the largest variability at the time scale of 1 s.

Fig. 6. Distance between the 2008 and 2009 songs, averaged over the two most complex codes versus the two simplest ones. The abscissa is the time scale of the code, from 250 ms to 4 s.

We interpret this result to mean that the codes at the 1 s scale vary year by year, and may be composed of stable codes at a shorter time scale which exhibit less variation. This result is compatible with the subunit concept [23], which postulates that evolving whale songs are composed from shorter stable song elements. Our results suggest that the subunits could be coded at the 250 ms time scale, while the units would be coded at the 1 s scale. The longer time scales (2 and 4 s) diverge less, possibly because they relate to global song structure, which may vary less than the unit level.

Fig. 7. Contributions of four code words to the encoding of a 2008 song segment, for the dictionary size K = 16.

Note that computing the distance directly on the raw MFCC yields, as expected, insignificant differences in distance, whatever the size or the year of the units. This effect is known as the "dimensionality curse", which makes the KL-distance metric inefficient in high-dimensional spaces. Indeed, the dimensionality of the MFCC vectors is 650 (13 × 50), and according to [40], any simple distance computation between any pair of vectors in such a high-dimensional space will result in a similar distance.

IV. DISCUSSION

We presented an unsupervised dictionary learning algorithm for generating a proto-lexicon of the songs of Humpback whales at different time scales. These dictionaries are used as the basis for a sparse encoding of the original datasets. These representations are more generic and efficient than those obtained with our previous method [23], [41].

We show in this paper the utility of a sparse representation of complex bioacoustic patterns for efficient data mining. We presented the hypothesis that the long and short time duration sequences might represent varying sound units and stable subunits of whale songs, respectively. In order to support this hypothesis, in future work we intend to further analyze the structure of the sparse coding representation with respect to the original acoustic dataset. Two ideas for this analysis are presented here, computed on a dictionary with K = 16.

Fig. 7 shows the relative contributions of code words to the encoding of a song segment from the 2008 dataset. We clearly see the variation of the code word activity. For example, at 1.5 s, words 2 and 3 contribute simultaneously to generate a complex pattern, whereas at 2 s, word 1 is more active than any other word. This type of analysis is suggested as a means of characterizing the song structures.

To determine whether the code words actually represent meaningful sound subunits, we could consider a pairwise analysis of the probability of occurrence of bigrams in the encoded representation. As an example, again on the dictionary with K = 16, we computed the probability of occurrence of each consecutive word (sparse vector) pair for the 2008 and 2009 datasets. The probability of the ordered occurrence of the pair $(w_1, w_2)$ (bigram) is written $P(w_1, w_2)$.

Fig. 8. Log ratio of the probabilities of sparse vector pairs from the 2008 dataset versus the 2009 dataset, illustrated for K = 16 at the 250 ms scale.

In a random system, we would not expect to see patterns across years in the probabilities.

We then computed the log ratio of these probabilities between the encoded 2008 dataset and the encoded 2009 dataset at the 250 ms time scale, which is shown in Fig. 8.
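A sketch of this bigram computation (illustrative assumptions: each super-vector is labeled by its dominant dictionary word, counts are Laplace-smoothed before taking the log ratio, and C_2008/C_2009 are the per-year sparse codes from the earlier sketches):

```python
# Fig. 8-style log ratio of bigram probabilities between years.
import numpy as np

def bigram_probs(word_seq, K, smoothing=1.0):
    """P(w1, w2) over consecutive word pairs in a label sequence."""
    counts = np.full((K, K), smoothing)
    for w1, w2 in zip(word_seq[:-1], word_seq[1:]):
        counts[w1, w2] += 1
    return counts / counts.sum()

K = 16
words_2008 = np.argmax(np.abs(C_2008), axis=1)  # dominant word per frame
words_2009 = np.argmax(np.abs(C_2009), axis=1)

log_ratio = np.log(bigram_probs(words_2008, K) / bigram_probs(words_2009, K))
# Positive entries: pairs more frequent in 2008; negative: in 2009.
```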

The results show that there are differences in bigram occurrence across years. For example, we see that the pair (6, 7) is more frequent in 2008, while the pair (6, 2) is less frequent in 2008. This could be interpreted as song evolution (encoded in the order of words) from one year to another, while the subunits (encoded as words) remain stable.

The sparse coding technique shows promise as a means to provide objective insight into the evolution of song structure, although further development is needed to support strong conclusions about the interpretation of the encoding.

This paper focuses on the year-to-year variability of the song structure and does not discuss the possible influence of other factors, such as population, habitat, or weather conditions. Current research in our team addresses these topics using our original sparse coding framework.

V. CONCLUSION

This paper demonstrates the promise of the sparse coding method to learn features for humpback whale songs in an unsupervised manner and to support their analysis. The sparse dictionary presented in this paper is automatically learned from the recordings, and captures variations in sound units and subunits with a limited number of elements. This approach is suitable for analyzing signals which contain both stable and variable features. The acoustic datasets processed in this work exhibit those characteristics: the variability is found among singers and between years, while stable features are present, for example, in the ambient noise.

In summary, we presented:

1) an unsupervised method for encoding a large and variable bioacoustic dataset into a dictionary;

2) a method which establishes criteria for sparse coding to limit the number of elements of this dictionary;

3) a sparse coding, using the Lasso algorithm, which distinguishes the salient features of the signal from the noise components;

4) an interpretation of the approach in characterizing evolving sound units and stable subunits;

5) an analysis of the year-to-year variation of the dataset.

The results establish sparse coding as a promising method for analyzing humpback whale songs.

This paper lends new support to the concept of units versus component subunits in humpback whale songs. We show that the shortest units (subunits) are the most stable, occurring with similar frequency across two consecutive years, while the longest units exhibit more variation from one year to another. Segments of 250 ms appear to be an appropriate length for encoding stable features of whale songs, possibly corresponding to subunits.

In future work, a systematic information theoretic analysis will be used to characterize the evolution of sound units. The approach will be applied to multiple geographic locations across multiple years to further explore population differences and song evolution.

Another potential application would be to model the vocal identity of individual whales, which could provide a basis for singer authentication or dialect identification. We also intend to explore the method’s applicability to analyzing sounds from other species beyond humpback whales.

ACKNOWLEDGMENT

The authors would like to thank the PACA region, which supported the PhD program of Y. Doh. We also thank the Cetamada NGO (www.cetamada.org), LAM, and Cesigma for helping to record the files. We also thank Fondation Total for the support of our project BAOBAB, along with the Institut Universitaire de France and the MASTODONS MI CNRS SABIOD Scaled Acoustic Biodiversity project (http://sabiod.org).

REFERENCES

[1] W. Schevill, "Underwater sounds of cetaceans," in Marine Bio-Acoustics, W. N. Tavolga, Ed. Oxford: Pergamon, 1964, pp. 307-316.
[2] Anonymous, "Singing whales," Nature, vol. 224, p. 217, 1969.
[3] H. E. Winn, P. J. Perkins, and T. C. Poulter, "Sounds of the humpback whale," in 7th Annual Conference on Biological Sonar and Diving Mammals, 1970, pp. 39-52.
[4] P. J. Clapham and D. K. Mattila, "Humpback whale songs as indicators of migration routes," Marine Mammal Science, vol. 6, no. 2, pp. 155-160, 1990.
[5] C. W. Clark and P. J. Clapham, "Acoustic monitoring on a humpback whale (Megaptera novaeangliae) feeding ground shows continual singing into late spring," Proceedings of the Royal Society of London, Biological Sciences, vol. 271, no. 1543, pp. 1051-1057, 2004.
[6] D. A. Glockner, "Determining the sex of humpback whales (Megaptera novaeangliae) in their natural environment," in R. Payne, Ed.
[7] H. Winn and L. Winn, "The song of the humpback whale Megaptera novaeangliae in the West Indies," Mar. Biol., vol. 47, pp. 97-114, 1978.
[8] L. M. Herman and W. N. Tavolga, "The communication systems of cetaceans," in Cetacean Behavior: Mechanisms and Function, pp. 149-209, 1980.
[9] P. Tyack, "Interactions between singing Hawaiian humpback whales and conspecifics nearby," Behavioral Ecology and Sociobiology, vol. 8, no. 2, pp. 105-116, 1981.
[10] L. Medrano, M. Salinas, I. Salas, P. L. D. Guevara, A. Agayo, J. Jacobsen, and C. Baker, "Sex identification of humpback whales, Megaptera novaeangliae, on the wintering grounds of the Pacific Ocean," Canadian Journal of Zoology, vol. 72, pp. 1771-1774, 1994.
[11] J. Darling, "Migrations, abundance and behavior of Hawaiian humpback whales (Megaptera novaeangliae)," Ph.D. dissertation, University of California Santa Cruz, 1983.
[12] D. Cholewiak, "Evaluating the role of song in the humpback whale (Megaptera novaeangliae) breeding system with respect to intra-sexual interactions," Ph.D. dissertation, Cornell University, 2008.
[13] M. Noad, D. Cato, M. Bryden, M. Jenner, and K. Jenner, "Cultural revolution in whale songs," Nature, vol. 408, p. 537, 2000.
[14] R. Payne and S. McVay, "Songs of humpback whales," Science, vol. 173, no. 3997, pp. 585-597, 1971.
[15] P. Suzuki, J. Buck, and P. Tyack, "Information entropy of humpback whale songs," J. Acoust. Soc. Am., vol. 119, no. 3, pp. 1849-1866, 2006.
[16] W. Au, M. Lammers, A. Stimpert, and M. Schotten, "The temporal characteristics of humpback whale songs," J. Acoust. Soc. Am., vol. 118, no. 3, p. 1940, 2005.
[17] E. Mercado III and A. Kuh, "Classification of humpback whale vocalizations using a self-organizing neural network," in IEEE World Congress on Computational Intelligence, vol. 2, 1998, pp. 1584-1589.
[18] P. Rickwood and A. Taylor, "Methods for automatically analyzing humpback song units," J. Acoust. Soc. Am., vol. 123, no. 3, pp. 1763-1772, 2008.
[19] D. Helweg, "Geographic and temporal variation in songs of humpback whales," J. Acoust. Soc. Am., vol. 100, no. 4, p. 2609, 1996.
[20] S. Mazhar, T. Ura, and R. Bahl, "An analysis of humpback whale songs for individual classification," J. Acoust. Soc. Am., vol. 123, no. 5, p. 3774, 2008.
[21] G. Picot, O. Adam, M. Bergounioux, H. Glotin, and F. Mayer, "Automatic prosodic clustering of humpback whales song," in PASSIVE 08, 2008, 6 pp.
[22] H. Glotin, L. Gauthier, F. Pace, F. Benard, and O. Adam, "New automatic classification for humpback whale songs," in PASSIVE 08, 2008, p. 93.
[23] F. Pace, F. Benard, H. Glotin, O. Adam, and P. White, "Subunit definition and analysis for humpback whale classification," Journal of Applied Acoustics, vol. 71, Nov. 2010.
[24] F. Pace, P. R. White, and O. Adam, "Classification of humpback whale (Megaptera novaeangliae) calls using hidden Markov models," in 5th International Workshop on Detection, Classification, Localization, and Density Estimation of Marine Mammals using Passive Acoustics, 2011, p. 29.
[25] J. G. Harris and M. D. Skowronski, "Automatic speech processing methods for bioacoustics signal analysis: a case study of cross-disciplinary acoustic research," in ICASSP, vol. 5, 2006, pp. 793-796.
[26] R. Payne and L. N. Guinee, "Humpback whale (Megaptera novaeangliae) songs as an indicator of 'stocks'," in Communication and Behavior of Whales, R. Payne, Ed., pp. 333-358, 1983.
[27] D. A. Helweg, L. A. Herman, S. Yamamoto, and P. H. Forestell, "Comparison of songs of humpback whales (Megaptera novaeangliae) recorded in Japan, Hawaii, and Mexico during the winter of 1989," Cetacean Research Institute, Tech. Rep. 1, 1990.
[28] S. Cerchio, J. K. Jacobsen, and T. F. Norris, "Temporal and geographical variation in songs of humpback whales, Megaptera novaeangliae: Synchronous change in Hawaiian and Mexican breeding assemblages," Animal Behaviour, vol. 62, pp. 313-329, 2001.
[29] F. Pace, P. White, and O. Adam, "Characterisation of sound subunits for humpback whale song analysis," in 4th International Workshop on Detection and Localization of Marine Mammals using Passive Acoustics, 2009, p. 56.
[30] S. Davis and P. Mermelstein, "Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences," IEEE Trans. ASSP, vol. 28, pp. 357-366, 1980.
[31] L. Rabiner and B. H. Juang, Fundamentals of Speech Recognition. Englewood Cliffs, NJ: Prentice Hall, 1993.
[32] D. D. Greenwood, "Auditory masking and the critical band," Journal of the Acoustical Society of America, vol. 33, pp. 484-502, 1961.
[33] ——, "Critical bandwidth and the frequency coordinates of the basilar membrane," Journal of the Acoustical Society of America, vol. 33, pp. 1344-1356, 1961.
[34] G. Gravier, "SPro: a free speech signal processing toolkit," vers. 5.0, 2010, https://gforge.inria.fr/projects/spro (date last viewed 06/07/12).
[35] R. Tibshirani, "Regression shrinkage and selection via the lasso," Journal of the Royal Statistical Society, Series B, vol. 58, pp. 267-288, 1994.
[36] S. S. Chen, D. L. Donoho, and M. A. Saunders, "Atomic decomposition by basis pursuit," SIAM Journal on Scientific Computing, vol. 20, pp. 33-61, 1998.
[37] P. Flandrin, R. G. Baraniuk, and O. Michel, "Time-frequency complexity and information," in IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 3, 1994, pp. 329-332.
[38] C. Shannon, "A mathematical theory of communication," Bell System Technical Journal, vol. 27, pp. 379-423, 623-656, 1948.
[39] S. Kullback and R. Leibler, "On information and sufficiency," Annals of Mathematical Statistics, vol. 22, no. 1, pp. 79-86, 1951.
[40] K. S. Beyer, J. Goldstein, R. Ramakrishnan, and U. Shaft, "When is 'nearest neighbor' meaningful?" in Proc. of the 7th International Conference on Database Theory (ICDT). Springer-Verlag London, 1999, pp. 217-235.
[41] Y. Doh, J. Razik, S. Paris, O. Adam, and H. Glotin, "Décomposition parcimonieuse des chants de cétacés pour leur suivi," Traitement du Signal.
