1.3 Organization of the thesis

The first part of this thesis introduces the theoretical foundations of the Microcanonical Multiscale Formalism (MMF), presents the computational aspects of the formalism, and reports preliminary feasibility studies on its application to the speech signal. The second part presents several case studies in which the MMF is successfully applied to practical speech analysis problems.

The first part starts in chapter 2 with a short discussion of the scientific context of this thesis, showing where our approach stands within the broad field of non-linear speech processing. Chapter 3 presents the basics of the MMF, its components, and its advantages over its older counterparts. Chapter 4 presents preliminary experiments that support the applicability of the formalism to speech signal analysis. The experiments are performed on a large database of isolated phonemes, so that the applicability of the formalism can be studied separately for each phoneme family.

The second part (the applications) starts with chapter 5, where the first observations about the usefulness of the formalism for speech analysis are presented. Building on these observations, a simple algorithm is introduced for text-independent phonetic segmentation of the speech signal. The method is shown to perform competitively with state-of-the-art methods, with notably high resolution in the detection of phoneme boundaries.

tel-00821896, version 1 - 13 May 2013

In chapter 6, we first introduce numerical modifications that adapt the formalism to the specific case of the speech signal. We then show how the formalism can be used to identify a subset of points in the signal, the Most Singular Manifold (MSM), from which the whole signal can be reconstructed with good perceptual quality. This subset, which the formalism originally introduces as the set of the most informative points of the signal, is then used to develop a novel and efficient waveform coding technique.

In chapter 7, we argue that this subset corresponds to the physical production mechanism of the speech signal, and we discuss how its points relate to the instants of the most significant excitations of the vocal tract system, the Glottal Closure Instants (GCIs). As the identification of these instants is an important topic in many speech processing applications, we develop a novel solution to the GCI detection problem. The method is shown to be comparable in reliability to recent state-of-the-art methods while being more robust against noise. Moreover, the algorithm is more efficient than the fastest available methods.

In chapter 8, these points (the GCIs) are used to introduce a novel solution to the problem of sparse Linear Prediction (LP) analysis, which aims to estimate LP coefficients such that the resulting residuals are sparse. The approach is based on weighting the l₂-norm objective function using the GCI estimates, which provides an efficient closed-form solution to this problem.
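To illustrate why weighting the l₂-norm objective keeps a closed-form solution, the following minimal Python sketch solves the weighted normal equations directly. It is not the algorithm of chapter 8: the weighting function `gci_weights`, its `width` and `floor` parameters, the covariance-style formulation, and all function names are assumptions made for this example only. The sketch assumes that samples near a GCI estimate receive a small weight, so that large residuals are tolerated there and penalized elsewhere.

```python
import math


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def gci_weights(length, gcis, width=2.0, floor=0.01):
    """Hypothetical weighting: equal to `floor` at a GCI, close to 1 far from any GCI."""
    weights = []
    for n in range(length):
        d = min(abs(n - g) for g in gcis)  # distance to the nearest GCI estimate
        weights.append(1.0 - (1.0 - floor) * math.exp(-((d / width) ** 2)))
    return weights


def weighted_lp(signal, order, weights):
    """Minimise sum_n w[n] * (x[n] - sum_k a[k] * x[n-1-k])**2 in closed form."""
    r = [[0.0] * order for _ in range(order)]  # weighted covariance matrix
    p = [0.0] * order                          # weighted cross-correlation vector
    for n in range(order, len(signal)):
        past = [signal[n - 1 - k] for k in range(order)]
        for i in range(order):
            p[i] += weights[n] * past[i] * signal[n]
            for j in range(order):
                r[i][j] += weights[n] * past[i] * past[j]
    return solve_linear_system(r, p)


def residual(signal, coeffs):
    """Prediction error for samples n >= order."""
    order = len(coeffs)
    return [signal[n] - sum(coeffs[k] * signal[n - 1 - k] for k in range(order))
            for n in range(order, len(signal))]


def synthesize(coeffs, gcis, length):
    """Toy signal: all-pole filter driven by unit impulses at the given instants."""
    x = [0.0] * length
    for n in range(length):
        past = sum(c * x[n - 1 - k] for k, c in enumerate(coeffs) if n - 1 - k >= 0)
        x[n] = past + (1.0 if n in gcis else 0.0)
    return x


if __name__ == "__main__":
    gcis = [5, 25, 45]                        # assumed GCI positions (toy values)
    x = synthesize([1.3, -0.6], gcis, 60)
    w = gci_weights(len(x), gcis, floor=0.0)  # zero weight exactly at the GCIs
    a = weighted_lp(x, 2, w)
    print(a, residual(x, a))
```

On this toy signal, with `floor=0.0`, the only non-zero prediction errors of the true predictor fall on samples of zero weight, so the weighted fit recovers the synthesis coefficients and the residual reduces to the impulses at the assumed GCIs. Real speech and estimated GCIs would not behave this cleanly.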

The same philosophy of relating the MSM to the significant excitations of the vocal tract system is followed in chapter 9 for multi-pulse approximation of the speech excitation source. This approximation provides an efficient engine for use inside the classical multi-pulse excitation coder. The resulting coder provides almost the same level of perceptual quality as the classical one, while being much more efficient. Finally, we draw our conclusions in chapter 10.


1.4 Publications

• Journal

– V. Khanagha, K. Daoudi, “An Efficient Solution to Sparse Linear Prediction Analysis of Speech”, EURASIP Journal on Audio, Speech, and Music Processing, Special Issue on Sparse Modeling for Speech and Audio Processing (June 2012).

– V. Khanagha, K. Daoudi, O. Pont and H. Yahia, “Non-linear speech representation based on local predictability exponents”, Elsevier’s Neurocomputing Journal, special issue on Non-Linear Speech Signal Processing (March 2012).

– V. Khanagha, K. Daoudi, H. Yahia, O. Pont, “A novel approach to phonetic segmentation through local singularity analysis of speech”, submitted to Elsevier’s DSP journals.

– V. Khanagha, K. Daoudi, H. Yahia, “Robust detection of glottal closure instants through local singularity analysis of speech”, in preparation for IEEE Transactions on Audio, Speech, and Language Processing.

• Peer-reviewed conferences/proceedings

– V. Khanagha, K. Daoudi, “Efficient multi-pulse approximation of speech excitation using the most singular manifold”, accepted at INTERSPEECH 2012, Portland, Oregon, United States.

– V. Khanagha, H. Yahia, K. Daoudi, O. Pont, “Reconstruction of Speech Signals from their Unpredictable Points Manifold”, NOn LInear Speech Processing (NOLISP), November 2011, Las Palmas de Gran Canaria, Spain; proceedings published in Lecture Notes in Computer Science: Advances in Nonlinear Speech Processing, Springer.

– V. Khanagha, K. Daoudi, O. Pont, H. Yahia, “Improving text-independent phonetic segmentation based on the micro-canonical multi-scale formalism”, International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 2011, Prague, Czech Republic.

– V. Khanagha, K. Daoudi, O. Pont, H. Yahia, “A novel text-independent phonetic segmentation algorithm based on the micro-canonical multi-scale formalism”, INTERSPEECH, September 2010, Makuhari, Japan.

– V. Khanagha, K. Daoudi, O. Pont, H. Yahia, “Application of the micro-canonical multi-scale formalism to segmentation of speech signals”, European Signal Processing Conference (EUSIPCO), August 2010, Aalborg, Denmark.


• In French

– V. Khanagha, K. Daoudi, O. Pont and H. Yahia, “Une nouvelle approche non-linéaire pour la segmentation phonétique”, XXIIIe Colloque GRETSI 2011, September 2011, Bordeaux, France.

– V. Khanagha, K. Daoudi, O. Pont and H. Yahia, “Segmentation indépendante du texte par le formalisme multiéchelles microcanonique”, MAnifestation des JEunes Chercheurs en Sciences et Technologies de l’Information et de la Communication (MAJESTIC), October 2010, Bordeaux, France.

– V. Khanagha, K. Daoudi, O. Pont and H. Yahia, “Application du formalisme multiéchelles microcanonique pour la segmentation des signaux de parole”, CORESA 2010, December 2010, Lyon, France.


Part I

THE MICROCANONICAL MULTISCALE FORMALISM

