A NON LINEAR ANALYSIS FOR CLEAN AND NOISY SPEECH

Jean Rouat, Yong Chun Liu and Sylvain Lemieux

Dépt des sciences appliquées, Université du Québec à Chicoutimi,

CHICOUTIMI, Québec, Canada, G7H 2B1

1. Introduction

Research in speech analysis is recognized as an important aspect of speech processing, with applications in speech coding, speech recognition, etc. Depending on the application, the speech analyzer has to extract the most appropriate parameters. This paper focuses on the problem of speech analysis with possible applications in speech recognition. It is known that speaker-independent recognition of continuous speech is a very complicated task which has not yet been fully mastered. The better the quality of the analysis, the easier it becomes to recognize what has been spoken. The automatic "demodulation" of speech with nonlinear operators, based on perceptive knowledge, is a problem which has not yet been fully addressed, and speech "demodulation" might assist the researcher in the understanding of speech and/or in the design of a simple and efficient speech analysis.

2. Modulation

Research work on automatic demodulation of speech can be justified by the hypothesis that the human brain has neural cells specialized in Amplitude Modulation (AM) and Frequency Modulation (FM) detection [1] [7] [8].

It is known from previous studies (see [5], for example) that fluctuations in the envelope, rather than in the fine structure, of the speech signal carry the information regarding speech intelligibility. The fluctuations of the envelope of a speech signal depend on the relative frequency, the relative energy and the relative phase of the formants (for voiced speech). This again suggests an analysis based on a model of speech "modulation".

3. Nonlinear operators

Recently, Kaiser [2] proposed a nonlinear operator (called the Teager energy operator) able to extract the energy of a signal, based on mechanical and physical considerations. It has been shown [3] that this operator is able to track the envelope of an amplitude- or frequency-modulated signal.
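As an illustration of this envelope-tracking property, the following minimal sketch applies the discrete form of the Teager energy operator given in [2], Ψ[x(n)] = x(n)² − x(n−1)·x(n+1), to a pure tone and to an amplitude-modulated tone. The sampling rate, the test signals and all function and variable names are assumptions made for this example; the 2300 Hz carrier and the 130 Hz modulation rate simply echo the centre frequency and pitch quoted for Fig. 1.

```python
import numpy as np

def teager_energy(x):
    """Discrete Teager energy: psi[n] = x[n]**2 - x[n-1] * x[n+1].

    The two end samples have no neighbour on one side and are left at zero.
    """
    x = np.asarray(x, dtype=float)
    psi = np.zeros_like(x)
    psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
    return psi

fs = 16000.0                       # sampling rate in Hz (assumed)
n = np.arange(2048)
omega = 2.0 * np.pi * 2300.0 / fs  # carrier frequency in rad/sample

# Pure tone A*cos(omega*n + phi): the operator output is the constant
# A**2 * sin(omega)**2, whatever the phase.
amp = 0.5
tone = amp * np.cos(omega * n + 0.3)
psi_tone = teager_energy(tone)

# AM tone with a slowly varying envelope a(n): the output is close to
# a(n)**2 * sin(omega)**2, so its square root follows the envelope.
envelope = 1.0 + 0.5 * np.cos(2.0 * np.pi * 130.0 * n / fs)
am = envelope * np.cos(omega * n)
psi_am = teager_energy(am)
envelope_estimate = np.sqrt(np.maximum(psi_am, 0.0)) / np.sin(omega)

# Largest relative deviation from the true envelope (end samples excluded).
rel_error = np.max(np.abs(envelope_estimate[1:-1] - envelope[1:-1])
                   / envelope[1:-1])
```

Written this way, each output sample costs two multiplications and one subtraction. The envelope estimate is only approximate: it relies on the envelope varying slowly compared with the carrier, which holds here because the modulation rate is far below the carrier frequency.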

Another nonlinear operator has been proposed [6] (called "Dyn"), based on perceptive considerations. This operator shows a surprising ability to enhance the FM-AM modulation in speech, and can be compared to the Teager energy operator. One could consider that the Teager operator is the envelope of the Dyn operator for vowels. Figure 1 illustrates this idea. Depending on the application, one can use the Dyn or the Teager operator. The top section of Fig. 1 shows four pitch periods taken from the original speech (ftf) of a male speaker (pitch frequency = 130 Hz). The second section presents the band-pass-filtered speech [4] with a center frequency of 2300 Hz. The third section shows the output of the Dyn operator on the band-pass-filtered speech. The fourth section illustrates the output of the Teager energy operator on the same band-pass-filtered speech. The Dyn and the Teager operators show the modulated energy pulses characteristic of the speech signal.

Is there a link between these pulses and the formants? Are these pulses characteristic of a nonlinear coupling between the glottal source and the vocal tract? Is it possible to use these nonlinear operators to extract information suitable for speech recognition? Most of these questions do not have a clear answer yet. This paper proposes an analysis which can be used to answer the above questions.

4. The analysis

The current version of the analyzer comprises a bank of twenty filters from 300 Hz to 3300 Hz. These filters simulate the frequency analysis performed by the cochlea. They are rounded exponential filters with the Equivalent Rectangular Bandwidths (ERB) proposed by Moore and Glasberg [4]. The output of each filter is then processed by either the Dyn operator or the Teager energy operator, depending on the application; see [6] for more details.
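The layout of such a filter bank can be sketched as follows. The ERB polynomial and the ERB-rate scale are the formulae published in [4]; the uniform spacing of the twenty centre frequencies on the ERB-rate scale and the single-parameter roex(p) shape W(g) = (1 + pg)·exp(−pg), with p = 4·fc/ERB, are assumptions made for this example, since the text does not state how the filters are spaced or parameterized. All function names are hypothetical.

```python
import numpy as np

def erb_hz(f_hz):
    """ERB in Hz from the formula of [4]; the polynomial takes f in kHz."""
    f = np.asarray(f_hz, dtype=float) / 1000.0
    return 6.23 * f ** 2 + 93.39 * f + 28.52

def erb_rate(f_hz):
    """Number of ERBs below f_hz (ERB-rate scale of [4])."""
    f = np.asarray(f_hz, dtype=float) / 1000.0
    return 11.17 * np.log((f + 0.312) / (f + 14.675)) + 43.0

def erb_rate_to_hz(e):
    """Inverse of erb_rate, obtained by solving the formula for f."""
    r = np.exp((np.asarray(e, dtype=float) - 43.0) / 11.17)
    return 1000.0 * (14.675 * r - 0.312) / (1.0 - r)

def roex_weights(f_hz, fc_hz):
    """Power weighting of a roex(p) filter centred on fc_hz (assumed shape)."""
    p = 4.0 * fc_hz / erb_hz(fc_hz)  # sets the filter's ERB to erb_hz(fc_hz)
    g = np.abs(np.asarray(f_hz, dtype=float) - fc_hz) / fc_hz
    return (1.0 + p * g) * np.exp(-p * g)

# Twenty centre frequencies from 300 Hz to 3300 Hz, uniformly spaced on
# the ERB-rate scale (assumed spacing).
centers = erb_rate_to_hz(np.linspace(erb_rate(300.0), erb_rate(3300.0), 20))
bandwidths = erb_hz(centers)

# Frequency-domain weighting of every channel on a 1 Hz grid up to 8 kHz.
freqs = np.arange(0.0, 8000.0, 1.0)
bank = np.stack([roex_weights(freqs, fc) for fc in centers])
```

Each row of `bank` can be applied to the power spectrum of a frame to obtain one channel. A time-domain implementation, which the Dyn and Teager operators require as input, would need an additional filter-design step that is not covered here.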

This model is very simple, and no attempt has been made to represent the exact transduction mechanism from the inner hair cells (for afferent information) or from the outer hair cells (for active feedback). The Dyn operator needs one multiplication and one addition per sample, and the Teager operator two multiplications and one addition per sample.

5. Comments

Figure 2 shows the output of the model for three pitch periods from the transition /d/ /i/ in the syllable "di" from a male speaker.

It can be seen that the formant information is preserved and that high-frequency components are enhanced in comparison with low-frequency components. The pulses of energy are better seen at high frequencies. From Figure 2, it can also be seen that the low-frequency components are not in phase with the high-frequency components.

6. Conclusion

A new analysis has been proposed, based on nonlinear transformations of the signal and on perceptive knowledge. It has been shown that this analysis preserves the formant structure [6] on the frequency dimension and has a very good time resolution. This analysis gives a three-dimensional representation of speech: "mechanical energy", Bark scale and time scale. In comparison with a spectrogram representation, this analysis shows greater redundancy in frequency (Bark scale) and in time (modulation) and consequently should be more resistant to noise.

7. References

[1] Gardner, R.B. and Wilson, J.P. (1979). Evidence for direction-specific channels in the processing of frequency modulation. J. Acoust. Soc. Amer. 66, p. 704.

[2] Kaiser, J.F. (1990). On a simple algorithm to calculate the 'energy' of a signal. Proceedings of IEEE-ICASSP'90, Albuquerque, pp. 381-384.

[3] Maragos, P., Quatieri, T. and Kaiser, J.F. (1991). Speech nonlinearities, modulations and energy operators. Proceedings of the IEEE-ICASSP'91, Toronto, pp. 421-424.

[4] Moore, B.C.J. and Glasberg, B.R. (1983). Suggested formulae for calculating auditory-filter bandwidths and excitation patterns. J. Acoust. Soc. Amer. 74, pp. 750-753.

[5] Plomp, R. (1988). Effect of amplitude compression in hearing aids in the light of the modulation-transfer function. J. Acoust. Soc. Amer. 83 (6), pp. 2322-2327.

[6] Rouat, J. (1991). Dyn: a nonlinear operator for speech analysis. Université du Québec à Chicoutimi, dépt des sciences appliquées, rapport interne.

[7] Tansley, B.W. and Suffield, J.B. (1983). Time course of adaptation and recovery of channels selectively sensitive to frequency and amplitude modulation. J. Acoust. Soc. Amer. 74, p. 765.

[8] Wakefield, G.H. and Viemeister, N.F. (1984). Selective adaptation to linear frequency-modulated sweeps: Evidence for direction-specific FM channels? J. Acoust. Soc. Amer. 75, p. 1588.

Fig. 1: non linear operators on band-pass-filtered speech, C.F. 2300 Hz [...] from a male [...]

Figure 2: three pitch periods of the "mechanical energy" [...] scale [...]
