• Aucun résultat trouvé

Speech enhancement and psychoacoustics

N/A
N/A
Protected

Academic year: 2021

Partager "Speech enhancement and psychoacoustics"

Copied!
3
0
0

Texte intégral

(1)

HAL Id: hal-02136736

https://hal.archives-ouvertes.fr/hal-02136736

Submitted on 22 May 2019

HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci-entific research documents, whether they are pub-lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

Speech enhancement and psychoacoustics

Asmaa Amehraye, Dominique Pastor

To cite this version:

Asmaa Amehraye, Dominique Pastor. Speech enhancement and psychoacoustics. European Mathe-matical Psychology group : 37ème meeting, Sep 2006, Brest, France. �hal-02136736�

(2)

Speech enhancement and psychoacoustics

Asmaa Amehraye

, Dominique Pastor

Abstract

The purpose of this presentation is to describe why and how psychoacoustic models are used to design speech enhancement systems capable of improving speech intelligibility in presence of noise (for mobile communication applica-tions, for instance) or capable of denoising sufficiently well noisy speech sig-nals so as to improve the recognition rate of some automatic speech recognizer (for instance, robot monitoring in noisy environment, hand-free applications on board of vehicles, military fastjets and helicopters).

To begin with, standard methods aimed at denoising speech signals are performed in the spectral domain without taking into account the perceptual characteristics of the speech signal to enhance. They succeed in improving the Signal to Noise Ratio (SNR) but return annoying and unpleasant residual noise known as musical noise.

In the last few decades, psychoacoustic models have then attracted a great deal of interest. The objective is to improve the perceptual quality of the enhanced speech signal. The psychoacoustic model is used to control the enhancement process in order to find the best trade-off between noise reduc-tion, residual noise and speech distortion. The masking phenomenon is the main human auditory system property which is used to design perceptually motivated speech enhancement systems.

The masking phenomenon occurs when the maskee is made inaudible by a stronger signal that occurs simultaneously. We model this phenomenon by a masking threshold calculation that depends on the signal to denoise ([3]). Any signal below this threshold is inaudible. The main idea is then

A. Amehraye is with the GSCM-Facult´e des Sciences de Rabat, Morocco

and also with Ecole Nationale Sup´erieure des T´el´ecommunications de Bretagne

asmaa.amehraye@enst-bretagne.fr

D. Pastor is with Ecole Nationale Sup´erieure des el´ecommunications

de Bretagne, Technopˆole de Brest-Iroise, CS 83818, 28238 Brest, France

dominique.pastor@enst-bretagne.fr

(3)

to mask partially or totally the distortion. We will present and comment several filters based on this principle ([2], [1]).

Psychoacoustic models are also very popular to assess the quality of speech after coding or enhancement. By using objective tests based on these models, we avoid subjective tests that are very costly in time and resources. The most used objective criteria based on psychoacoustic models are the MBSD (Modified Bark Spectral Distortion) and the PESQ (Perceptual Eval-uation of Speech Quality) [4], [5], [6]. We will present how these objective tests are used and commented in relation with standard and perceptually motivated denoising techniques.

References

[1] A. Amehraye, D. Pastor, S. Ben Jebara, On the Application of Re-cent Results in Statistical Decision and Estimation Theory to Percep-tual Filtering of Noisy Speech Signals, Proceedings of the second IEEE-EURASIP International Symposium on Control, Communications, and Signal Processing, ISCCSP’06, Marrakech, Morocco, 2006.

[2] Y. Hu and P. Loizou, “Incorporating a psychoacoustic model in fre-quency domain speech enhancement”, IEEE Signal Processing Letters, 11(2), 2004, pp. 270-273.

[3] Johnston, J. D, “Transform coding of audio signals using perceptual noise criteria”, IEEE Jour. Selected Areas Commun., vol. 6, no. 2, 1988, pp. 314-323.

[4] S. Quackenbush, T. Barnwell, and M. Clements, “Objective Measures of Speech Quality”. Englewood Cliffs, NJ: Prentice-Hall, 1988.

[5] S.Wang, A. Sekey and A.Gersho, “An objective measure for predicting subjective quality of speech coders”, IEEE Journal on Selected Areas in Communications , vol. SAC-10, 1992, pp. 819-829.

[6] W. Yang, M. Dixon and R Yantorno, “Modified bark spectral distor-tion measure which uses noise masking threshold”, IEEE Speech coding Workshop, Pocono Manor, 1997, pp. 55-56.

Références

Documents relatifs

The video only presentations show groupings of vocalic context and place of articulation, but the synthetic video results in higher similarity between velars and alveolars and a

The resulting masking threshold (dashed black curve) is below a first noise component at frequency f 1 and above another noise component, the maskee, located at frequency f 2. We do

Narrow-band spectra were used to closely study the frequency content of traffic-induced ground vibrations, whereas 113-octave spectra and broad-band amplitudes were used

Synthetic speech generated from each emotional voice was evaluated using from three perspectives: measur- ing the speech quality; measuring the ability of listeners to

The best recognition performance is obtained with the combination of the speech enhancement method of the two-stage, mel- warped, Wiener filter noise reduction

Table II presents the results of the number of IVF cycles, the number of oocytes retrieved, the number of pronucleids obtained, the number of embryos transferred, the pregnancy

Taking into account that acoustic speech envelope follows artic- ulatory facial movements with a delay of up to 200 ms (e.g., [ 48 ]) and that neural effects of electric

The principle of Turbo iteration will be introduced to speech enhancement technology in this paper: use two different speech enhancement systems which based on different models