Récemment recherché

Aucun résultat trouvé

Étiquettes

Aucun résultat trouvé

Document

Aucun résultat trouvé

Accueil Écoles Thèmes

Connexion

ASR performance prediction on unseen broadcast programs using convolutional neural networks

Partager "ASR performance prediction on unseen broadcast programs using convolutional neural networks"

N/A

N/A

Protected

Année scolaire: 2021

Info

Protected

Academic year: 2021

Partager "ASR performance prediction on unseen broadcast programs using convolutional neural networks"

Copied!

2

0

0

2

0

0

Chargement.... (Voir le texte intégral maintenant)

Télécharger maintenant ( 2 Page )

Texte intégral

(1)

HAL Id: hal-02088829

https://hal.archives-ouvertes.fr/hal-02088829

Submitted on 3 Apr 2019

HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

ASR performance prediction on unseen broadcast programs using convolutional neural networks

Zied Elloumi, Laurent Besacier, Olivier Galibert, Juliette Kahn, Benjamin Lecouteux

To cite this version:

Zied Elloumi, Laurent Besacier, Olivier Galibert, Juliette Kahn, Benjamin Lecouteux. ASR performance prediction on unseen broadcast programs using convolutional neural networks. ICASSP, 2018, Alberta, Canada. �hal-02088829�

(2)

ASR Performance Prediction on Unseen Broadcast Programs using Convolutional Neural Networks

Zied Elloumi, Laurent Besacier, Olivier Galibert, Juliette Kahn, Benjamin Lecouteux

Contact :

[email protected]

 We used an existing tool named TranscRater for baseline regression approach

 We adapted the TranscRater tool from English to French that requires engineered features to predict the WER performance

 Analysis of predicted WERs

 We presented an evaluation framework for evaluating ASR performance prediction on unseen broadcast programs

 CNNs were very efcient encoding both textual (ASR transcript) and signal to predict WER.

 Future work : Analyze the learnt representations of our performance prediction system

 We built our own French ASR system based on the KALDI toolkit

 We used a French data from diferent broadcast collections : ESTER, ETAPE, REPERE, Quaero

 We proposed a new End-to-End performance prediction system based on CNNs.

 Several approaches to encode speech signal

 Input : textual, speech signal or the both textual+ speech signal

 Output : two proposed methods to predict a continous value ( CNN _Softmax and CNN _ReLU )

 Distribution of speech turns according to their WER

(a) real (b) Regression

_Pred

(c) CNN

_Pred

Regression Baseline Regression Baseline

Conclusion Conclusion

Test prediction WER=31.20

59h27 04h42

04h17 100h51

Train prediction WER=22.29

Train Acoustic

Evaluation framework

Evaluation framework Our proposed approach Our proposed approach Abstract

Abstract

2018 IEEE International Conference on Acoustics, Speech and Signal Processing 15–20 April 2018 • Calgary, Alberta, Canada

Automatic Transcription

Feature Extraction SIG LEX LM POS

PP System

 Propose an heterogenous French corpus dedicated to performance prediction task

 Compare two prediction approaches : regression (engineered features) based vs a new strategy based on convolutional neural networks (learnt features).

 The joint use of textual and signal features did not work for the regression baseline while the combination of inputs for CNNs leads to the best WER prediction performance.

 CNN prediction predicts the WER distribution on a collection of speech recordings

Results & Analysis

Results & Analysis

Références

Télécharger maintenant ( PDF - 2 Page - 871.47 KB )

Documents relatifs

Convolutional Neural Networks based Denoising for Indoor Localization

This proposed approach is tested on a simulated environment, using a real propagation model based on measurements, and compared with the classi- cal matrix completion approach, based

Traffic Parameters Prediction Using a Three-Channel Convolutional Neural Network

Recently, deep learning models such as deep belief network (DBN) [6], recurrent neural network (RNN) [7], and long short-term memory (LSTM) [8] have been used for traffic

Deep Convolutional Neural Networks

 pushing forward to weakly supervised learning the problem Medical Image Segmentation of Thoracic Organs at Risk in CT-images..

Detection of Defective Speech Using Convolutional Neural Networks

Modern signal processing techniques previously used in the field of speech recognition have also made significant progress in the field of automatic detection of abnormal voice..

Neural Criticality: Validation of Convolutional Neural Networks

Figure 6: Accuracy stability on normal dataset (for class ”mountain bike”), showing a gradual increase of accuracy with respect to decreasing neurons’ criticality in case of

Bird Sound Classification Using Convolutional Neural Networks

In general, the spectrogram of bird sound audio is regarded as the input and the model would treat the bird-call identification task as an image classification problem.. This

Affective Content Classification using Convolutional Neural Networks

We present a three-layer convolutional neural network for the classification of two binary target variables ’Social’ and ’Agency’ in the HappyDB corpus exploiting lexical density of

Convolutional Long Short-Term Memory Neural Networks for Hierarchical Species Prediction

SSN 1 The CLNN model used the given 33 layers of spatial feature maps to predict the taxonomic ranks of species.. Every layer of each tiff image was first center-cropped to a size

Documents relatifs

جمالية القصيدة الإسلامية المعاصرة - قراءة في الشعر الجزائري المعاصر-

جمالية القصيدة الإسلامية المعاصرة - قراءة في الشعر الجزائري المعاصر-

314

0

0

A RELAXATION MODEL FOR LIQUID-VAPOR PHASE CHANGE WITH METASTABILITY

A RELAXATION MODEL FOR LIQUID-VAPOR PHASE CHANGE WITH METASTABILITY

37

0

0

Unsteady three-dimensional theoretical model and numerical simulation of a 120-kW chemical looping combustion pilot plant

Unsteady three-dimensional theoretical model and numerical simulation of a 120-kW chemical looping combustion pilot plant

20

0

0

Convolutional Neural Networks for embedded vision

Convolutional Neural Networks for embedded vision

164

0

0

High Performance Convolutional Neural Networks for Document Processing

High Performance Convolutional Neural Networks for Document Processing

7

0

0

In-Vitro Archaeacidal Activity of Biocides against Human-Associated Archaea

In-Vitro Archaeacidal Activity of Biocides against Human-Associated Archaea

6

0

0

Forensics of water quality failure in distribution systems - a conceptual framework

Forensics of water quality failure in distribution systems - a conceptual framework

25

0

0

أثر تقلبات أسعار النفط على النمو الإقتصادي في الجزائر دراسة قياسية في الفترة 1995-2016

أثر تقلبات أسعار النفط على النمو الإقتصادي في الجزائر دراسة قياسية في الفترة 1995-2016

9

0

0