Semantic Context Model for Efficient Speech Recognition


ISAE-SUPAERO Conference paper

The 1st International Conference on Cognitive Aircraft Systems – ICCAS

March 18-19, 2020

https://events.isae-supaero.fr/event/2

Scientific Committee

• Mickaël Causse, ISAE-SUPAERO
• Caroline Chanel, ISAE-SUPAERO
• Jean-Charles Chaudemar, ISAE-SUPAERO
• Stéphane Durand, Dassault Aviation
• Bruno Patin, Dassault Aviation
• Nicolas Devaux, Dassault Aviation
• Jean-Louis Gueneau, Dassault Aviation
• Claudine Mélan, Université Toulouse Jean-Jaurès
• Jean-Paul Imbert, ENAC

Permanent link:

https://doi.org/10.34849/cfsb-t270

Rights / License:


Semantic Context Model for Efficient Speech Recognition

Content

Introduction

An automatic speech recognition (ASR) system contains three main components: an acoustic model, a lexicon, and a language model. ASR in noisy environments remains a challenging goal because the acoustic information is unreliable, which lowers recognition accuracy. Improving the language model yields only limited gains, since it mainly captures local syntactic information. In this paper, we propose a new semantic model that takes long-term semantic context into account and thereby resolves acoustic ambiguities in noisy ASR.

Recent developments in natural language processing have led to renewed interest in distributional semantics. Word embeddings (WE), such as those of T. Mikolov [Mikolov2013] or the BERT model [Devlin2018], capture the semantic contexts of words and have proven effective for several natural language processing tasks. The efficiency and semantic properties of these representations motivate us to explore them for our task. Our ASR system is therefore supplemented by a semantic context analysis module that detects poorly recognized words and proposes replacement words with similar pronunciation that better fit the context. This semantic analysis re-evaluates (rescores) the N-best transcription hypotheses and can be seen as a form of dynamic adaptation to the specific context of noisy data.
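To make the idea of semantic rescoring concrete, the Python sketch below detects words that fit their sentence context poorly and re-ranks an N-best list by adding a semantic term to the acoustic and linguistic log scores. It is a minimal illustration under stated assumptions, not the authors' method: the toy 3-dimensional embeddings in `EMB`, the use of cosine similarity between a word and the mean embedding of the other words as the semantic score, the detection threshold, and the weights `alpha`, `beta`, `gamma` are all hypothetical choices, as are the helper names `semantic_score`, `low_fit_words`, and `rescore`.

```python
import math

# Hypothetical toy word embeddings (3-D). A real system would load
# pretrained vectors such as word2vec or BERT-derived embeddings.
EMB = {
    "pilot": [0.9, 0.1, 0.0],
    "landed": [0.8, 0.2, 0.1],
    "planted": [0.0, 0.9, 0.2],
    "plane": [0.85, 0.15, 0.05],
}


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def semantic_score(words, i):
    """Assumed semantic fit of words[i]: cosine similarity between its
    embedding and the mean embedding of the other words in the sentence."""
    ctx = [EMB[w] for j, w in enumerate(words) if j != i and w in EMB]
    if words[i] not in EMB or not ctx:
        return 0.0
    mean = [sum(col) / len(ctx) for col in zip(*ctx)]
    return cosine(EMB[words[i]], mean)


def low_fit_words(words, threshold=0.5):
    """Flag words whose semantic fit falls below a hypothetical threshold."""
    return [w for i, w in enumerate(words) if semantic_score(words, i) < threshold]


def rescore(nbest, alpha=1.0, beta=1.0, gamma=1.0):
    """Return the hypothesis with the highest weighted score.

    Each hypothesis is a list of (word, log_p_acc, log_p_ml) triples.
    The added semantic term and the weights alpha, beta, gamma are
    illustrative assumptions, not the paper's formula.
    """
    def total(hyp):
        words = [w for w, _, _ in hyp]
        return sum(
            alpha * la + beta * lm + gamma * semantic_score(words, i)
            for i, (_, la, lm) in enumerate(hyp)
        )
    return max(nbest, key=total)


# Two N-best hypotheses: the acoustic score slightly prefers "planted".
nbest = [
    [("pilot", -0.5, -1.0), ("planted", -1.0, -2.0), ("plane", -0.6, -1.2)],
    [("pilot", -0.5, -1.0), ("landed", -1.5, -2.0), ("plane", -0.6, -1.2)],
]
print(low_fit_words([w for w, _, _ in nbest[0]]))  # ['planted']
print(" ".join(w for w, _, _ in rescore(nbest, gamma=0.0)))  # pilot planted plane
print(" ".join(w for w, _, _ in rescore(nbest)))  # pilot landed plane
```

With `gamma=0.0` the sketch reduces to ranking by acoustic and linguistic scores alone, so the acoustically favored "planted" wins; once the semantic term is added, the hypothesis whose words agree with one another ("pilot landed plane") is ranked first. Summing a cosine similarity with log probabilities is an ad hoc choice made here only for readability.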

Proposed methodology

An effective way to take semantic information into account is to re-evaluate (rescore) the best hypotheses produced by the ASR system (the N-best list). For each word w of a hypothesis sentence, the recognition system provides an acoustic score p_acc(w) and a linguistic score p_ml(w). The best sentence is the one that maximizes the probability of the word sequence: