Using Resources from a Closely-related Language to Develop ASR for a Very Under-resourced Language: A Case Study for Iban
Related documents
However, much remains to be done on under-resourced ones: there are few dictionaries, parsers, etc. Nevertheless, when published dictionaries are available, it is
The methodology consists in introducing semantic information by using a class-based statistical language model for which classes directly correspond to IF entries. With
This is why recent studies on cross-lingual acoustic modelling based on subspace Gaussian mixture model (SGMM) seem very promising for speech recognition in limited training
These results are in line with the system combination outputs shown in Table 8; a combination of Hybrid G2P with the Malay G2P system gave a better result compared to Hybrid G2P
Keywords: language documentation, field linguistics, spoken term discovery, word segmentation, zero resource technologies, unwritten
Features were added to the app in order to facilitate the collection of parallel speech data in line with the requirements of the French-German ANR/DFG BULB (Breaking the
Indeed, we obtain a significant decrease of the word error rate with experiments done on French broadcast news from the ESTER corpus; we also notice an improvement of the sentence
§ Using Malay (closely-related) data in the lexicon design for Iban is better than using English (not a close language).
§ Cross-lingual effect on acoustic model is more evident on