Inchul Song

A Highly Adaptive Acoustic Model for Accurate Multi-dialect Speech Recognition

Sanghyun Yoo

Despite the success of deep learning in speech recognition, multi-dialect speech recognition remains a difficult problem. Although dialect-s… (voir plus)pecific acoustic models are known to perform well in general, they are not easy to maintain when dialect-specific data is scarce and the number of dialects for each language is large. Therefore, a single unified acoustic model (AM) that generalizes well for many dialects has been in demand. In this paper, we propose a novel acoustic modeling technique for accurate multi-dialect speech recognition with a single AM. Our proposed AM is dynamically adapted based on both dialect information and its internal representation, which results in a highly adaptive AM for handling multiple dialects simultaneously. We also propose a simple but effective training method to deal with unseen dialects. The experimental results on large scale speech datasets show that the proposed AM outperforms all the previous ones, reducing word error rates (WERs) by 8.11% relative compared to a single all-dialects AM and by 7.31% relative compared to dialect-specific AMs.

2019-05-11

ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (publié)

doi.org

arxiv.org

Dynamic Frame Skipping for Fast Speech Recognition in Recurrent Neural Network Based Acoustic Models

A recurrent neural network is a powerful tool for modeling sequential data such as text and speech. While recurrent neural networks have ach… (voir plus)ieved record-breaking results in speech recognition, one remaining challenge is their slow processing speed. The main cause comes from the nature of recurrent neural networks that read only one frame at each time step. Therefore, reducing the number of reads is an effective approach to reducing processing time. In this paper, we propose a novel recurrent neural network architecture called Skip-RNN, which dynamically skips speech frames that are less important. The Skip-RNN consists of an acoustic model network and skip-policy network that are jointly trained to classify speech frames and determine how many frames to skip. We evaluate our proposed approach on the Wall Street Journal corpus and show that it can accelerate acoustic model computation by up to 2.4 times without any noticeable degradation in transcription accuracy.

2018-04-14

2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (publié)

doi.org

Dynamic Layer Normalization for Adaptive Neural Acoustic Modeling in Speech Recognition

Taesup Kim

Inchul Song

Yoshua Bengio

Layer normalization is a recently introduced technique for normalizing the activities of neurons in deep neural networks to improve the trai… (voir plus)ning speed and stability. In this paper, we introduce a new layer normalization technique called Dynamic Layer Normalization (DLN) for adaptive neural acoustic modeling in speech recognition. By dynamically generating the scaling and shifting parameters in layer normalization, DLN adapts neural acoustic models to the acoustic variability arising from various factors such as speakers, channel noises, and environments. Unlike other adaptive acoustic models, our proposed approach does not require additional adaptation data or speaker information such as i-vectors. Moreover, the model size is fixed as it dynamically generates adaptation parameters. We apply our proposed DLN to deep bidirectional LSTM acoustic models and evaluate them on two benchmark datasets for large vocabulary ASR experiments: WSJ and TED-LIUM release 2. The experimental results show that our DLN improves neural acoustic models in terms of transcription accuracy by dynamically adapting to various speakers and environments.

2017-08-19

Interspeech 2017 (publié)

doi.org

arxiv.org

Publications du Fellowship en politiques de l'IA

La plateforme Mila Ventures

Boussole des politiques en IA

Inchul Song

Publications

Publications du Fellowship en politiques de l'IA

La plateforme Mila Ventures

Boussole des politiques en IA

Mots-clés populaires:

Inchul Song

Publications