Language Identification with Deep Bottleneck Features

Zhanyu Ma; Hong Yu

In this paper we propose an end-to-end short-utterance speech language identification (SLD) approach based on a Long Short-Term Memory (LSTM) neural network, which is especially suitable for SLD applications in intelligent vehicles. The features used for LSTM learning are generated by a transfer learning method: bottleneck features of a deep neural network (DNN) trained for Mandarin acoustic-phonetic classification are used for LSTM training. To improve SLD accuracy on short utterances, a phase-vocoder-based time-scale modification (TSM) method is used to reduce and increase the speech rate of the test utterance. By splicing the normal, rate-reduced, and rate-increased utterances, we extend the length of the test utterance and thereby improve the performance of the SLD system. Experimental results on the AP17-OLR database show that the proposed methods improve SLD performance, especially on short utterances of 1 s and 3 s duration.
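The test-time lengthening step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the FFT size and hop, the stretch rates of 0.8 and 1.2, and the splice order are all assumptions chosen for illustration, since the abstract does not give them. The phase vocoder is a basic textbook variant written with NumPy only.

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    """Short-time Fourier transform with a Hann window: (frames, bins)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(S, n_fft=512, hop=128):
    """Overlap-add inverse of stft, normalised by the summed squared window."""
    window = np.hanning(n_fft)
    frames = np.fft.irfft(S, n=n_fft, axis=1)
    out = np.zeros(n_fft + hop * (S.shape[0] - 1))
    norm = np.zeros_like(out)
    for i in range(S.shape[0]):
        out[i * hop:i * hop + n_fft] += frames[i] * window
        norm[i * hop:i * hop + n_fft] += window ** 2
    # The floor avoids amplifying the signal edges, where the window is near zero.
    return out / np.maximum(norm, 1e-3)

def time_stretch(x, rate, n_fft=512, hop=128):
    """Phase-vocoder TSM: rate < 1 slows speech down, rate > 1 speeds it up."""
    S = stft(x, n_fft, hop)
    n_frames, n_bins = S.shape
    steps = np.arange(0, n_frames - 1, rate)         # fractional frame positions
    omega = 2 * np.pi * hop * np.arange(n_bins) / n_fft  # nominal phase advance
    phase = np.angle(S[0])
    out = np.empty((len(steps), n_bins), dtype=complex)
    for t, step in enumerate(steps):
        i = min(int(step), n_frames - 2)
        frac = step - i
        # Interpolate magnitude between neighbouring analysis frames.
        mag = (1 - frac) * np.abs(S[i]) + frac * np.abs(S[i + 1])
        out[t] = mag * np.exp(1j * phase)
        # Accumulate phase using the wrapped deviation from the nominal advance.
        dphi = np.angle(S[i + 1]) - np.angle(S[i]) - omega
        dphi -= 2 * np.pi * np.round(dphi / (2 * np.pi))
        phase = phase + omega + dphi
    return istft(out, n_fft, hop)

def extend_utterance(x, slow_rate=0.8, fast_rate=1.2):
    """Splice normal, slowed, and sped-up copies (rates are assumed values)."""
    return np.concatenate([x,
                           time_stretch(x, slow_rate),
                           time_stretch(x, fast_rate)])

# Toy usage on a synthetic 1 s tone standing in for a short test utterance.
sr = 16000
x = np.sin(2 * np.pi * 220 * np.arange(sr) / sr)
y = extend_utterance(x)
print(len(x) / sr, len(y) / sr)  # durations in seconds before and after splicing
```

In this sketch the spliced signal is roughly three times as long as the input, so a classifier that scores whole utterances sees more frames from the same speaker and language. Whether the original system splices waveforms or feature sequences is not stated in the abstract.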

updated: Sun Feb 02 2020 09:57:14 GMT+0000 (UTC)

published: Tue Sep 18 2018 19:34:54 GMT+0000 (UTC)

arXiv

