Transfer Learning for Improving Speech Emotion Classification Accuracy

Siddique Latif; Rajib Rana; Shahzad Younis; Junaid Qadir; Julien Epps

The majority of existing speech emotion recognition research focuses on automatic emotion detection using training and testing data from the same corpus, collected under the same conditions. The performance of such systems has been shown to drop significantly in cross-corpus and cross-language scenarios. To address this problem, this paper exploits a transfer learning technique, novel in cross-language and cross-corpus scenarios, to improve the performance of speech emotion recognition systems. Evaluations on five corpora in three languages show that Deep Belief Networks (DBNs) offer better accuracy on cross-corpus emotion recognition than previous approaches, relative to a sparse autoencoder and an SVM baseline system. Results also suggest that training on a large number of languages and including a small fraction of the target data in training can significantly boost accuracy over the baseline, even for a corpus with limited training examples.
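
The evaluation setup the abstract describes (train on several source corpora, add a small fraction of the target corpus, test on the rest of the target) can be illustrated with a minimal data-split sketch. It is not the authors' code: the function name `build_cross_corpus_split`, the `(utterance_id, emotion_label)` sample format, and the toy corpora are all hypothetical, and no DBN or feature extraction is shown.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical sample format: (utterance_id, emotion_label).
Sample = Tuple[str, str]


def build_cross_corpus_split(
    source_corpora: Sequence[Sequence[Sample]],
    target_corpus: Sequence[Sample],
    target_fraction: float = 0.1,
    seed: int = 0,
) -> Tuple[List[Sample], List[Sample]]:
    """Return (train, test) for a cross-corpus experiment.

    train: every source sample plus a random `target_fraction` of the target
    test:  the remaining target samples, which never appear in training
    """
    if not 0.0 <= target_fraction < 1.0:
        raise ValueError("target_fraction must be in [0, 1)")

    # Shuffle a copy of the target corpus with a fixed seed for repeatability.
    rng = random.Random(seed)
    shuffled_target = list(target_corpus)
    rng.shuffle(shuffled_target)

    # Carve off the small adaptation subset; the rest is held out for testing.
    n_adapt = int(len(shuffled_target) * target_fraction)
    adaptation = shuffled_target[:n_adapt]
    held_out = shuffled_target[n_adapt:]

    # Pool all source corpora (e.g. several languages) with the adaptation subset.
    train = [s for corpus in source_corpora for s in corpus] + adaptation
    return train, held_out


# Toy corpora standing in for real emotion datasets (assumed data, not from the paper).
source_a = [(f"a{i}", "angry" if i % 2 else "neutral") for i in range(50)]
source_b = [(f"b{i}", "angry" if i % 2 else "neutral") for i in range(30)]
target = [(f"t{i}", "angry" if i % 2 else "neutral") for i in range(20)]

train, test = build_cross_corpus_split([source_a, source_b], target, target_fraction=0.25)
print(len(train), len(test))  # 85 15
```

Setting `target_fraction=0.0` gives the pure cross-corpus case, in which no target data is seen during training.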

updated: Tue Jul 28 2020 01:36:53 GMT+0000 (UTC)

published: Fri Jan 19 2018 10:16:11 GMT+0000 (UTC)

arXiv
