Multimodal Emotion Recognition with Modality-Pairwise Unsupervised Contrastive Loss

Riccardo Franceschini; Enrico Fini; Cigdem Beyan; Alessandro Conti; Federica Arrigoni; Elisa Ricci

Emotion recognition is involved in several real-world applications. As more modalities become available, automatic understanding of emotions is being performed more accurately. The success of Multimodal Emotion Recognition (MER) relies primarily on the supervised learning paradigm. However, data annotation is expensive and time-consuming, and because emotion expression and perception depend on several factors (e.g., age, gender, culture), obtaining highly reliable labels is hard. Motivated by these challenges, we focus on unsupervised feature learning for MER. We consider discrete emotions and use text, audio, and vision as modalities. Our method, which is based on a contrastive loss between pairs of modalities, is the first such attempt in the MER literature. Our end-to-end feature learning approach has several differences (and advantages) compared to existing MER methods: i) it is unsupervised, so learning incurs no data labelling cost; ii) it does not require spatial data augmentation, modality alignment, large batch sizes, or a large number of epochs; iii) it applies data fusion only at inference; and iv) it does not require backbones pre-trained on an emotion recognition task. Experiments on benchmark datasets show that our method outperforms several baseline approaches and unsupervised learning methods applied in MER. Notably, it even surpasses a few supervised state-of-the-art MER methods.
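
To make the idea of a modality-pairwise contrastive loss concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not give the exact form of the loss, so the sketch assumes a symmetric InfoNCE-style objective averaged over the three modality pairs (text-audio, text-vision, audio-vision). It also assumes that fusion at inference is a simple concatenation of the per-modality embeddings. All function names, the `temperature` parameter, and the toy vectors are hypothetical.

```python
import math
from itertools import combinations


def l2_normalize(v):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce(a, b, temperature=0.1):
    """Assumed InfoNCE-style loss from modality `a` to modality `b`.

    a[i] and b[i] come from the same sample (positive pair); every other
    b[j] in the batch acts as a negative. Returns the batch average.
    """
    a = [l2_normalize(v) for v in a]
    b = [l2_normalize(v) for v in b]
    total = 0.0
    for i, anchor in enumerate(a):
        # Cosine similarities between the anchor and every item of b.
        logits = [sum(x * y for x, y in zip(anchor, other)) / temperature
                  for other in b]
        # Numerically stable log-sum-exp over the batch.
        top = max(logits)
        log_denominator = top + math.log(sum(math.exp(l - top) for l in logits))
        total += log_denominator - logits[i]
    return total / len(a)


def pairwise_contrastive_loss(embeddings, temperature=0.1):
    """Average a symmetric contrastive loss over all pairs of modalities.

    `embeddings` maps a modality name to a list of vectors, with the same
    sample order in every modality. No labels are used.
    """
    losses = []
    for m1, m2 in combinations(sorted(embeddings), 2):
        forward = info_nce(embeddings[m1], embeddings[m2], temperature)
        backward = info_nce(embeddings[m2], embeddings[m1], temperature)
        losses.append(0.5 * (forward + backward))
    return sum(losses) / len(losses)


def fuse_at_inference(embeddings):
    """Assumed late fusion: concatenate each sample's modality embeddings."""
    names = sorted(embeddings)
    batch_size = len(embeddings[names[0]])
    return [[x for name in names for x in embeddings[name][i]]
            for i in range(batch_size)]


# Toy batch of two samples whose modalities agree with each other.
aligned = {
    "text": [[1.0, 0.0], [0.0, 1.0]],
    "audio": [[1.0, 0.0], [0.0, 1.0]],
    "vision": [[1.0, 0.0], [0.0, 1.0]],
}
# Same batch, but the audio embeddings are swapped between the samples.
misaligned = dict(aligned, audio=[[0.0, 1.0], [1.0, 0.0]])

aligned_loss = pairwise_contrastive_loss(aligned)
misaligned_loss = pairwise_contrastive_loss(misaligned)
fused = fuse_at_inference(aligned)
```

In this toy batch the loss is lower when the three modalities of each sample agree than when the audio embeddings are swapped, which is the behaviour a pairwise objective of this kind is meant to encourage. The fused vectors would then be passed to a downstream classifier, which the sketch leaves out.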

updated: Sat Jul 23 2022 10:11:24 GMT+0000 (UTC)

published: Sat Jul 23 2022 10:11:24 GMT+0000 (UTC)

arXiv
