MEmoBERT: Pre-training Model with Prompt-based Learning for Multimodal Emotion Recognition

Jinming Zhao; Ruichen Li; Qin Jin; Xinchao Wang; Haizhou Li

Research on multimodal emotion recognition is hindered by the lack of labelled corpora of sufficient scale and diversity, a consequence of high annotation cost and label ambiguity. In this paper, we propose MEmoBERT, a pre-training model for multimodal emotion recognition that learns joint multimodal representations through self-supervised learning on large-scale unlabeled video data. Furthermore, unlike the conventional "pre-train, fine-tune" paradigm, we propose a prompt-based method that reformulates the downstream emotion classification task as a masked text prediction task, bringing the downstream task closer to pre-training. Extensive experiments on two benchmark datasets, IEMOCAP and MSP-IMPROV, show that the proposed MEmoBERT significantly improves emotion recognition performance.
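The reformulation of classification as masked text prediction can be illustrated with a minimal sketch. It is not the authors' implementation: the prompt template, the label words, and the function names below are all hypothetical, and the mask-position scores are passed in as a plain dictionary rather than produced by a real model. The idea shown is only that the model scores vocabulary words at a masked position and the emotion is read off from a small set of label words.

```python
import math
from typing import Dict, Tuple

# Hypothetical template and label words; the abstract does not specify them.
PROMPT_TEMPLATE = "{utterance} I am [MASK]."
VERBALIZER = {
    "happy": "happiness",
    "sad": "sadness",
    "angry": "anger",
    "neutral": "neutral",
}


def build_prompt(utterance: str) -> str:
    """Append the masked prompt to the utterance text."""
    return PROMPT_TEMPLATE.format(utterance=utterance)


def classify_from_mask_logits(mask_logits: Dict[str, float]) -> Tuple[str, Dict[str, float]]:
    """Map vocabulary scores at the [MASK] position to an emotion label.

    Only the label words in VERBALIZER are considered; every other
    vocabulary word is ignored. A softmax over the label words gives
    a probability per emotion.
    """
    words = list(VERBALIZER)
    scores = [mask_logits.get(w, float("-inf")) for w in words]
    top = max(scores)
    if top == float("-inf"):
        raise ValueError("no label word has a score at the mask position")
    # Subtract the maximum before exponentiating for numerical stability.
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    probs = {VERBALIZER[w]: e / total for w, e in zip(words, exps)}
    best = max(probs, key=probs.get)
    return best, probs


if __name__ == "__main__":
    print(build_prompt("That was great."))
    # Made-up scores standing in for a model's output at the mask position.
    label, probs = classify_from_mask_logits({"happy": 2.0, "sad": 0.5, "the": 5.0})
    print(label, probs)
```

Because the prediction head is the same masked-word predictor used during pre-training, a setup of this kind adds no new classification layer, which is the sense in which the downstream task is brought closer to pre-training.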

updated: Wed Oct 27 2021 09:57:00 GMT+0000 (UTC)

published: Wed Oct 27 2021 09:57:00 GMT+0000 (UTC)

arXiv
