Tailor Versatile Multi-modal Learning for Multi-label Emotion Recognition

Yi Zhang; Mingyuan Chen; Jundong Shen; Chongjun Wang

Multi-modal Multi-label Emotion Recognition (MMER) aims to identify various human emotions from heterogeneous visual, audio and text modalities. Previous methods mainly focus on projecting multiple modalities into a common latent space and learning an identical representation for all labels, which neglects the diversity of each modality and fails to capture richer semantic information for each label from different perspectives. In addition, the relationships between modalities and labels have not been fully exploited. In this paper, we propose versaTile multi-modAl learning for multI-labeL emOtion Recognition (TAILOR), which aims to refine multi-modal representations and enhance the discriminative capacity of each label. Specifically, we design an adversarial multi-modal refinement module to explore the commonality among different modalities and strengthen the diversity of each modality. To further exploit label-modal dependence, we devise a BERT-like cross-modal encoder that gradually fuses private and common modality representations in a granularity-descent manner, together with a label-guided decoder that adaptively generates a tailored representation for each label under the guidance of label semantics. Experiments on the benchmark MMER dataset CMU-MOSEI, in both aligned and unaligned settings, demonstrate the superiority of TAILOR over state-of-the-art methods. Code is available at https://github.com/kniter1/TAILOR.
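
The idea of generating one tailored representation per label can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that); it assumes, purely for illustration, that each label has an embedding vector that acts as an attention query over a sequence of fused multi-modal features, and that each label is then scored with an independent sigmoid. All names (`label_guided_pool`, `predict_labels`), the dimensions, and the random inputs are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def label_guided_pool(label_embeddings, fused_features):
    """Attend over the fused feature sequence once per label.

    label_embeddings: L vectors of size d (assumed label semantics).
    fused_features:   T vectors of size d (assumed fused modality features).
    Returns L vectors of size d, one representation per label.
    """
    d = len(fused_features[0])
    scale = math.sqrt(d)
    tailored = []
    for query in label_embeddings:
        # Scaled dot-product attention weights for this label's query.
        weights = softmax([dot(query, f) / scale for f in fused_features])
        # Weighted average of the features: the label-specific summary.
        rep = [sum(w * f[i] for w, f in zip(weights, fused_features))
               for i in range(d)]
        tailored.append(rep)
    return tailored


def predict_labels(tailored, classifier_weights, threshold=0.5):
    """Score each label independently, so several labels can be active."""
    probs = [1.0 / (1.0 + math.exp(-dot(rep, w)))
             for rep, w in zip(tailored, classifier_weights)]
    return probs, [p >= threshold for p in probs]


# Toy inputs: random values for illustration only, not data from the paper.
rng = random.Random(0)
num_labels, seq_len, dim = 3, 5, 4
labels = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(num_labels)]
features = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(seq_len)]
clf = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(num_labels)]

tailored = label_guided_pool(labels, features)
probs, active = predict_labels(tailored, clf)
```

Because every label uses its own query, two labels can weight the same feature sequence differently, in contrast to learning a single representation shared by all labels.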

updated: Sat Jan 15 2022 12:02:28 GMT+0000 (UTC)

published: Sat Jan 15 2022 12:02:28 GMT+0000 (UTC)

arXiv
