Multimodal Personality Recognition using Cross-Attention Transformer and Behaviour Encoding

Tanay Agrawal; Dhruv Agarwal; Michal Balazia; Neelabh Sinha; Francois Bremond

Personality computing and affective computing have recently gained interest in many research areas. The datasets for the task generally contain multiple modalities, such as video, audio, language and bio-signals. In this paper, we propose a flexible model for the task that exploits all available data. The task involves complex relations; to avoid using a large model, specifically for video processing, we propose the use of behaviour encoding, which boosts performance with minimal change to the model. Cross-attention using transformers has become popular in recent times and is utilised for the fusion of different modalities. Since long-term relations may exist, breaking the input into chunks is not desirable, so the proposed model processes the entire input together. Our experiments show the importance of each of the above contributions.
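
To illustrate the cross-attention fusion idea mentioned in the abstract, here is a minimal sketch in plain Python. It is not the authors' architecture: it assumes a single attention head, omits learned projections, positional information and behaviour encoding, and uses made-up toy feature values. The names `cross_attention`, `audio` and `video` are hypothetical. Queries come from one modality and keys and values from another, and every query attends over the whole sequence of the other modality rather than over chunks.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention (illustrative only).

    queries: list of d-dim vectors from one modality (e.g. audio steps)
    keys, values: lists of vectors from another modality (e.g. video steps)
    Returns one fused vector per query and the attention weights.
    """
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    fused, weights = [], []
    for q in queries:
        # Similarity of this query to every key in the full sequence.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        w = softmax(scores)
        # Weighted sum of the value vectors, one output channel at a time.
        out = [sum(wj * v[c] for wj, v in zip(w, values))
               for c in range(len(values[0]))]
        weights.append(w)
        fused.append(out)
    return fused, weights

# Toy features (assumed values): 2 audio steps, 3 video steps, d = 2.
audio = [[1.0, 0.0], [0.0, 1.0]]
video = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused, weights = cross_attention(audio, video, video)
```

Each row of `weights` sums to one and has one entry per video step, so the output for an audio step is a convex combination of all video features. A trained model would typically add learned query, key and value projections and multiple heads on top of this.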

updated: Wed Dec 07 2022 22:18:25 GMT+0000 (UTC)

published: Wed Dec 22 2021 19:14:55 GMT+0000 (UTC)

arXiv
