Face-to-Face Contrastive Learning for Social Intelligence Question-Answering

Alex Wilf; Qianli M. Ma; Paul Pu Liang; Amir Zadeh; Louis-Philippe Morency

Creating artificial social intelligence (algorithms that can understand the nuances of multi-person interactions) is an exciting and emerging challenge in processing facial expressions and gestures from multimodal videos. Recent multimodal methods have set the state of the art on many tasks, but have difficulty modeling the complex face-to-face conversational dynamics across speaking turns in social interaction, particularly in a self-supervised setup. In this paper, we propose Face-to-Face Contrastive Learning (F2F-CL), a graph neural network designed to model social interactions using factorization nodes to contextualize the multimodal face-to-face interaction along speaking-turn boundaries. With the F2F-CL model, we propose to perform contrastive learning between the factorization nodes of different speaking turns within the same video. We evaluate the model experimentally on the challenging Social-IQ dataset and show state-of-the-art results.
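To make the contrastive step concrete, here is a minimal sketch of a generic InfoNCE-style loss over node embeddings. It is not the authors' implementation. The abstract does not say how positive and negative pairs are chosen, so the sketch simply assumes one factorization node serves as the anchor, a second node is its designated positive, and nodes from other speaking turns in the same video are negatives. The function names, the temperature value, and the toy 3-dimensional embeddings are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss for one anchor: negative log-softmax of the positive."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max before exponentiating, for stability
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - logits[0]


# Hypothetical embeddings of factorization nodes from a single video.
# Assumption (not from the paper): `positive` is the anchor's matching node,
# and `negatives` are factorization nodes from other speaking turns.
anchor = [1.0, 0.0, 0.0]
positive = [0.9, 0.1, 0.0]
negatives = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

loss_aligned = info_nce(anchor, positive, negatives)
# Swapping a negative into the positive slot should raise the loss.
loss_swapped = info_nce(anchor, negatives[0], [positive, negatives[1]])
```

Under these assumptions the loss is small when the anchor is closer to its positive than to any negative, and grows when a dissimilar node is treated as the positive. A real model would compute this over learned graph-node representations rather than hand-written vectors.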

updated: Mon Aug 15 2022 19:51:45 GMT+0000 (UTC)

published: Fri Jul 29 2022 20:39:44 GMT+0000 (UTC)

arXiv
