A Survey on Deep Multi-modal Learning for Body Language Recognition and Generation

Li Liu; Lufei Gao; Wentao Lei; Fengji Ma; Xiaotian Lin; Jinting Wang

Body language (BL) refers to non-verbal communication expressed through physical movements, gestures, facial expressions, and postures. It is a form of communication that conveys information, emotions, attitudes, and intentions without the use of spoken or written words. It plays a crucial role in interpersonal interactions and can complement or even override verbal communication. Deep multi-modal learning techniques have shown promise in understanding and analyzing these diverse aspects of BL. This survey emphasizes their applications to BL generation and recognition. Several common forms of BL are considered, i.e., Sign Language (SL), Cued Speech (CS), Co-speech (CoS), and Talking Head (TH), and we analyze and establish the connections among these four BLs for the first time. Their generation and recognition often involve multi-modal approaches. Benchmark datasets for BL research are collected and organized, along with evaluations of SOTA methods on these datasets. The survey highlights challenges such as limited labeled data, multi-modal learning, and the need for domain adaptation to generalize models to unseen speakers or languages. Future research directions are presented, including exploring self-supervised learning techniques, integrating contextual information from other modalities, and exploiting large-scale pre-trained multi-modal models. In summary, this survey provides the first comprehensive overview of deep multi-modal learning for the generation and recognition of various BLs. By analyzing advancements, challenges, and future directions, it serves as a valuable resource for researchers and practitioners advancing this field. In addition, we maintain a continuously updated paper list on deep multi-modal learning for BL recognition and generation: https://github.com/wentaoL86/awesome-body-language.

updated: Thu Aug 17 2023 08:15:51 GMT+0000 (UTC)

published: Thu Aug 17 2023 08:15:51 GMT+0000 (UTC)

arXiv