Multimodal Across Domains Gaze Target Detection

Francesco Tonini; Cigdem Beyan; Elisa Ricci

ドメイン全体のマルチモーダル視線ターゲット検出

この論文では、三人称視点からキャプチャされた単一の画像における視線ターゲット検出の問題に対処します。シーン内の人がどこを見ているかを推測するマルチモーダルディープアーキテクチャを紹介します。この空間モデルは、対象人物の頭部画像、豊富なコンテキスト情報を表すシーンおよび深度マップでトレーニングされます。私たちのモデルは、いくつかの先行技術とは異なり、注視角度の監視を必要とせず、頭の向きの情報および/または対象者の目の位置に依存しません。広範な実験により、複数のベンチマークデータセットでの方法の優れたパフォーマンスが実証されました。また、マルチモーダルデータの共同学習を変更することにより、メソッドのいくつかのバリエーションを調査しました。いくつかのバリエーションは、いくつかの従来技術よりも優れています。このホワイトペーパーで初めて、視線ターゲット検出のドメイン適応を調べ、マルチモーダルネットワークを強化して、データセット間のドメインギャップを効果的に処理します。提案された方法のコードは、https://github.com/francescotonini/multimodal-across-domains-gaze-target-detection で入手できます。

This paper addresses the gaze target detection problem in single images captured from the third-person perspective. We present a multimodal deep architecture to infer where a person in a scene is looking. This spatial model is trained on the head images of the person-of- interest, scene and depth maps representing rich context information. Our model, unlike several prior art, do not require supervision of the gaze angles, do not rely on head orientation information and/or location of the eyes of person-of-interest. Extensive experiments demonstrate the stronger performance of our method on multiple benchmark datasets. We also investigated several variations of our method by altering joint-learning of multimodal data. Some variations outperform a few prior art as well. First time in this paper, we inspect domain adaption for gaze target detection, and we empower our multimodal network to effectively handle the domain gap across datasets. The code of the proposed method is available at https://github.com/francescotonini/multimodal-across-domains-gaze-target-detection.

updated: Tue Aug 23 2022 09:09:00 GMT+0000 (UTC)

published: Tue Aug 23 2022 09:09:00 GMT+0000 (UTC)

arXiv

参考文献 (このサイトで利用可能なもの) / References (only if available on this site)

被参照文献 (このサイトで利用可能なものを新しい順に) / Citations (only if available on this site, in order of most recent)

Amazon.co.jpアソシエイト