Generative Adversarial Network for Future Hand Segmentation from Egocentric Video

Wenqi Jia; Miao Liu; James M. Rehg

We introduce the novel problem of anticipating a time series of future hand masks from egocentric video. A key challenge is modeling the stochasticity of future head motion, which globally affects the analysis of head-worn camera video. To this end, we propose EgoGAN, a novel deep generative model that uses a 3D Fully Convolutional Network to learn a spatio-temporal video representation for pixel-wise visual anticipation, generates future head motion with a Generative Adversarial Network (GAN), and then predicts future hand masks from the video representation and the generated head motion. We evaluate our method on both the EPIC-Kitchens and EGTEA Gaze+ datasets and conduct detailed ablation studies to validate its design choices. Furthermore, we compare our method with previous state-of-the-art methods for future image segmentation and show that it predicts future hand masks more accurately.
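
The three-stage data flow described in the abstract (video encoding, stochastic head motion generation, then mask prediction conditioned on both) can be sketched at the level of shapes. This is a hypothetical illustration and not the authors' implementation: every function is a placeholder stub, and the names, downsampling factors, the 2D `(dx, dy)` motion representation, and the mask resolution are all assumptions made only for the example.

```python
import random
from typing import List, Tuple

Shape = Tuple[int, ...]
Motion = List[Tuple[float, float]]  # assumed: one (dx, dy) vector per future step


def encode_video(clip: Shape) -> Shape:
    """Placeholder for the 3D fully convolutional encoder.

    Takes a (channels, frames, height, width) shape and returns a feature
    shape, assuming hypothetical 2x temporal and 8x spatial downsampling.
    """
    _, t, h, w = clip
    return (64, t // 2, h // 8, w // 8)


def generate_head_motion(features: Shape, steps: int, rng: random.Random) -> Motion:
    """Placeholder for the GAN generator.

    A real generator would be a learned network conditioned on the video
    features; this stub ignores them and only returns the sampled noise,
    to show where the stochasticity enters the pipeline.
    """
    return [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(steps)]


def predict_hand_masks(features: Shape, motion: Motion, clip: Shape) -> Shape:
    """Placeholder for the mask decoder.

    Returns the shape of the predicted masks: one mask per future step,
    assumed here to be at the input resolution.
    """
    _, _, h, w = clip
    return (len(motion), h, w)


def anticipate(clip: Shape, steps: int, seed: int) -> Tuple[Motion, Shape]:
    """Run the three stages in order for one noise sample."""
    rng = random.Random(seed)
    features = encode_video(clip)
    motion = generate_head_motion(features, steps, rng)
    masks = predict_hand_masks(features, motion, clip)
    return motion, masks
```

Calling `anticipate` with different seeds yields different head motion samples for the same clip, which is the role the abstract assigns to the generative component; the mask shape stays the same across samples.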

updated: Mon Mar 21 2022 19:41:44 GMT+0000 (UTC)

published: Mon Mar 21 2022 19:41:44 GMT+0000 (UTC)

arXiv
