Adversarial Motion Modelling helps Semi-supervised Hand Pose Estimation

Adrian Spurr; Pavlo Molchanov; Umar Iqbal; Jan Kautz; Otmar Hilliges



Hand pose estimation is difficult due to varying environmental conditions, object- and self-occlusion, and the diversity of hand shape and appearance. Exhaustively covering this wide range of factors in fully annotated datasets has remained impractical, which poses significant challenges for the generalization of supervised methods. Embracing this challenge, we propose to combine ideas from adversarial training and motion modelling to tap into unlabeled videos. To this end, we propose what is, to the best of our knowledge, the first motion model for hands, and show that an adversarial formulation leads to better generalization of the hand pose estimator via semi-supervised training on unlabeled video sequences. In this setting, the pose predictor must produce a valid sequence of hand poses, as judged by a discriminative adversary. This adversary reasons in both the structural and the temporal domain, effectively exploiting the spatio-temporal structure of the task. The main advantage of our approach is that it can use unpaired videos and joint sequence data, both of which are much easier to obtain than paired training data. We perform an extensive evaluation, investigating the essential components of the proposed framework, and empirically demonstrate in two challenging settings that the proposed approach leads to significant improvements in pose estimation accuracy. In the lowest-label setting, we attain a 40% improvement in absolute mean joint error.
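The abstract does not specify network architectures or loss weights, so the following is only a toy NumPy sketch of the general idea: a discriminator scores a whole pose sequence using a structural cue (how much bone lengths vary over time) and a temporal cue (frame-to-frame joint speed), and the pose predictor is trained with a supervised error on labeled frames plus an adversarial term on predictions for unlabeled video. Everything here is an assumption for illustration: the 21-joint layout, the hypothetical `PARENTS` array, the hand-crafted `features`, the fixed logistic `discriminator` with hand-picked weights `w`, `b` (standing in for a learned network), and the weight `lam`. The `mpjpe` helper reads "mean joint error" as the mean Euclidean distance between predicted and ground-truth joints, which is also an assumption.

```python
import numpy as np

NUM_JOINTS = 21  # assumption: common 21-keypoint hand skeleton
# Hypothetical parent of each joint: wrist = 0, then four joints per finger.
PARENTS = np.array([-1, 0, 1, 2, 3, 0, 5, 6, 7, 0, 9, 10, 11, 0, 13, 14, 15, 0, 17, 18, 19])

def features(seq):
    """Hand-crafted spatio-temporal features of a pose sequence of shape (T, J, 3)."""
    bones = seq[:, 1:] - seq[:, PARENTS[1:]]       # (T, J-1, 3) bone vectors (structure)
    bone_len = np.linalg.norm(bones, axis=-1)       # (T, J-1) bone lengths per frame
    speed = np.linalg.norm(np.diff(seq, axis=0), axis=-1)  # (T-1, J) per-joint motion (time)
    # Structural cue: mean temporal std of bone lengths; temporal cue: mean joint speed.
    return np.array([bone_len.std(axis=0).mean(), speed.mean()])

def discriminator(seq, w, b):
    """Toy stand-in for a learned adversary: probability that `seq` is real motion."""
    z = features(seq) @ w + b
    return 1.0 / (1.0 + np.exp(-z))

def mpjpe(pred, gt):
    """Mean Euclidean distance between predicted and ground-truth joints."""
    return np.linalg.norm(pred - gt, axis=-1).mean()

def generator_loss(pred_lab, gt_lab, pred_unlab, w, b, lam=0.1):
    """Supervised squared error on labeled frames + non-saturating adversarial term."""
    sup = np.mean(np.sum((pred_lab - gt_lab) ** 2, axis=-1))
    adv = -np.log(discriminator(pred_unlab, w, b) + 1e-8)
    return sup + lam * adv

def discriminator_loss(real_seq, fake_seq, w, b):
    """Standard GAN discriminator loss: real motion sequences vs. predicted ones."""
    eps = 1e-8
    return -(np.log(discriminator(real_seq, w, b) + eps)
             + np.log(1.0 - discriminator(fake_seq, w, b) + eps))

# Usage: a slowly drifting sequence should look more "real" than a jittery one.
rng = np.random.default_rng(0)
T = 8
base = rng.normal(size=(1, NUM_JOINTS, 3))
smooth = base + 0.01 * np.cumsum(rng.normal(size=(T, NUM_JOINTS, 3)), axis=0)
jittery = base + rng.normal(size=(T, NUM_JOINTS, 3))
w, b = np.array([-20.0, -5.0]), 2.0  # hand-picked, not learned
print(discriminator(smooth, w, b) > discriminator(jittery, w, b))  # True
```

In an actual adversarial setup, the discriminator parameters would be updated to minimize `discriminator_loss` on sequences from unpaired motion data versus predicted sequences, while the pose predictor is updated to minimize `generator_loss`; here both are fixed only so the losses can be inspected.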

updated: Thu Jun 10 2021 17:50:19 GMT+0000 (UTC)

published: Thu Jun 10 2021 17:50:19 GMT+0000 (UTC)

arXiv
