PoseBERT: A Generic Transformer Module for Temporal 3D Human Modeling

Fabien Baradel; Romain Brégier; Thibault Groueix; Philippe Weinzaepfel; Yannis Kalantidis; Grégory Rogez

Training state-of-the-art models for human pose estimation in videos requires annotated datasets that are very difficult and expensive to obtain. Although transformers have recently been used for body pose sequence modeling, related methods rely on pseudo-ground truth to augment the limited training data currently available for learning such models. In this paper, we introduce PoseBERT, a transformer module that is fully trained on 3D Motion Capture (MoCap) data via masked modeling. It is simple, generic and versatile, as it can be plugged on top of any image-based model to transform it into a video-based model that leverages temporal information. We showcase variants of PoseBERT with different inputs, ranging from 3D skeleton keypoints to rotations of a 3D parametric model for either the full body (SMPL) or just the hands (MANO). Since PoseBERT training is task-agnostic, the model can be applied to several tasks such as pose refinement, future pose prediction or motion completion without fine-tuning. Our experimental results validate that adding PoseBERT on top of various state-of-the-art pose estimation methods consistently improves their performance, while its low computational cost allows us to use it in a real-time demo for smoothly animating a robotic hand via a webcam. Test code and models are available at https://github.com/naver/posebert.
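
To make the idea of masked modeling on pose sequences concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the masking scheme, so the masking ratio, the zero-valued mask placeholder, and the function names `mask_pose_sequence` and `masked_reconstruction_error` are all assumptions introduced for illustration. It shows only the data side of the objective: hide some frames of a pose sequence, then score a prediction on the hidden frames alone.

```python
import random


def mask_pose_sequence(poses, mask_ratio=0.25, mask_value=0.0, seed=None):
    """Hide a random subset of frames in a pose sequence.

    `poses` is a list of per-frame pose vectors (lists of floats).
    The ratio and the zero placeholder are illustrative assumptions.
    Returns the masked copy and the sorted indices of the hidden frames.
    """
    rng = random.Random(seed)
    num_frames = len(poses)
    num_masked = max(1, round(mask_ratio * num_frames)) if num_frames else 0
    masked_indices = sorted(rng.sample(range(num_frames), num_masked))
    masked_set = set(masked_indices)
    masked_poses = [
        [mask_value] * len(frame) if t in masked_set else list(frame)
        for t, frame in enumerate(poses)
    ]
    return masked_poses, masked_indices


def masked_reconstruction_error(predicted, target, masked_indices):
    """Mean squared error computed over the hidden frames only."""
    total, count = 0.0, 0
    for t in masked_indices:
        for p, q in zip(predicted[t], target[t]):
            total += (p - q) ** 2
            count += 1
    return total / count if count else 0.0


# Toy sequence: 8 frames, 3 values per frame (hypothetical data).
sequence = [[float(t), float(t) + 0.5, float(t) + 1.0] for t in range(8)]
masked, hidden = mask_pose_sequence(sequence, mask_ratio=0.25, seed=0)

# A sequence model would be trained to reconstruct `sequence` from `masked`;
# here the masked input itself stands in for an untrained prediction.
baseline_error = masked_reconstruction_error(masked, sequence, hidden)
```

In this sketch a model that fills in the hidden frames well would drive the error toward zero, and the same setup loosely covers the tasks named in the abstract: hiding trailing frames resembles future pose prediction, and hiding interior frames resembles motion completion.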

updated: Mon Aug 22 2022 11:30:14 GMT+0000 (UTC)

published: Mon Aug 22 2022 11:30:14 GMT+0000 (UTC)

arXiv
