3D Human Pose, Shape and Texture from Low-Resolution Images and Videos

Xiangyu Xu; Hao Chen; Francesc Moreno-Noguer; Laszlo A. Jeni; Fernando De la Torre

3D human pose and shape estimation from monocular images has been an active research area in computer vision. Existing deep learning methods for this task rely on high-resolution input, which, however, is not always available in scenarios such as video surveillance and sports broadcasting. Two common approaches to handling low-resolution images are applying super-resolution techniques to the input, which may introduce unpleasant artifacts, or simply training one model for each resolution, which is impractical in many realistic applications. To address these issues, this paper proposes a novel algorithm called RSC-Net, which consists of a Resolution-aware network, a Self-supervision loss, and a Contrastive learning scheme. The proposed method learns 3D body pose and shape across different resolutions with a single model. The self-supervision loss enforces scale-consistency of the output, and the contrastive learning scheme enforces scale-consistency of the deep features. We show that both of these new losses provide robustness when learning in a weakly-supervised manner. Moreover, we extend RSC-Net to handle low-resolution videos and apply it to reconstruct textured 3D pedestrians from low-resolution input. Extensive experiments demonstrate that RSC-Net consistently achieves better results than state-of-the-art methods on challenging low-resolution images.
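To make the two consistency terms more concrete, here is a minimal NumPy sketch. It is not the authors' implementation, and the exact loss forms are assumptions: the output-level self-supervision term is taken to be the squared error between each lower-resolution prediction and the prediction for the highest-resolution input (used as a fixed target), and the feature-level contrastive term is taken to be a standard InfoNCE loss in which the same image at two resolutions forms a positive pair and the other images in the batch act as negatives. The function names `self_supervision_loss` and `contrastive_loss` and the `temperature` parameter are hypothetical.

```python
from typing import Sequence

import numpy as np


def self_supervision_loss(preds: Sequence[np.ndarray]) -> float:
    """Output scale-consistency (assumed form).

    preds[0] is the prediction (e.g. pose/shape parameters) for the
    highest-resolution input; preds[1:] are predictions for downsampled
    copies of the same image. Each lower-resolution prediction is pulled
    toward the high-resolution one with a mean squared error.
    """
    target = preds[0]  # treated as a fixed pseudo-label
    return float(np.mean([np.mean((p - target) ** 2) for p in preds[1:]]))


def contrastive_loss(feats_a: np.ndarray, feats_b: np.ndarray,
                     temperature: float = 0.1) -> float:
    """Feature scale-consistency as an InfoNCE loss (assumed form).

    Row i of feats_a and row i of feats_b are features of the same image
    at two resolutions (positive pair); all other rows are negatives.
    """
    # L2-normalize each feature vector so the dot product is cosine similarity.
    a = feats_a / (np.linalg.norm(feats_a, axis=1, keepdims=True) + 1e-12)
    b = feats_b / (np.linalg.norm(feats_b, axis=1, keepdims=True) + 1e-12)
    logits = a @ b.T / temperature                  # (N, N) similarity matrix
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy with the matching index (the diagonal) as the label.
    return float(-np.mean(np.diag(log_probs)))


if __name__ == "__main__":
    hr = np.zeros(3)
    print(self_supervision_loss([hr, np.ones(3), 2 * np.ones(3)]))  # 2.5
    matched = contrastive_loss(np.eye(4), np.eye(4))
    shuffled = contrastive_loss(np.eye(4), np.roll(np.eye(4), 1, axis=0))
    print(matched < shuffled)  # True
```

In an autodiff framework the high-resolution target would normally be detached (stop-gradient) so that the lower-resolution branches are pulled toward it rather than the reverse; that choice is likewise an assumption here.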

updated: Thu Mar 11 2021 06:52:12 GMT+0000 (UTC)

published: Thu Mar 11 2021 06:52:12 GMT+0000 (UTC)

arXiv