Learning to Regress Bodies from Images using Differentiable Semantic Rendering

Sai Kumar Dwivedi; Nikos Athanasiou; Muhammed Kocabas; Michael J. Black

Learning to regress 3D human body shape and pose (e.g., SMPL parameters) from monocular images typically exploits losses on 2D keypoints, silhouettes, and/or part segmentation when 3D training data is not available. Such losses, however, are limited because 2D keypoints do not supervise body shape, and segmentations of people in clothing do not match projected minimally-clothed SMPL shapes. To exploit richer image information about clothed people, we introduce higher-level semantic information about clothing to penalize clothed and non-clothed regions of the image differently. To do so, we train a body regressor using a novel Differentiable Semantic Rendering (DSR) loss. For minimally-clothed regions, we define the DSR-MC loss, which encourages a tight match between a rendered SMPL body and the minimally-clothed regions of the image. For clothed regions, we define the DSR-C loss to encourage the rendered SMPL body to be inside the clothing mask. To ensure end-to-end differentiable training, we learn a semantic clothing prior for SMPL vertices from thousands of clothed human scans. We perform extensive qualitative and quantitative experiments to evaluate the role of clothing semantics on the accuracy of 3D human pose and shape estimation. We outperform all previous state-of-the-art methods on 3DPW and Human3.6M and obtain on-par results on MPI-INF-3DHP. Code and trained models are available for research at https://dsr.is.tue.mpg.de/.
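
The asymmetry between the two losses described above can be illustrated with a minimal sketch. This is not the paper's exact formulation: the function names, the use of binary cross-entropy for the tight-match term, and the normalized "outside the mask" penalty are all assumptions made for illustration, and the inputs are assumed to be soft masks in [0, 1] that a differentiable renderer and a clothing segmentation would provide.

```python
import numpy as np

def dsr_mc_loss(rendered_mc, image_mc, eps=1e-7):
    """Hypothetical tight-match term for minimally-clothed regions.

    Binary cross-entropy between the rendered soft mask and the image mask,
    so both missing and extra body pixels are penalized.
    """
    p = np.clip(rendered_mc, eps, 1.0 - eps)
    bce = -(image_mc * np.log(p) + (1.0 - image_mc) * np.log(1.0 - p))
    return float(np.mean(bce))

def dsr_c_loss(rendered_c, clothing_mask, eps=1e-7):
    """Hypothetical one-sided term for clothed regions.

    Only rendered body mass that falls outside the clothing mask is
    penalized; a body that sits anywhere inside the mask costs nothing.
    """
    outside = rendered_c * (1.0 - clothing_mask)
    return float(np.sum(outside) / (np.sum(rendered_c) + eps))

# Toy 4x4 example: the clothing mask covers the central 2x2 block.
clothing = np.zeros((4, 4))
clothing[1:3, 1:3] = 1.0

body_inside = np.zeros((4, 4))
body_inside[1, 1] = 1.0   # rendered body lies within the clothing mask

body_outside = np.zeros((4, 4))
body_outside[0, 0] = 1.0  # rendered body pokes outside the clothing mask

loss_inside = dsr_c_loss(body_inside, clothing)
loss_outside = dsr_c_loss(body_outside, clothing)
```

In this sketch a rendered body smaller than the clothing mask incurs no clothed-region penalty, which reflects the idea that clothing can be looser than the body beneath it, while the minimally-clothed term penalizes any mismatch in either direction.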

updated: Wed Feb 23 2022 16:02:59 GMT+0000 (UTC)

published: Thu Oct 07 2021 14:03:29 GMT+0000 (UTC)

arXiv
