NeuralAnnot: Neural Annotator for in-the-wild Expressive 3D Human Pose and Mesh Training Sets

Gyeongsik Moon; Kyoung Mu Lee

Recovering expressive 3D human pose and mesh from in-the-wild images is highly challenging due to the absence of training data. Several optimization-based methods have been used to obtain pseudo-groundtruth (GT) 3D poses and meshes from GT 2D poses. However, these methods often produce poor pseudo-GTs and take a long time to run, because their frameworks are optimized on each sample separately and sequentially, using only 2D supervision. To overcome these limitations, we present NeuralAnnot, a neural annotator that learns to construct in-the-wild expressive 3D human pose and mesh training sets. NeuralAnnot is trained on a large number of samples in parallel, using 2D supervision from a target in-the-wild dataset and 3D supervision from auxiliary datasets with GT 3D poses. We show that NeuralAnnot produces far better 3D pseudo-GTs than the optimization-based methods with much shorter running time, and that the newly obtained training sets bring a large performance gain. The newly obtained training sets and code will be made publicly available.
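The abstract contrasts per-sample optimization driven only by 2D targets with a single network trained over many samples at once, where some samples carry 2D GT (target in-the-wild dataset) and others carry 3D GT (auxiliary datasets). The sketch below illustrates, under stated assumptions, how such mixed supervision could be folded into one mini-batch loss. It is not the authors' loss. The weak-perspective camera, L1 errors, root-relative 3D comparison with joint 0 as root, the loss weights, and every name (`Sample`, `batch_loss`, etc.) are hypothetical, and the mesh model parameters are omitted, so only joints are supervised.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Vec2 = Tuple[float, float]
Vec3 = Tuple[float, float, float]


@dataclass
class Sample:
    """One training image; field names are hypothetical, not from the paper."""
    pred_joints_3d: List[Vec3]             # joints regressed by the annotator network
    cam: Tuple[float, float, float]        # assumed weak-perspective camera (s, tx, ty)
    gt_joints_2d: Optional[List[Vec2]] = None  # present for in-the-wild target images
    gt_joints_3d: Optional[List[Vec3]] = None  # present for auxiliary 3D-labelled images


def project_weak_perspective(joints: Sequence[Vec3],
                             cam: Tuple[float, float, float]) -> List[Vec2]:
    """Drop depth, then scale and translate x, y (weak-perspective projection)."""
    s, tx, ty = cam
    return [(s * x + tx, s * y + ty) for x, y, _ in joints]


def root_relative(joints: Sequence[Vec3]) -> List[Vec3]:
    """Express joints relative to joint 0, which this sketch treats as the root."""
    rx, ry, rz = joints[0]
    return [(x - rx, y - ry, z - rz) for x, y, z in joints]


def mean_abs_error(a, b) -> float:
    """Mean absolute difference over every coordinate of two point lists."""
    diffs = [abs(u - v) for p, q in zip(a, b) for u, v in zip(p, q)]
    return sum(diffs) / len(diffs)


def _mean(values: List[float]) -> float:
    return sum(values) / len(values) if values else 0.0


def batch_loss(batch: Sequence[Sample], w2d: float = 1.0, w3d: float = 1.0) -> float:
    """Combine 2D and 3D supervision over one mini-batch of mixed samples."""
    # 2D term: reproject predicted 3D joints and compare with GT 2D joints.
    loss_2d = [
        mean_abs_error(project_weak_perspective(s.pred_joints_3d, s.cam), s.gt_joints_2d)
        for s in batch if s.gt_joints_2d is not None
    ]
    # 3D term: compare root-relative predicted and GT 3D joints directly.
    loss_3d = [
        mean_abs_error(root_relative(s.pred_joints_3d), root_relative(s.gt_joints_3d))
        for s in batch if s.gt_joints_3d is not None
    ]
    return w2d * _mean(loss_2d) + w3d * _mean(loss_3d)


if __name__ == "__main__":
    # An in-the-wild sample with only 2D GT, and an auxiliary sample with 3D GT.
    wild = Sample(pred_joints_3d=[(0, 0, 0), (1, 1, 1)], cam=(2.0, 0.5, -0.5),
                  gt_joints_2d=[(0.5, -0.5), (2.5, 1.5)])
    aux = Sample(pred_joints_3d=[(0, 0, 0), (1, 0, 0)], cam=(1.0, 0.0, 0.0),
                 gt_joints_3d=[(0, 0, 0), (1, 2, 0)])
    print(batch_loss([wild, aux]))  # prints 0.3333333333333333
```

Because both kinds of samples share one network and one loss, the 3D term can constrain depth where 2D reprojection alone is ambiguous. Per-sample 2D-only optimization cannot draw on that constraint.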

updated: Sat Nov 28 2020 03:32:19 GMT+0000 (UTC)

published: Mon Nov 23 2020 06:33:39 GMT+0000 (UTC)

arXiv