NeMo: Neural Mesh Models of Contrastive Features for Robust 3D Pose Estimation

Angtian Wang; Adam Kortylewski; Alan Yuille

3D pose estimation is a challenging but important task in computer vision. In this work, we show that standard deep learning approaches to 3D pose estimation are not robust when objects are partially occluded or viewed from a previously unseen pose. Inspired by the robustness of generative vision models to partial occlusion, we propose to integrate deep neural networks with 3D generative representations of objects into a unified neural architecture that we term NeMo. In particular, NeMo learns a generative model of neural feature activations at each vertex of a dense 3D mesh. Using differentiable rendering, we estimate the 3D object pose by minimizing the reconstruction error between the features rendered from NeMo and the feature representation of the target image. To avoid local optima in the reconstruction loss, we use contrastive learning to train the feature extractor to maximize the distance between the individual feature representations on the mesh. Our extensive experiments on PASCAL3D+, occluded-PASCAL3D+ and ObjectNet3D show that NeMo is much more robust to partial occlusion and unseen poses than standard deep networks, while retaining competitive performance on regular data. Interestingly, our experiments also show that NeMo performs reasonably well even when the mesh representation is a cuboid that only crudely approximates the true object geometry, which suggests that detailed 3D geometry is not needed for accurate 3D pose estimation. The code is publicly available at https://github.com/Angtian/NeMo.
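The render-and-compare idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation, and every name in it (`VERTICES`, `FEATURES`, `render`, `reconstruction_error`, `estimate_azimuth`) is hypothetical. It assumes a cuboid mesh with eight vertices, one-hot vertex features as a stand-in for contrastively trained features, orthographic projection with no visibility handling, and a coarse grid search over azimuth in place of gradient-based optimization through a differentiable renderer.

```python
import math

# Hypothetical cuboid "mesh": 8 vertices with half-extents (2, 1, 1).
VERTICES = [(x, y, z) for x in (-2, 2) for y in (-1, 1) for z in (-1, 1)]

# One-hot feature per vertex: a stand-in (assumption) for contrastive
# features that are pushed far apart from one another.
FEATURES = [tuple(1.0 if i == j else 0.0 for j in range(8)) for i in range(8)]
ZERO = (0.0,) * 8


def render(azimuth_deg, scale=10):
    """Rotate the mesh about the y-axis, project orthographically, and
    return a sparse feature map {pixel: feature vector}."""
    a = math.radians(azimuth_deg)
    fmap = {}
    for (x, y, z), feat in zip(VERTICES, FEATURES):
        u = round(scale * (x * math.cos(a) + z * math.sin(a)))
        v = round(scale * y)
        prev = fmap.get((u, v), ZERO)
        # Vertices that land on the same pixel have their features summed.
        fmap[(u, v)] = tuple(p + f for p, f in zip(prev, feat))
    return fmap


def reconstruction_error(rendered, target):
    """Sum of squared feature differences over all occupied pixels."""
    err = 0.0
    for pixel in set(rendered) | set(target):
        r = rendered.get(pixel, ZERO)
        t = target.get(pixel, ZERO)
        err += sum((ri - ti) ** 2 for ri, ti in zip(r, t))
    return err


def estimate_azimuth(target, step=10):
    """Grid search over azimuth (a swap-in for gradient descent)."""
    return min(range(0, 360, step),
               key=lambda a: reconstruction_error(render(a), target))


# Simulated target features at a true azimuth of 40 degrees.
target = render(40)
# Crude occlusion: keep only the first 5 of the 8 feature locations.
occluded = dict(list(target.items())[:5])

print(estimate_azimuth(target), estimate_azimuth(occluded))
```

In this toy setting the error at the true pose only grows by one per occluded vertex, while a wrong pose is additionally penalized for every visible vertex it misplaces, so the minimum stays at the same azimuth when three vertices are hidden. Because the features are mutually distinct, a vertex cannot be explained by the wrong part of the mesh, which is the intuition behind separating vertex features with contrastive learning.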

updated: Fri Jan 29 2021 03:23:12 GMT+0000 (UTC)

published: Fri Jan 29 2021 03:23:12 GMT+0000 (UTC)

arXiv
