Share With Thy Neighbors: Single-View Reconstruction by Cross-Instance Consistency

Tom Monnier; Matthew Fisher; Alexei A. Efros; Mathieu Aubry

Approaches for single-view reconstruction typically rely on viewpoint annotations, silhouettes, the absence of background, multiple views of the same instance, a template shape, or symmetry. We avoid all such supervision and assumptions by explicitly leveraging the consistency between images of different object instances. As a result, our method can learn from large collections of unlabelled images depicting the same object category. Our main contributions are two ways to leverage cross-instance consistency: (i) progressive conditioning, a training strategy that gradually specializes the model from category to instances in a curriculum learning fashion; and (ii) neighbor reconstruction, a loss enforcing consistency between instances having similar shape or texture. Also critical to the success of our method are: our structured autoencoding architecture, which decomposes an image into explicit shape, texture, pose, and background; an adapted formulation of differentiable rendering; and a new optimization scheme alternating between 3D and pose learning. We compare our approach, UNICORN, both on the diverse synthetic ShapeNet dataset (the classical benchmark for methods requiring multiple views as supervision) and on standard real-image benchmarks (Pascal3D+ Car, CUB) for which most methods require known templates and silhouette annotations. We also showcase applicability to more challenging real-world collections (CompCars, LSUN), where silhouettes are not available and images are not cropped around the object.
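The two cross-instance ideas named in the abstract can be illustrated with a rough NumPy sketch. This is not the authors' implementation: the abstract does not specify how neighbors are selected or how conditioning is scheduled, so the function names (`progressive_mask`, `nearest_neighbors`), the Euclidean distance, and the "first `n_active` dimensions" rule are all assumptions made for illustration.

```python
import numpy as np

def progressive_mask(codes, n_active):
    """Keep only the first n_active latent dimensions, zeroing the rest.

    Hypothetical stand-in for progressive conditioning: with few active
    dimensions all instances share nearly the same code (category level);
    raising n_active during training lets codes specialize per instance.
    """
    codes = np.asarray(codes, dtype=float)
    masked = np.zeros_like(codes)
    masked[:, :n_active] = codes[:, :n_active]
    return masked

def nearest_neighbors(codes):
    """For each latent code, return the index of its closest other code.

    Hypothetical stand-in for neighbor selection, using squared
    Euclidean distance between latent codes.
    """
    codes = np.asarray(codes, dtype=float)
    # Pairwise squared distances, shape (n, n).
    d2 = ((codes[:, None, :] - codes[None, :, :]) ** 2).sum(axis=-1)
    # An instance must not be selected as its own neighbor.
    np.fill_diagonal(d2, np.inf)
    return d2.argmin(axis=1)

# Four toy shape codes forming two clusters.
shape_codes = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.0, 5.2]])
idx = nearest_neighbors(shape_codes)

# A neighbor-reconstruction style loss would render instance i with the
# shape code of its neighbor and compare the result to the image of i.
swapped_shape_codes = shape_codes[idx]
```

In this toy example each instance is paired with the other member of its cluster, so swapping codes changes the input to the renderer only slightly, which is the kind of consistency such a loss would encourage.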

updated: Mon Jul 25 2022 07:57:02 GMT+0000 (UTC)

published: Thu Apr 21 2022 17:47:35 GMT+0000 (UTC)

arXiv
