Learning Compositional Shape Priors for Few-Shot 3D Reconstruction

Mateusz Michalkiewicz; Stavros Tsogkas; Sarah Parisot; Mahsa Baktashmotlagh; Anders Eriksson; Eugene Belilovsky

The impressive performance of deep convolutional neural networks in single-view 3D reconstruction suggests that these models perform non-trivial reasoning about the 3D structure of the output space. Recent work has challenged this belief, showing that, on standard benchmarks, complex encoder-decoder architectures perform similarly to nearest-neighbor baselines or simple linear decoder models that exploit large amounts of per-category data. However, building large collections of 3D shapes for supervised training is a laborious process; a more realistic and less constraining task is inferring 3D shapes for categories with few available training examples, calling for a model that can successfully generalize to novel object classes. In this work we experimentally demonstrate that naive baselines fail in this few-shot learning setting, in which the network must learn informative shape priors for inference of new categories. We propose three ways to learn a class-specific global shape prior, directly from data. Using these techniques, we are able to capture multi-scale information about the 3D shape, and account for intra-class variability by virtue of an implicit compositional structure. Experiments on the popular ShapeNet dataset show that our method outperforms a zero-shot baseline by over 40%, and the current state-of-the-art by over 10%, in terms of relative performance, in the few-shot setting.

updated: Wed Jun 16 2021 11:18:32 GMT+0000 (UTC)

published: Fri Jun 11 2021 14:55:49 GMT+0000 (UTC)

arXiv
