Investigating the Nature of 3D Generalization in Deep Neural Networks

Shoaib Ahmed Siddiqui; David Krueger; Thomas Breuel

深層ニューラルネットワークにおける 3D 汎化の性質の調査

視覚オブジェクト認識システムは、一連の 2D トレーニングビューから新しいビューに一般化する必要があります。人間の視覚系がどのように新しいビューに一般化できるかという問題は、心理学、コンピュータービジョン、および神経科学で研究され、モデル化されてきました。オブジェクト認識のための最新のディープラーニングアーキテクチャは、斬新なビューにうまく一般化されていますが、そのメカニズムはよく理解されていません。このホワイトペーパーでは、一般的なディープラーニングアーキテクチャが新しいビューに一般化する能力を特徴付けます。これを、ラベルが一意の 3D オブジェクトに対応し、例が異なる 3D 方向にあるオブジェクトの 2D ビューに対応する教師付き分類タスクとして定式化します。新しいビューへの一般化の 3 つの一般的なモデルを検討します。(i) 完全な 3D 一般化、(ii) 純粋な 2D マッチング、および (iii) ビューの線形結合に基づくマッチング。ディープモデルは斬新なビューにうまく一般化できますが、既存のすべてのモデルとは異なる方法で一般化されます。トレーニングセット内のビューでカバーされる範囲を超えるビューへの外挿は制限されており、新しい回転軸への外挿はさらに制限されています。これは、ネットワークが完全な 3D 構造を推測せず、線形補間も使用しないことを意味します。それでも、一般化は純粋な 2D マッチングよりもはるかに優れています。これらの調査結果は、3D の一般化を実現するために必要な 2D ビューを使用してデータセットを設計するのに役立ちます。実験を再現するコードは公開されています: https://github.com/shoaibahmed/investigating_3d_generalization.git

Visual object recognition systems need to generalize from a set of 2D training views to novel views. The question of how the human visual system can generalize to novel views has been studied and modeled in psychology, computer vision, and neuroscience. Modern deep learning architectures for object recognition generalize well to novel views, but the mechanisms are not well understood. In this paper, we characterize the ability of common deep learning architectures to generalize to novel views. We formulate this as a supervised classification task where labels correspond to unique 3D objects and examples correspond to 2D views of the objects at different 3D orientations. We consider three common models of generalization to novel views: (i) full 3D generalization, (ii) pure 2D matching, and (iii) matching based on a linear combination of views. We find that deep models generalize well to novel views, but they do so in a way that differs from all these existing models. Extrapolation to views beyond the range covered by views in the training set is limited, and extrapolation to novel rotation axes is even more limited, implying that the networks do not infer full 3D structure, nor use linear interpolation. Yet, generalization is far superior to pure 2D matching. These findings help with designing datasets with 2D views required to achieve 3D generalization. Code to reproduce our experiments is publicly available: https://github.com/shoaibahmed/investigating_3d_generalization.git

updated: Wed Apr 19 2023 00:54:00 GMT+0000 (UTC)

published: Wed Apr 19 2023 00:54:00 GMT+0000 (UTC)

arXiv

参考文献 (このサイトで利用可能なもの) / References (only if available on this site)

被参照文献 (このサイトで利用可能なものを新しい順に) / Citations (only if available on this site, in order of most recent)

Amazon.co.jpアソシエイト