Self-Supervised Multi-View Learning via Auto-Encoding 3D Transformations

Xiang Gao; Wei Hu; Guo-Jun Qi

3D object representation learning is a fundamental challenge in computer vision for reasoning about the 3D world. Recent advances in deep learning have proven effective for 3D object recognition, and view-based methods have performed best so far. However, existing methods mostly learn multi-view features in a supervised fashion, which often requires a large amount of costly labeled data. In contrast, self-supervised learning aims to learn multi-view feature representations without labeled data. To this end, we propose a novel self-supervised paradigm to learn Multi-View Transformation Equivariant Representations (MV-TER) by exploring the equivariant transformations of a 3D object and its projected multiple views. Specifically, we perform a 3D transformation on a 3D object and obtain multiple views before and after the transformation via projection. Then, we self-train a representation that captures the intrinsic 3D object representation by decoding the 3D transformation parameters from the fused feature representations of multiple views before and after the transformation. Experimental results demonstrate that the proposed MV-TER significantly outperforms state-of-the-art view-based approaches in 3D object classification and retrieval tasks and generalizes to real-world datasets.
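
The data side of the self-supervised task described in the abstract can be sketched roughly as follows. This is an illustrative simplification, not the authors' implementation: it assumes a point-cloud object, orthographic projection from cameras spaced evenly around the z axis, and a rotation parameterized by three Euler angles. The function names (`rotation_from_euler`, `project_views`, `make_training_pair`, `decoding_loss`) are hypothetical, and the feature encoder and fusion network that would predict the parameters are omitted.

```python
import numpy as np


def rotation_z(theta):
    """Rotation matrix about the z axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def rotation_from_euler(angles):
    """Compose rotations about x, then y, then z (assumed parameterization)."""
    ax, ay, az = angles
    cx, sx = np.cos(ax), np.sin(ax)
    cy, sy = np.cos(ay), np.sin(ay)
    rx = np.array([[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]])
    ry = np.array([[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]])
    return rotation_z(az) @ ry @ rx


def project_views(points, num_views=4):
    """Orthographic views from cameras placed evenly around the z axis.

    Returns an array of shape (num_views, N, 2).
    """
    views = []
    for k in range(num_views):
        cam = rotation_z(2.0 * np.pi * k / num_views)
        in_camera_frame = points @ cam.T
        views.append(in_camera_frame[:, [1, 2]])  # drop the depth axis (x)
    return np.stack(views)


def make_training_pair(points, rng, num_views=4):
    """Sample a 3D rotation and return the views before and after it.

    The sampled angles are the self-supervision target: no labels are needed.
    """
    angles = rng.uniform(-np.pi / 6, np.pi / 6, size=3)
    transformed = points @ rotation_from_euler(angles).T
    before = project_views(points, num_views)
    after = project_views(transformed, num_views)
    return before, after, angles


def decoding_loss(predicted, target):
    """Mean squared error between decoded and true transformation parameters."""
    return float(np.mean((np.asarray(predicted) - np.asarray(target)) ** 2))
```

In a full pipeline, a shared encoder would map each view to a feature, the features before and after the transformation would be fused, and a decoder would regress `angles` from the fused representation, trained by minimizing something like `decoding_loss`.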

updated: Mon Mar 01 2021 06:24:17 GMT+0000 (UTC)

published: Mon Mar 01 2021 06:24:17 GMT+0000 (UTC)

arXiv
