R2-MLP: Round-Roll MLP for Multi-View 3D Object Recognition

Shuo Chen; Tan Yu; Ping Li

Recently, vision architectures based exclusively on multi-layer perceptrons (MLPs) have gained much attention in the computer vision community. MLP-like models achieve competitive performance on single-image 2D classification with less inductive bias and without hand-crafted convolution layers. In this work, we explore the effectiveness of MLP-based architectures for the view-based 3D object recognition task. We present an MLP-based architecture termed Round-Roll MLP (R^2-MLP). It extends the spatial-shift MLP backbone by considering communication between patches from different views. R^2-MLP rolls part of the channels along the view dimension, which promotes information exchange between neighboring views. We benchmark MLP results on the ModelNet10 and ModelNet40 datasets with ablations in various aspects. The experimental results show that, with a conceptually simple structure, our R^2-MLP achieves competitive performance compared with existing state-of-the-art methods.
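
The channel-rolling idea described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `round_roll`, the `(views, patches, channels)` layout, the fraction of channels rolled, the use of two groups rolled in opposite directions, and the circular wrap-around between the first and last view are all assumptions made for illustration.

```python
import numpy as np


def round_roll(x, frac=0.25):
    """Roll part of the channels along the view axis (hypothetical sketch).

    x    : array of shape (num_views, num_patches, channels)
    frac : assumed fraction of channels in each rolled group
    """
    channels = x.shape[-1]
    g = int(channels * frac)  # size of each rolled channel group
    out = x.copy()
    # Group 1: each view receives these channels from the previous view.
    # np.roll is circular, so view 0 receives them from the last view.
    out[..., :g] = np.roll(x[..., :g], shift=1, axis=0)
    # Group 2: each view receives these channels from the next view.
    out[..., g:2 * g] = np.roll(x[..., g:2 * g], shift=-1, axis=0)
    # Remaining channels are left untouched.
    return out


# Tiny example: 3 views, 1 patch per view, 4 channels.
x = np.arange(12, dtype=float).reshape(3, 1, 4)
y = round_roll(x, frac=0.25)
```

After the roll, every view holds a mix of its own channels and channels taken from its two neighbors, so a per-patch MLP applied afterwards can combine information across views without any extra parameters.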

updated: Sun Nov 20 2022 21:13:02 GMT+0000 (UTC)

published: Sun Nov 20 2022 21:13:02 GMT+0000 (UTC)

arXiv
