Fine-Grained 3D Shape Classification with Hierarchical Part-View Attentions

Xinhai Liu; Zhizhong Han; Yu-Shen Liu; Matthias Zwicker

Fine-grained 3D shape classification is important for shape understanding and analysis, and it poses a challenging research problem. However, it has rarely been studied, owing to the lack of fine-grained 3D shape benchmarks. To address this issue, we first introduce a new 3D shape dataset, named FG3D, with fine-grained class labels. It covers three categories (airplane, car and chair), each of which consists of several subcategories at a fine-grained level. Our experiments on this dataset show that state-of-the-art methods are significantly limited by the small variance among subcategories of the same category. To resolve this problem, we further propose a novel fine-grained 3D shape classification method, named FG3D-Net, which captures the fine-grained local details of 3D shapes from multiple rendered views. Specifically, we first train a Region Proposal Network (RPN) to detect generally semantic parts inside multiple views, using a benchmark for generally semantic part detection. We then design a hierarchical part-view attention aggregation module that learns a global shape representation by aggregating generally semantic part features, which preserves the local details of 3D shapes. The part-view attention module hierarchically leverages part-level and view-level attention to increase the discriminability of our features: the part-level attention highlights the important parts in each view, while the view-level attention highlights the discriminative views among all the views of the same object. In addition, we integrate a Recurrent Neural Network (RNN) to capture the spatial relationships among sequential views from different viewpoints. Our results on the fine-grained 3D shape dataset show that our method outperforms other state-of-the-art methods.
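
The hierarchical aggregation described above can be illustrated with a minimal sketch. It is not the FG3D-Net implementation: it assumes that each attention level is a plain softmax over dot-product scores against a query vector, and it omits the RPN and the RNN entirely. The names `attention_pool`, `aggregate_shape`, `part_query` and `view_query` are hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attention_pool(features, query):
    """Pool feature vectors into one vector.

    Each feature is scored by its dot product with `query`; the softmax of
    the scores gives the attention weights used for the weighted sum.
    Returns (pooled_vector, weights).
    """
    weights = softmax([dot(f, query) for f in features])
    dim = len(features[0])
    pooled = [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]
    return pooled, weights


def aggregate_shape(part_features, part_query, view_query):
    """Two-level aggregation: parts -> view features -> one shape feature.

    part_features[v][p] is the feature vector of part p detected in view v.
    Both query vectors stand in for learned parameters (an assumption).
    """
    # Part-level attention: emphasize the important parts inside each view.
    view_features = [attention_pool(parts, part_query)[0] for parts in part_features]
    # View-level attention: emphasize the discriminative views of the object.
    shape_feature, view_weights = attention_pool(view_features, view_query)
    return shape_feature, view_weights


# Toy input: 2 views, 2 parts per view, 2-dimensional part features.
toy_parts = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[2.0, 0.0], [0.0, 2.0]],
]
shape_feature, view_weights = aggregate_shape(toy_parts, [0.0, 0.0], [0.0, 0.0])
```

With all-zero queries every score is equal, so both levels reduce to plain averaging. Non-zero queries shift the weight toward the parts and views whose features align with them, which is the effect the two attention levels are meant to have.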

updated: Mon Dec 28 2020 06:34:39 GMT+0000 (UTC)

published: Tue May 26 2020 06:53:19 GMT+0000 (UTC)

arXiv
