VNT-Net: Rotational Invariant Vector Neuron Transformers

Hedi Zisling; Andrei Sharf

Learning on 3D point sets with rotational invariance is an important and challenging problem in machine learning. Rotation-invariant architectures free 3D point cloud neural networks from requiring a canonical global pose and from exhaustive data augmentation over all possible rotations. In this work, we introduce a rotation-invariant neural network that combines the recently introduced vector neurons with self-attention layers to build a point cloud vector neuron transformer network (VNT-Net). Vector neurons are known for their simplicity and versatility in representing SO(3) actions, and can therefore be incorporated into common neural operations. Similarly, Transformer architectures have gained popularity and were recently shown to be successful on images when applied directly to sequences of image patches, achieving superior performance and convergence. To benefit from both, we combine the two structures, mainly by showing how to adapt multi-headed attention layers to comply with vector neuron operations. Through this adaptation, the attention layers become SO(3)-equivariant and the overall network becomes rotation invariant. Experiments demonstrate that our network efficiently handles 3D point cloud objects in arbitrary poses. We also show that, on common classification and segmentation tasks, our network achieves higher accuracy than related state-of-the-art methods and requires less training owing to a smaller number of hyperparameters.
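
The idea of making attention comply with vector neuron operations can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function names (`vn_linear`, `vn_attention`, `invariant_gram`), the single-head layout, the use of a Frobenius inner product for the attention scores, and the scaling factor are all assumptions chosen for illustration. The sketch represents each point feature as a list of 3D vectors (shape `(N, C, 3)`), mixes only the channel axis in the linear maps, and checks numerically that the attention weights do not change under a rotation of the input while the output rotates along with it.

```python
import numpy as np


def random_rotation(rng):
    """Draw a 3x3 rotation matrix (orthogonal, det = +1) via QR decomposition."""
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]  # flip one column to turn a reflection into a rotation
    return q


def vn_linear(x, w):
    """Channel-mixing linear map on vector features.

    x: (N, C_in, 3) features, w: (C_out, C_in) weights.
    Only the channel axis is mixed, so the map commutes with rotations
    applied to the trailing 3D axis.
    """
    return np.einsum("oc,ncd->nod", w, x)


def vn_attention(x, wq, wk, wv):
    """Single-head attention over N points with vector-valued features.

    Scores are Frobenius inner products between query and key features,
    which are unchanged when the same rotation is applied to both.
    The 1/sqrt(3*C) scaling is an assumption made for this sketch.
    """
    q, k, v = vn_linear(x, wq), vn_linear(x, wk), vn_linear(x, wv)
    scores = np.einsum("icd,jcd->ij", q, k) / np.sqrt(3 * q.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    out = np.einsum("ij,jcd->icd", weights, v)  # weighted sum of value vectors
    return out, weights


def invariant_gram(f):
    """Per-point Gram matrix F F^T, shape (N, C, C); one rotation-invariant readout."""
    return np.einsum("icd,ied->ice", f, f)


rng = np.random.default_rng(0)
x = rng.normal(size=(5, 4, 3))  # 5 points, 4 vector channels each
wq, wk, wv = (rng.normal(size=(4, 4)) for _ in range(3))
R = random_rotation(rng)

out, attn = vn_attention(x, wq, wk, wv)
out_rot, attn_rot = vn_attention(x @ R.T, wq, wk, wv)  # rotate every input vector by R

print(np.allclose(attn, attn_rot))                               # attention weights are invariant
print(np.allclose(out_rot, out @ R.T))                           # output is equivariant
print(np.allclose(invariant_gram(out), invariant_gram(out_rot)))  # readout is invariant
```

All three checks should hold up to floating-point error. An invariant readout such as the Gram matrix is one common way to turn equivariant features into rotation-invariant ones; whether VNT-Net uses this particular readout is not stated in the abstract.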

updated: Thu May 19 2022 16:51:56 GMT+0000 (UTC)

published: Thu May 19 2022 16:51:56 GMT+0000 (UTC)

arXiv
