BPT: Binary Point Cloud Transformer for Place Recognition

Zhixing Hou; Yuzhang Shang; Tian Gao; Yan Yan

Place recognition, the task of recognizing previously visited places, serves as the trigger for back-end optimization in a full SLAM system. Many works built on deep learning tools such as MLPs, CNNs, and transformers have achieved great improvements in this field. The point cloud transformer is one of the strongest frameworks for place recognition in robotics, but its large memory consumption and expensive computation make it difficult to deploy point cloud transformer networks widely on mobile or embedded devices. To solve this issue, we propose a binary point cloud transformer for place recognition. A 32-bit full-precision model can thereby be reduced to a 1-bit model that occupies less memory and uses faster binarized bitwise operations. To the best of our knowledge, this is the first binary point cloud transformer that can be deployed on mobile devices for online applications such as place recognition. Experiments on several standard benchmarks demonstrate that the proposed method achieves results comparable to the corresponding full-precision transformer model and even outperforms some full-precision deep learning methods. For example, the proposed method achieves an average recall of 93.28% at top 1% and 85.74% at top 1 on the Oxford RobotCar dataset. Meanwhile, for the same transformer structure, the model size and the number of floating-point operations are reduced by 56.1% and 34.1%, respectively, when moving from the original precision to binary precision.
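The abstract does not spell out how a 1-bit model replaces floating-point arithmetic with bitwise operations. The following is a minimal sketch of the general idea behind binarized networks, not the authors' implementation: values are mapped to +1 or -1 by their sign, packed into the bits of an integer, and a dot product is computed with XNOR and a population count. The function names (`binarize`, `pack_bits`, `binary_dot`) and the sample vectors are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: generic sign binarization with an XNOR/popcount
# dot product. This is an assumption about the general technique, not the
# method described in the BPT paper.

def binarize(values):
    """Map each real value to +1 or -1 by its sign (zero maps to +1)."""
    return [1 if v >= 0 else -1 for v in values]


def pack_bits(signs):
    """Pack a +1/-1 vector into one integer: bit i is set where signs[i] == +1."""
    word = 0
    for i, s in enumerate(signs):
        if s == 1:
            word |= 1 << i
    return word


def binary_dot(a_bits, b_bits, n):
    """Dot product of two packed +1/-1 vectors of length n."""
    mask = (1 << n) - 1
    # XNOR marks the positions where the two sign bits agree.
    agree = ~(a_bits ^ b_bits) & mask
    matches = bin(agree).count("1")  # population count
    # Each agreement contributes +1 and each disagreement -1.
    return 2 * matches - n


# Hypothetical sample data.
weights = [0.7, -1.2, 0.05, -0.3]
activations = [1.5, 0.4, -0.9, -2.0]

w_signs = binarize(weights)
a_signs = binarize(activations)
result = binary_dot(pack_bits(w_signs), pack_bits(a_signs), len(weights))
reference = sum(w * a for w, a in zip(w_signs, a_signs))
```

In this sketch each binarized value occupies a single bit, so 32 of them fit in the space of one 32-bit float, and one XNOR plus one population count stands in for many multiply-accumulate operations.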

updated: Thu Mar 02 2023 11:15:59 GMT+0000 (UTC)

published: Thu Mar 02 2023 11:15:59 GMT+0000 (UTC)

arXiv
