Efficient Linear Attention for Fast and Accurate Keypoint Matching

Suwichaya Suwanwimolkul; Satoshi Komorita

Recently, Transformers have provided state-of-the-art performance in sparse matching, which is crucial for realizing high-performance 3D vision applications. Yet these Transformers lack efficiency because of the quadratic computational complexity of their attention mechanism. To solve this problem, we employ an efficient linear attention that reduces the computational complexity to linear. We then propose a new attentional aggregation that achieves high accuracy by aggregating both global and local information from sparse keypoints. To further improve efficiency, we propose the joint learning of feature matching and description. This learning enables simpler and faster matching than Sinkhorn, which is often used to match the descriptors learned by Transformers. With only 0.84M learnable parameters, our method achieves competitive performance against the larger state-of-the-art models, SuperGlue (12M parameters) and SGMNet (30M parameters), on three benchmarks: HPatch, ETH, and Aachen Day-Night.
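To illustrate why linear attention avoids the quadratic cost, the following is a minimal NumPy sketch of one common kernelized formulation. The abstract does not state which linear attention variant the authors use, so the `elu(x) + 1` feature map, the function names, and the `eps` stabilizer here are assumptions made for illustration only. The key idea is that computing `phi(K)^T V` first yields a small `d x d_v` summary, so the `n x m` attention matrix is never formed.

```python
import numpy as np


def feature_map(x):
    # Assumed positive feature map: elu(x) + 1 (x + 1 for x > 0, exp(x) otherwise).
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))


def linear_attention(q, k, v, eps=1e-6):
    """Kernelized attention with cost linear in the number of keypoints.

    q: (n, d) queries, k: (m, d) keys, v: (m, d_v) values.
    """
    phi_q = feature_map(q)          # (n, d)
    phi_k = feature_map(k)          # (m, d)
    kv = phi_k.T @ v                # (d, d_v) summary of all keys and values
    z = phi_k.sum(axis=0)           # (d,) normalizer summary
    num = phi_q @ kv                # (n, d_v)
    den = phi_q @ z + eps           # (n,)
    return num / den[:, None]


def quadratic_reference(q, k, v, eps=1e-6):
    # Same result computed the expensive way, via the explicit (n, m) weight matrix.
    weights = feature_map(q) @ feature_map(k).T
    return (weights @ v) / (weights.sum(axis=1, keepdims=True) + eps)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.normal(size=(5, 4))
    k = rng.normal(size=(7, 4))
    v = rng.normal(size=(7, 3))
    print(np.allclose(linear_attention(q, k, v), quadratic_reference(q, k, v)))
```

Both functions compute the same output; only the order of the matrix products differs. With n and m keypoints and descriptor dimension d, the reference costs O(n m d) while the linear form costs O((n + m) d d_v), which is what makes attention over many sparse keypoints affordable.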

updated: Fri Apr 22 2022 16:16:26 GMT+0000 (UTC)

published: Sat Apr 16 2022 06:17:36 GMT+0000 (UTC)

arXiv
