A Lightweight Domain Adaptive Absolute Pose Regressor Using Barlow Twins Objective

Praveen Kumar Rajendran; Quoc-Vinh Lai-Dang; Luiz Felipe Vecchietti; Dongsoo Har

Barlow Twins オブジェクティブを使用した軽量ドメイン適応絶対ポーズリグレッサー

特定の画像のカメラポーズを特定することは、ロボット工学、自動運転車、および拡張現実/仮想現実のアプリケーションでは困難な問題です。最近、学習ベースの方法が絶対カメラ姿勢推定に有効であることが示されています。ただし、異なるドメインに一般化する場合、これらの方法は正確ではありません。この論文では、絶対姿勢回帰のためのドメイン適応トレーニングフレームワークを紹介します。提案されたフレームワークでは、Barlow Twins 目的を使用して並列ブランチをトレーニングする生成手法を使用して、さまざまなドメインのシーンイメージを拡張します。並列分岐は、軽量の CNN ベースの絶対姿勢リグレッサーアーキテクチャを活用します。さらに、回転予測のための回帰ヘッドに空間的およびチャネルごとの注意を組み込むことの有効性が調査されます。私たちの方法は、ケンブリッジランドマークと 7Scenes の 2 つのデータセットで評価されます。結果は、MS-Transformer の約 24 分の 1 の FLOP、12 分の 1 のアクティブ化、および 5 分の 1 のパラメーターを使用しても、私たちのアプローチがすべての CNN ベースのアーキテクチャよりも優れており、トランスフォーマーベースのアーキテクチャに匹敵するパフォーマンスを達成することを示しています。私たちの方法は、ケンブリッジランドマークと 7Scenes データセットでそれぞれ 2 位と 4 位にランクされています。さらに、トレーニング中に遭遇しなかった拡張ドメインの場合、私たちのアプローチは MS トランスフォーマーよりも大幅に優れています。さらに、私たちのドメイン適応フレームワークは、目に見えない分布のすべてのインスタンスを持つ同一の CNN バックボーンでトレーニングされた単一ブランチモデルよりも優れたパフォーマンスを達成することが示されています。

Identifying the camera pose for a given image is a challenging problem with applications in robotics, autonomous vehicles, and augmented/virtual reality. Lately, learning-based methods have shown to be effective for absolute camera pose estimation. However, these methods are not accurate when generalizing to different domains. In this paper, a domain adaptive training framework for absolute pose regression is introduced. In the proposed framework, the scene image is augmented for different domains by using generative methods to train parallel branches using Barlow Twins objective. The parallel branches leverage a lightweight CNN-based absolute pose regressor architecture. Further, the efficacy of incorporating spatial and channel-wise attention in the regression head for rotation prediction is investigated. Our method is evaluated with two datasets, Cambridge landmarks and 7Scenes. The results demonstrate that, even with using roughly 24 times fewer FLOPs, 12 times fewer activations, and 5 times fewer parameters than MS-Transformer, our approach outperforms all the CNN-based architectures and achieves performance comparable to transformer-based architectures. Our method ranks 2nd and 4th with the Cambridge Landmarks and 7Scenes datasets, respectively. In addition, for augmented domains not encountered during training, our approach significantly outperforms the MS-transformer. Furthermore, it is shown that our domain adaptive framework achieves better performance than the single branch model trained with the identical CNN backbone with all instances of the unseen distribution.

updated: Sun Nov 20 2022 12:18:53 GMT+0000 (UTC)

published: Sun Nov 20 2022 12:18:53 GMT+0000 (UTC)

arXiv

参考文献 (このサイトで利用可能なもの) / References (only if available on this site)

被参照文献 (このサイトで利用可能なものを新しい順に) / Citations (only if available on this site, in order of most recent)

Amazon.co.jpアソシエイト