Domain-invariant Similarity Activation Map Metric Learning for Retrieval-based Long-term Visual Localization

Hanjiang Hu; Hesheng Wang; Zhe Liu; Weidong Chen

Visual localization is a crucial component of mobile robot and autonomous driving applications. Image retrieval is an efficient and effective technique for image-based localization. However, drastic variation in environmental conditions, e.g. illumination, seasonal and weather changes, severely degrades retrieval-based visual localization and makes it a challenging problem. In this work, a general architecture is first formulated probabilistically to extract domain-invariant features through multi-domain image translation. A novel gradient-weighted similarity activation mapping loss (Grad-SAM) is then incorporated for finer localization with high accuracy. We also propose a new adaptive triplet loss to boost the metric learning of the embedding in a self-supervised manner. The final coarse-to-fine image retrieval pipeline is implemented as the sequential combination of models without and with the Grad-SAM loss. Extensive experiments have been conducted on the CMU-Seasons dataset to validate the effectiveness of the proposed approach. The strong generalization ability of our approach is verified on the RobotCar dataset using models pre-trained on the urban part of the CMU-Seasons dataset. Our approach performs on par with or better than state-of-the-art image-based localization baselines at medium and high precision, especially in challenging environments with illumination variance, vegetation and night-time images. Moreover, real-site experiments have been conducted to validate the efficiency and effectiveness of the coarse-to-fine strategy for localization.
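
The coarse-to-fine retrieval described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each image already has two embedding vectors, one from a model trained without the Grad-SAM loss (coarse) and one from a model trained with it (fine). It also assumes that cosine similarity is the ranking score and that a hypothetical `top_k` parameter sets how many coarse candidates are re-ranked. None of these details are stated in the abstract.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def coarse_to_fine_retrieve(query_coarse, query_fine, db_coarse, db_fine, top_k=3):
    """Return (best_index, candidates) for a two-stage retrieval.

    Stage 1 ranks every database image by its coarse embedding and keeps
    the top_k candidates. Stage 2 re-ranks only those candidates by their
    fine embedding. top_k is a hypothetical parameter for illustration.
    """
    # Stage 1: coarse ranking over the whole database.
    coarse_scores = [cosine_similarity(query_coarse, d) for d in db_coarse]
    candidates = sorted(
        range(len(db_coarse)), key=lambda i: coarse_scores[i], reverse=True
    )[:top_k]

    # Stage 2: fine re-ranking restricted to the coarse candidates.
    best_index = max(
        candidates, key=lambda i: cosine_similarity(query_fine, db_fine[i])
    )
    return best_index, candidates


# Toy placeholder embeddings, not real model outputs.
db_coarse = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
db_fine = [[0.0, 1.0], [1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
best_index, candidates = coarse_to_fine_retrieve(
    [1.0, 0.0], [1.0, 0.0], db_coarse, db_fine, top_k=2
)
```

In this toy example the coarse stage keeps database images 0 and 1, and the fine stage then selects image 1. Images 2 and 3 are never compared with the fine embedding, which is where the efficiency of a coarse-to-fine strategy would come from.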

updated: Fri Nov 27 2020 11:16:10 GMT+0000 (UTC)

published: Wed Sep 16 2020 14:43:22 GMT+0000 (UTC)

arXiv
