Where in the World is this Image? Transformer-based Geo-localization in the Wild

Shraman Pramanick; Ewa M. Nowara; Joshua Gleason; Carlos D. Castillo; Rama Chellappa

Predicting the geographic location of a single ground-level RGB image taken anywhere in the world (geo-localization) is a very challenging problem. The challenges include the huge diversity of images across environmental scenarios and the drastic variation in the appearance of the same location depending on the time of day, weather, and season. More importantly, the prediction is made from a single image that may contain only a few geo-locating cues. For these reasons, most existing works are restricted to specific cities, imagery, or worldwide landmarks. In this work, we focus on developing an efficient solution to planet-scale single-image geo-localization. To this end, we propose TransLocator, a unified dual-branch transformer network that attends to tiny details over the entire image and produces robust feature representations under extreme appearance variations. TransLocator takes an RGB image and its semantic segmentation map as inputs, exchanges information between its two parallel branches after each transformer layer, and simultaneously performs geo-localization and scene recognition in a multi-task fashion. We evaluate TransLocator on four benchmark datasets (Im2GPS, Im2GPS3k, YFCC4k, and YFCC26k) and obtain continent-level accuracy improvements of 5.5%, 14.1%, 4.9%, and 9.9%, respectively, over the state of the art. TransLocator is also validated on real-world test images and found to be more effective than previous methods.
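The abstract reports continent-level accuracy without defining it. As a rough illustration, the sketch below computes the fraction of predictions that fall within a distance threshold of the true location, using the great-circle (haversine) distance. The 2500 km default is an assumption based on the convention commonly used for these benchmarks, not something stated in the abstract, and the function names `great_circle_km` and `accuracy_at_km` are hypothetical, not taken from the authors' code.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def accuracy_at_km(predicted, actual, threshold_km=2500.0):
    """Fraction of predictions within threshold_km of the ground truth.

    predicted, actual: equal-length sequences of (lat, lon) pairs in degrees.
    threshold_km: assumed continent-level threshold (2500 km by convention).
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        return 0.0
    hits = sum(
        1
        for (plat, plon), (tlat, tlon) in zip(predicted, actual)
        if great_circle_km(plat, plon, tlat, tlon) <= threshold_km
    )
    return hits / len(predicted)
```

For example, `accuracy_at_km([(48.8566, 2.3522)], [(51.5074, -0.1278)])` scores a Paris prediction against a London ground truth. Passing a smaller `threshold_km` gives the stricter, finer-grained variants of the same metric.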

updated: Mon Jul 25 2022 05:39:57 GMT+0000 (UTC)

published: Fri Apr 29 2022 03:27:23 GMT+0000 (UTC)

arXiv
