PIGEON: Predicting Image Geolocations

Lukas Haas; Michal Skreta; Silas Alberti

We introduce PIGEON, a multi-task end-to-end system for planet-scale image geolocalization that achieves state-of-the-art performance both on external benchmarks and in human evaluation. Our work incorporates semantic geocell creation with label smoothing, pretrains a vision transformer on images with geographic information, and refines location predictions with ProtoNets across a candidate set of geocells. The contributions of PIGEON are three-fold. First, we design a semantic geocell creation and splitting algorithm based on open-source data that can be adapted to any geospatial dataset. Second, we show the effectiveness of intra-geocell refinement and the applicability of unsupervised clustering and ProtoNets to the task. Finally, we make our pre-trained CLIP transformer model, StreetCLIP, publicly available for use in adjacent domains, with applications to fighting climate change and to urban and rural scene understanding.
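As a rough illustration of what label smoothing over geocells could look like, the sketch below spreads the target probability across geocells according to how far each geocell centroid lies from the true location. The abstract does not specify the smoothing function, so the haversine distance, the exponential decay, the temperature `tau_km`, and the names `haversine_km` and `smoothed_labels` are all assumptions made for this example, not the authors' implementation.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def smoothed_labels(true_location, centroids, tau_km=75.0):
    """Soft target distribution over geocells (hypothetical scheme).

    Each geocell is weighted by exp(-(d - d_min) / tau_km), where d is the
    distance from its centroid to the true location and d_min is the
    smallest such distance. Weights are normalized to sum to 1.
    """
    dists = [haversine_km(true_location, c) for c in centroids]
    d_min = min(dists)
    weights = [math.exp(-(d - d_min) / tau_km) for d in dists]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical geocell centroids near Paris, London, and New York.
centroids = [(48.85, 2.35), (51.50, -0.12), (40.70, -74.00)]
labels = smoothed_labels((48.86, 2.34), centroids)
```

With a scheme like this, a prediction in a neighbouring geocell is penalized less than one on another continent, which is the general motivation for smoothing labels by distance rather than using one-hot targets.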

updated: Thu Jul 13 2023 13:22:36 GMT+0000 (UTC)

published: Tue Jul 11 2023 23:36:49 GMT+0000 (UTC)

arXiv
