Reachability Embeddings: Scalable Self-Supervised Representation Learning from Markovian Trajectories for Geospatial Computer Vision

Swetava Ganguli; C. V. Krishnakumar Iyer; Vipul Pandey

Self-supervised representation learning techniques use large datasets without semantic annotations to learn meaningful, universal features that can be conveniently transferred to solve a wide variety of downstream supervised tasks. In this paper, we propose a self-supervised method for learning representations of geographic locations from unlabeled GPS trajectories to solve downstream geospatial computer vision tasks. Tiles resulting from a raster representation of the earth's surface are modeled as nodes on a graph or pixels of an image. GPS trajectories are modeled as allowed Markovian paths on these nodes. A scalable and distributed algorithm is presented to compute image-like representations, called reachability summaries, of the spatial connectivity patterns between tiles and their neighbors implied by the observed Markovian paths. A convolutional, contractive autoencoder is trained to learn compressed representations, called reachability embeddings, of reachability summaries for every tile. Reachability embeddings serve as task-agnostic feature representations of geographic locations. Using reachability embeddings as pixel representations for five different downstream geospatial tasks, cast as supervised semantic segmentation problems, we quantitatively demonstrate that reachability embeddings are semantically meaningful representations and yield a 4-23% gain in performance, as measured by the area under the precision-recall curve (AUPRC), while using up to 67% less trajectory data, when compared to baseline models whose pixel representations do not account for the spatial connectivity between tiles. Reachability embeddings transform sequential, spatiotemporal mobility data into semantically meaningful image-like representations that can be combined with other sources of imagery and are designed to facilitate multimodal learning in geospatial computer vision.
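
As a rough illustration of the idea of an image-like reachability summary, the following minimal Python sketch counts, for each tile, how often observed trajectories move from that tile to each tile in a small surrounding window, and normalizes the counts into a grid. This is not the paper's algorithm: the function name `reachability_summaries`, the parameters `radius` and `horizon`, the (row, col) tile indexing, and the normalization are all assumptions made for illustration, and the sketch is single-machine rather than distributed.

```python
from collections import defaultdict


def reachability_summaries(trajectories, radius=1, horizon=1):
    """Hypothetical sketch: per-tile grids of observed transition frequencies.

    trajectories: list of paths, each a list of (row, col) tile indices
                  (assumed indexing, not taken from the paper).
    radius:       half-width of the neighborhood window around each tile.
    horizon:      how many subsequent steps of a path count as "reachable".
    """
    size = 2 * radius + 1
    # One (size x size) count grid per origin tile, created on first use.
    counts = defaultdict(lambda: [[0] * size for _ in range(size)])

    for path in trajectories:
        for i, (r0, c0) in enumerate(path):
            # Look at the next `horizon` tiles visited after tile i.
            for r1, c1 in path[i + 1 : i + 1 + horizon]:
                dr, dc = r1 - r0, c1 - c0
                # Keep only destinations inside the neighborhood window.
                if abs(dr) <= radius and abs(dc) <= radius:
                    counts[(r0, c0)][dr + radius][dc + radius] += 1

    # Normalize each grid so its entries sum to 1 (an assumed choice).
    summaries = {}
    for tile, grid in counts.items():
        total = sum(map(sum, grid))
        summaries[tile] = [[v / total for v in row] for row in grid]
    return summaries
```

For example, calling `reachability_summaries([[(0, 0), (0, 1), (1, 1)], [(0, 0), (1, 0)]])` gives tile `(0, 0)` a 3x3 grid whose mass is split evenly between the neighbor one column over and the neighbor one row down. Grids of this kind could then be passed to an autoencoder for compression, as the abstract describes for reachability summaries.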

updated: Sun Oct 24 2021 20:10:22 GMT+0000 (UTC)

published: Sun Oct 24 2021 20:10:22 GMT+0000 (UTC)

arXiv
