SATIN: A Multi-Task Metadataset for Classifying Satellite Imagery using Vision-Language Models

Jonathan Roberts; Kai Han; Samuel Albanie

Interpreting remote sensing imagery enables numerous downstream applications, ranging from land-use planning to deforestation monitoring. Robustly classifying this data is challenging because of the Earth's geographic diversity. While many distinct satellite and aerial image classification datasets exist, no curated benchmark yet covers this diversity suitably. In this work, we introduce SATellite ImageNet (SATIN), a metadataset curated from 27 existing remotely sensed datasets, and comprehensively evaluate the zero-shot transfer classification capabilities of a broad range of vision-language (VL) models on SATIN. We find SATIN to be a challenging benchmark: the strongest method we evaluate achieves a classification accuracy of 52.0%. We provide a public leaderboard (https://satinbenchmark.github.io) to guide and track the progress of VL models in this important domain.
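To illustrate what zero-shot classification accuracy means here, the following is a minimal sketch and not the authors' evaluation code. It assumes that a VL model has already produced an embedding for each image and for a text prompt per class, and that the predicted class is the one whose text embedding has the highest cosine similarity to the image embedding. The class names, embedding values, and function names are all made up for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def zero_shot_predict(image_emb, class_embs):
    """Pick the class whose text embedding is most similar to the image embedding."""
    return max(class_embs, key=lambda name: cosine(image_emb, class_embs[name]))

def accuracy(image_embs, labels, class_embs):
    """Fraction of images whose predicted class matches the ground-truth label."""
    correct = sum(
        zero_shot_predict(emb, class_embs) == label
        for emb, label in zip(image_embs, labels)
    )
    return correct / len(labels)

# Hypothetical text embeddings, one per class prompt (assumed values).
class_embs = {
    "forest": [1.0, 0.0, 0.0],
    "airport": [0.0, 1.0, 0.0],
    "harbor": [0.0, 0.0, 1.0],
}

# Hypothetical image embeddings with their ground-truth labels (assumed values).
image_embs = [
    [0.9, 0.1, 0.0],
    [0.2, 0.7, 0.1],
    [0.6, 0.3, 0.1],  # labelled "harbor" but closest to "forest": a miss
    [0.1, 0.2, 0.9],
]
labels = ["forest", "airport", "harbor", "harbor"]

acc = accuracy(image_embs, labels, class_embs)
print(f"zero-shot accuracy: {acc:.1%}")
```

No labelled training images from the target dataset are used: the class set is supplied only as text, which is what makes the transfer zero-shot.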

updated: Sun Apr 23 2023 11:23:05 GMT+0000 (UTC)

published: Sun Apr 23 2023 11:23:05 GMT+0000 (UTC)

arXiv
