Advancing Plain Vision Transformer Towards Remote Sensing Foundation Model

Di Wang; Qiming Zhang; Yufei Xu; Jing Zhang; Bo Du; Dacheng Tao; Liangpei Zhang

Large-scale vision foundation models have made significant progress on visual tasks with natural images, and vision transformers have become the primary choice owing to their scalability and representation ability. However, large-scale models for remote sensing (RS) have not yet been sufficiently explored. In this paper, we adopt plain vision transformers with about 100 million parameters, make the first attempt to propose large vision models tailored to RS tasks, and investigate how such large models perform. To handle the large image sizes and arbitrarily oriented objects in RS images, we propose a new rotated varied-size window attention to replace the original full attention in transformers. It significantly reduces the computational cost and memory footprint while learning better object representations by extracting rich context from the diverse generated windows. Experiments on detection tasks show the superiority of our model over all state-of-the-art models, achieving 81.24% mAP on the DOTA-V1.0 dataset. Our models also perform competitively with existing advanced methods on downstream classification and segmentation tasks. Further experiments show the advantages of our models in computational complexity and in data efficiency during transfer.
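To make the cost argument concrete, the following is a rough sketch (not the authors' implementation) that counts query-key pairs per attention layer under full attention and under a generic fixed-size window attention. The function name, the 1024-pixel input, the 16-pixel patch, and the 8-token window are all illustrative assumptions; the count ignores attention heads, channel width, and any extra cost of generating rotated, varied-size windows.

```python
def attention_pair_counts(image_size, patch_size, window_size):
    """Count query-key pairs per layer for full vs. window attention.

    All arguments are hypothetical values chosen for illustration,
    not settings taken from the paper.
    """
    tokens_per_side = image_size // patch_size
    num_tokens = tokens_per_side ** 2
    # Full attention: every token attends to every token.
    full_pairs = num_tokens ** 2
    # Window attention: every token attends only to the
    # window_size x window_size tokens in its own window.
    window_pairs = num_tokens * window_size ** 2
    return num_tokens, full_pairs, window_pairs


num_tokens, full_pairs, window_pairs = attention_pair_counts(1024, 16, 8)
print(f"tokens: {num_tokens}")
print(f"full attention pairs: {full_pairs}")
print(f"window attention pairs: {window_pairs}")
print(f"reduction factor: {full_pairs / window_pairs:.0f}x")
```

Under these assumed settings, full attention grows quadratically with the number of tokens while window attention grows linearly, which is why the gap widens for the large images common in RS.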

updated: Sun Nov 13 2022 03:09:49 GMT+0000 (UTC)

published: Mon Aug 08 2022 09:08:40 GMT+0000 (UTC)

arXiv
