SpectralFormer: Rethinking Hyperspectral Image Classification with Transformers

Danfeng Hong; Zhu Han; Jing Yao; Lianru Gao; Bing Zhang; Antonio Plaza; Jocelyn Chanussot

Hyperspectral (HS) images are characterized by nearly contiguous spectral information, enabling fine-grained identification of materials by capturing subtle spectral discrepancies. Owing to their excellent local contextual modeling ability, convolutional neural networks (CNNs) have proven to be powerful feature extractors in HS image classification. However, CNNs fail to mine and represent the sequence attributes of spectral signatures well, owing to limitations inherent in their network backbone. To solve this issue, we rethink HS image classification from a sequential perspective with transformers and propose a novel backbone network called SpectralFormer. Beyond the band-wise representations of classic transformers, SpectralFormer is capable of learning spectrally local sequence information from neighboring bands of HS images, yielding group-wise spectral embeddings. More significantly, to reduce the possibility of losing valuable information during layer-wise propagation, we devise a cross-layer skip connection that conveys memory-like components from shallow to deep layers by adaptively learning to fuse "soft" residuals across layers. It is worth noting that the proposed SpectralFormer is a highly flexible backbone network that is applicable to both pixel-wise and patch-wise inputs. We evaluate the classification performance of the proposed SpectralFormer on three HS datasets through extensive experiments, showing its superiority over classic transformers and a significant improvement over state-of-the-art backbone networks. The code for this work will be available at https://github.com/danfenghong/IEEE_TGRS_SpectralFormer for the sake of reproducibility.
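As a rough illustration of the two ideas named in the abstract, the following NumPy sketch groups each band of a pixel spectrum with its spectral neighbors and fuses a shallow-layer and a deep-layer representation with mixing weights. It is not the authors' implementation: the function names `groupwise_bands` and `fuse_soft_residual`, the edge-replication padding at the ends of the spectrum, the weighted-sum form of the fusion, and the toy values are all assumptions made for illustration. In a real network the grouped bands would pass through a learned linear embedding and the fusion weights would be learned during training.

```python
import numpy as np


def groupwise_bands(spectrum, group_size=3):
    """Stack each band with its spectral neighbors: (B,) -> (B, group_size)."""
    if group_size < 1 or group_size % 2 == 0:
        raise ValueError("group_size must be a positive odd integer")
    half = group_size // 2
    # Assumption of this sketch: replicate the edge bands so that the first
    # and last bands also get a full group of neighbors.
    padded = np.pad(spectrum, half, mode="edge")
    # One window of length group_size centered on each original band.
    return np.lib.stride_tricks.sliding_window_view(padded, group_size)


def fuse_soft_residual(shallow, deep, weights):
    """Weighted sum of shallow- and deep-layer tokens (hypothetical form).

    `weights` has shape (2,) and stands in for parameters that would be
    learned; here it is simply passed in.
    """
    return weights[0] * shallow + weights[1] * deep


# Toy pixel spectrum with five bands (made-up values).
spectrum = np.array([0.1, 0.2, 0.4, 0.8, 1.6])
groups = groupwise_bands(spectrum, group_size=3)

# Toy fusion of two same-shaped representations with equal weights.
fused = fuse_soft_residual(groups, 2.0 * groups, np.array([0.5, 0.5]))
```

Here `groups` holds one row per band, so a transformer would receive five tokens of three values each instead of five scalar tokens. A group size of 1 reduces to the band-wise case.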

updated: Sat Nov 20 2021 01:26:16 GMT+0000 (UTC)

published: Wed Jul 07 2021 02:59:21 GMT+0000 (UTC)

arXiv
