HFGD: High-level Feature Guided Decoder for Semantic Segmentation

Ye Huang; Di Kang; Shenghua Gao; Wen Li; Lixin Duan

Existing pyramid-based upsamplers (e.g. SemanticFPN), although efficient, usually produce less accurate results than dilation-based models when using the same backbone. This is partly because the high-level features become contaminated when they are fused and fine-tuned with noisy low-level features on limited data. To address this issue, we propose to use powerful pretrained high-level features as guidance (HFG) when learning to upsample the fine-grained low-level features. Specifically, the class tokens are trained together with only the high-level features from the backbone. These class tokens are then reused by the upsampler for classification, guiding the upsampler features toward the more discriminative backbone features. One key design of HFG is to protect the high-level features from contamination with proper stop-gradient operations, so that the backbone is not updated by gradients from the upsampler. To push the upper limit of HFG, we introduce a context augmentation encoder (CAE) that operates efficiently and effectively on the low-resolution high-level features, resulting in improved representations and thus better guidance. We evaluate the proposed method on three benchmarks: Pascal Context, COCOStuff164k, and Cityscapes. Our method achieves state-of-the-art results among methods that do not use extra training data, demonstrating its effectiveness and generalization ability. The complete code will be released.
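As a rough illustration of the guidance idea in the abstract, the sketch below classifies an upsampler feature against shared class tokens and updates only the upsampler parameters. It is a toy under stated assumptions, not the authors' implementation: the element-wise "upsampler", the function names (`classify`, `upsampler_step`), the dot-product scoring, and the learning rate are all hypothetical. The stop-gradient is mimicked by treating the class tokens as read-only constants in the manual gradient computation.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(feature, class_tokens):
    """Score one pixel feature against every class token (assumed dot product)."""
    return softmax([dot(feature, tok) for tok in class_tokens])


def upsampler_step(low_feat, weights, class_tokens, label, lr=0.1):
    """One manual gradient step on the (hypothetical) upsampler branch only.

    The class tokens are read but never written, which plays the role of the
    stop-gradient: the upsampler loss cannot change them.
    """
    # Hypothetical upsampler: element-wise scaling of the low-level feature.
    feat = [w * x for w, x in zip(weights, low_feat)]
    probs = classify(feat, class_tokens)
    loss = -math.log(probs[label])
    # Softmax cross-entropy: dL/dlogit_k = p_k - 1[k == label]
    dlogits = [p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)]
    # dL/dfeat_d = sum_k dlogit_k * token_k[d]; tokens act as constants
    dfeat = [
        sum(dlogits[k] * class_tokens[k][d] for k in range(len(class_tokens)))
        for d in range(len(feat))
    ]
    # dL/dw_d = dfeat_d * low_feat_d; only the upsampler weights are updated
    new_weights = [w - lr * g * x for w, g, x in zip(weights, dfeat, low_feat)]
    return new_weights, loss


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0]]  # stand-ins for trained class tokens
    weights = [1.0, 1.0]
    for _ in range(3):
        weights, loss = upsampler_step([0.5, 0.5], weights, tokens, label=0)
        print(round(loss, 4), weights)
```

In an autograd framework the same effect would typically come from detaching the backbone features and class tokens before they enter the upsampler branch, so the loss on that branch only reaches the upsampler parameters.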

updated: Wed Aug 16 2023 12:15:29 GMT+0000 (UTC)

published: Wed Mar 15 2023 14:23:07 GMT+0000 (UTC)

arXiv
