SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers

Enze Xie; Wenhai Wang; Zhiding Yu; Anima Anandkumar; Jose M. Alvarez; Ping Luo

We present SegFormer, a simple, efficient yet powerful semantic segmentation framework which unifies Transformers with lightweight multilayer perceptron (MLP) decoders. SegFormer has two appealing features: 1) SegFormer comprises a novel hierarchically structured Transformer encoder which outputs multiscale features. It does not need positional encoding, thereby avoiding the interpolation of positional codes, which leads to decreased performance when the testing resolution differs from the training resolution. 2) SegFormer avoids complex decoders. The proposed MLP decoder aggregates information from different layers, thereby combining both local attention and global attention to render powerful representations. We show that this simple and lightweight design is the key to efficient segmentation on Transformers. We scale our approach up to obtain a series of models from SegFormer-B0 to SegFormer-B5, reaching significantly better performance and efficiency than previous counterparts. For example, SegFormer-B4 achieves 50.3% mIoU on ADE20K with 64M parameters, being 5x smaller and 2.2% better than the previous best method. Our best model, SegFormer-B5, achieves 84.0% mIoU on the Cityscapes validation set and shows excellent zero-shot robustness on Cityscapes-C. Code will be released at: github.com/NVlabs/SegFormer.
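The way an MLP decoder can aggregate multiscale encoder features may be easier to see in code. The following is a minimal sketch, not the authors' implementation: it assumes four feature maps at decreasing resolution, projects each to a common channel width with a per-pixel linear layer, upsamples them to the finest resolution, concatenates them, fuses them with another linear layer, and predicts per-pixel class scores. The function names, channel widths, map sizes, and random weights are all hypothetical; nearest-neighbour upsampling stands in for smoother interpolation, and the ReLU after fusion is an assumption.

```python
import random

def linear(x, w):
    """Per-pixel linear layer: x has C_in values, w is a C_out x C_in matrix."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def project(fmap, w):
    """Apply the same linear layer at every pixel of an H x W x C feature map."""
    return [[linear(px, w) for px in row] for row in fmap]

def upsample_nearest(fmap, factor):
    """Nearest-neighbour upsampling by an integer factor (a simplifying assumption)."""
    out = []
    for row in fmap:
        wide = [px for px in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out

def mlp_decode(features, proj_ws, fuse_w, cls_w):
    """Fuse multiscale maps (finest first) into H_0 x W_0 x num_classes scores."""
    h, w = len(features[0]), len(features[0][0])
    # 1) unify channel widths, 2) bring every map to the finest resolution
    unified = [
        upsample_nearest(project(f, pw), h // len(f))
        for f, pw in zip(features, proj_ws)
    ]
    logits = []
    for i in range(h):
        row = []
        for j in range(w):
            # 3) concatenate the per-scale vectors at this pixel
            concat = [v for u in unified for v in u[i][j]]
            # 4) fuse with a linear layer followed by ReLU (assumed nonlinearity)
            fused = [max(0.0, v) for v in linear(concat, fuse_w)]
            # 5) predict class scores
            row.append(linear(fused, cls_w))
        logits.append(row)
    return logits

def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def rand_map(rng, h, w, c):
    return [[[rng.uniform(-1.0, 1.0) for _ in range(c)] for _ in range(w)]
            for _ in range(h)]

# Hypothetical toy configuration: four scales, each half the size of the previous.
rng = random.Random(0)
channels = [4, 8, 16, 32]
sizes = [8, 4, 2, 1]
embed_dim, num_classes = 6, 3

features = [rand_map(rng, s, s, c) for s, c in zip(sizes, channels)]
proj_ws = [rand_matrix(rng, embed_dim, c) for c in channels]
fuse_w = rand_matrix(rng, embed_dim, embed_dim * len(channels))
cls_w = rand_matrix(rng, num_classes, embed_dim)

logits = mlp_decode(features, proj_ws, fuse_w, cls_w)
print(len(logits), len(logits[0]), len(logits[0][0]))  # 8 8 3
```

Because every step is a per-pixel linear map or a resampling, the decoder itself adds few parameters; in this sketch all spatial context has to come from the encoder features it combines.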

updated: Sat Jun 05 2021 22:51:54 GMT+0000 (UTC)

published: Mon May 31 2021 17:59:51 GMT+0000 (UTC)

arXiv
