SeMask: Semantically Masked Transformers for Semantic Segmentation

Jitesh Jain; Anukriti Singh; Nikita Orlov; Zilong Huang; Jiachen Li; Steven Walton; Humphrey Shi

Finetuning a pretrained backbone in the encoder part of an image transformer network has been the traditional approach for the semantic segmentation task. However, such an approach leaves out the semantic context that an image provides during the encoding stage. This paper argues that incorporating semantic information of the image into pretrained hierarchical transformer-based backbones while finetuning improves performance considerably. To achieve this, we propose SeMask, a simple and effective framework that incorporates semantic information into the encoder with the help of a semantic attention operation. In addition, we use a lightweight semantic decoder during training to supervise the intermediate semantic prior maps at every stage. Our experiments demonstrate that incorporating semantic priors enhances the performance of established hierarchical encoders with only a slight increase in the number of FLOPs. We provide empirical evidence by integrating SeMask into Swin Transformer and Mix Transformer backbones as our encoder, paired with different decoders. Our framework achieves a new state-of-the-art of 58.22% mIoU on the ADE20K dataset and improvements of over 3% in the mIoU metric on the Cityscapes dataset. The code and checkpoints are publicly available at https://github.com/Picsart-AI-Research/SeMask-Segmentation.
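The results above are reported in mIoU (mean intersection-over-union). For readers unfamiliar with the metric, the following is a minimal sketch of how it is commonly computed from flattened per-pixel label lists. It is not taken from the SeMask repository: the function name `mean_iou`, the default `ignore_index` of 255 (a common convention for unlabeled pixels), and the choice to skip classes absent from both prediction and ground truth are all assumptions made for illustration. Benchmark toolkits usually accumulate the per-class counts over the whole dataset before averaging.

```python
from typing import Sequence


def mean_iou(
    pred: Sequence[int],
    target: Sequence[int],
    num_classes: int,
    ignore_index: int = 255,  # assumed convention for unlabeled pixels
) -> float:
    """Mean IoU over the classes that occur in pred or target."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = [0] * num_classes  # pixels where pred == target == c
    pred_count = [0] * num_classes    # pixels predicted as class c
    target_count = [0] * num_classes  # pixels labeled as class c

    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # unlabeled pixels do not contribute to any class
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:  # skip classes absent from both pred and target
            ious.append(intersection[c] / union)

    return sum(ious) / len(ious) if ious else float("nan")


if __name__ == "__main__":
    # Class 0: IoU 1/2, class 1: IoU 2/3, so the mean is 7/12.
    print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```

Under this definition, a gain of 3 points in mIoU means the per-class overlap between predicted and ground-truth regions improved by 0.03 on average across classes.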

updated: Thu Mar 17 2022 13:58:53 GMT+0000 (UTC)

published: Thu Dec 23 2021 18:56:02 GMT+0000 (UTC)

arXiv
