Fully Transformer Networks for Semantic Image Segmentation

Sitong Wu; Tianyi Wu; Fangjian Lin; Shengwei Tian; Guodong Guo

Transformers have shown impressive performance in various natural language processing and computer vision tasks, owing to their ability to model long-range dependencies. Recent progress has demonstrated that combining such Transformers with CNN-based semantic image segmentation models is very promising. However, how well a pure Transformer-based approach can perform on image segmentation has not yet been well studied. In this work, we explore a novel framework for semantic image segmentation: encoder-decoder based Fully Transformer Networks (FTN). Specifically, we first propose a Pyramid Group Transformer (PGT) as the encoder for progressively learning hierarchical features while reducing the computational complexity of the standard Vision Transformer (ViT). Then, we propose a Feature Pyramid Transformer (FPT) to fuse semantic-level and spatial-level information from multiple levels of the PGT encoder for semantic image segmentation. Surprisingly, this simple baseline can achieve better results on multiple challenging semantic segmentation and face parsing benchmarks, including PASCAL Context, ADE20K, COCOStuff, and CelebAMask-HQ. The source code will be released on https://github.com/BR-IDL/PaddleViT.
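The abstract does not spell out how PGT reduces the cost of attention. As a rough illustration only, the sketch below assumes that "Group" means splitting the tokens into equal, non-overlapping groups and computing self-attention within each group; under that assumption, the number of query-key pairs drops by a factor equal to the number of groups. The function name `attention_pair_count`, the 64x64 token grid, and the choice of 16 groups are hypothetical and are not taken from the paper.

```python
def attention_pair_count(num_tokens: int, num_groups: int = 1) -> int:
    """Count the query-key pairs scored by self-attention.

    Assumption (not stated in the abstract): tokens are split into
    `num_groups` equal groups and attention is computed only inside
    each group. With num_groups=1 this is ordinary global attention.
    """
    if num_groups <= 0 or num_tokens % num_groups != 0:
        raise ValueError("num_tokens must be divisible by a positive num_groups")
    tokens_per_group = num_tokens // num_groups
    # Each group scores every token against every other token in the group.
    return num_groups * tokens_per_group * tokens_per_group


# Hypothetical setting: a 64x64 grid of tokens (4096 tokens in total).
num_tokens = 64 * 64
global_pairs = attention_pair_count(num_tokens)        # global attention
grouped_pairs = attention_pair_count(num_tokens, 16)   # 16 groups of 256 tokens

print(global_pairs, grouped_pairs, global_pairs // grouped_pairs)
```

In this toy setting the pair count falls from 16,777,216 to 1,048,576, a 16-fold reduction. The actual grouping scheme and group sizes used by PGT may differ.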

updated: Tue Dec 28 2021 05:50:43 GMT+0000 (UTC)

published: Tue Jun 08 2021 05:15:28 GMT+0000 (UTC)

arXiv

