Fully Transformer Networks for Semantic Image Segmentation

Sitong Wu; Tianyi Wu; Fangjian Lin; Shengwei Tian; Guodong Guo

Transformers have shown impressive performance in various natural language processing and computer vision tasks, owing to their ability to model long-range dependencies. Recent progress has demonstrated that combining such transformers with CNN-based semantic image segmentation models is very promising. However, how well a pure transformer-based approach can perform on image segmentation has not yet been well studied. In this work, we explore a novel framework for semantic image segmentation: encoder-decoder based Fully Transformer Networks (FTN). Specifically, we first propose a Pyramid Group Transformer (PGT) as the encoder for progressively learning hierarchical features while reducing the computational complexity of the standard visual transformer (ViT). Then, we propose a Feature Pyramid Transformer (FPT) to fuse semantic-level and spatial-level information from multiple levels of the PGT encoder for semantic image segmentation. Surprisingly, this simple baseline achieves new state-of-the-art results on multiple challenging semantic segmentation benchmarks, including PASCAL Context, ADE20K, and COCO-Stuff. The source code will be released upon the publication of this work.

updated: Tue Jun 08 2021 05:15:28 GMT+0000 (UTC)

published: Tue Jun 08 2021 05:15:28 GMT+0000 (UTC)

arXiv
