Transformer-Based Visual Segmentation: A Survey

Xiangtai Li; Henghui Ding; Wenwei Zhang; Haobo Yuan; Jiangmiao Pang; Guangliang Cheng; Kai Chen; Ziwei Liu; Chen Change Loy

Visual segmentation seeks to partition images, video frames, or point clouds into multiple segments or groups. This technique has numerous real-world applications, such as autonomous driving, image editing, robot sensing, and medical analysis. Over the past decade, deep learning-based methods have made remarkable strides in this area. Recently, transformers, a type of neural network based on self-attention originally designed for natural language processing, have considerably surpassed previous convolutional or recurrent approaches in various vision processing tasks. Specifically, vision transformers offer robust, unified, and even simpler solutions for various segmentation tasks. This survey provides a thorough overview of transformer-based visual segmentation, summarizing recent advancements. We first review the background, encompassing problem definitions, datasets, and prior convolutional methods. Next, we summarize a meta-architecture that unifies all recent transformer-based approaches. Based on this meta-architecture, we examine various method designs, including modifications to the meta-architecture and associated applications. We also present several closely related settings, including 3D point cloud segmentation, foundation model tuning, domain-aware segmentation, efficient segmentation, and medical segmentation. Additionally, we compile and re-evaluate the reviewed methods on several well-established datasets. Finally, we identify open challenges in this field and propose directions for future research. The project page can be found at https://github.com/lxtGH/Awesome-Segmenation-With-Transformer. We will also continually monitor developments in this rapidly evolving field.

updated: Wed Apr 19 2023 17:59:02 GMT+0000 (UTC)

published: Wed Apr 19 2023 17:59:02 GMT+0000 (UTC)

arXiv
