Transformer-Based Visual Segmentation: A Survey

Xiangtai Li; Henghui Ding; Haobo Yuan; Wenwei Zhang; Jiangmiao Pang; Guangliang Cheng; Kai Chen; Ziwei Liu; Chen Change Loy

Visual segmentation seeks to partition images, video frames, or point clouds into multiple segments or groups. This technique has numerous real-world applications, such as autonomous driving, image editing, robot sensing, and medical analysis. Over the past decade, deep learning-based methods have made remarkable strides in this area. Recently, transformers, a type of neural network based on self-attention originally designed for natural language processing, have considerably surpassed previous convolutional or recurrent approaches in various vision processing tasks. Specifically, vision transformers offer robust, unified, and even simpler solutions for various segmentation tasks. This survey provides a thorough overview of transformer-based visual segmentation, summarizing recent advancements. We first review the background, encompassing problem definitions, datasets, and prior convolutional methods. Next, we summarize a meta-architecture that unifies all recent transformer-based approaches. Based on this meta-architecture, we examine various method designs, including modifications to the meta-architecture and associated applications. We also present several closely related settings, including 3D point cloud segmentation, foundation model tuning, domain-aware segmentation, efficient segmentation, and medical segmentation. Additionally, we compile and re-evaluate the reviewed methods on several well-established datasets. Finally, we identify open challenges in this field and propose directions for future research. The project page can be found at https://github.com/lxtGH/Awesome-Segmentation-With-Transformer. We will also continually monitor developments in this rapidly evolving field.

updated: Wed Dec 20 2023 05:21:20 GMT+0000 (UTC)

published: Wed Apr 19 2023 17:59:02 GMT+0000 (UTC)

arXiv