TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation

Jieneng Chen; Yongyi Lu; Qihang Yu; Xiangde Luo; Ehsan Adeli; Yan Wang; Le Lu; Alan L. Yuille; Yuyin Zhou

Medical image segmentation is an essential prerequisite for developing healthcare systems, especially for disease diagnosis and treatment planning. On various medical image segmentation tasks, the U-shaped architecture, also known as U-Net, has become the de facto standard and achieved tremendous success. However, due to the intrinsic locality of convolution operations, U-Net generally demonstrates limitations in explicitly modeling long-range dependency. Transformers, designed for sequence-to-sequence prediction, have emerged as alternative architectures with innate global self-attention mechanisms, but can result in limited localization abilities due to insufficient low-level details. In this paper, we propose TransUNet, which combines the merits of both Transformers and U-Net, as a strong alternative for medical image segmentation. On one hand, the Transformer encodes tokenized image patches from a convolutional neural network (CNN) feature map as the input sequence for extracting global contexts. On the other hand, the decoder upsamples the encoded features, which are then combined with the high-resolution CNN feature maps to enable precise localization. We argue that Transformers can serve as strong encoders for medical image segmentation tasks, and that combining them with U-Net enhances finer details by recovering localized spatial information. TransUNet achieves superior performance to various competing methods on different medical applications, including multi-organ segmentation and cardiac segmentation. Code and models are available at https://github.com/Beckschen/TransUNet.
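The data flow described above (tokenize a CNN feature map, encode the token sequence with global self-attention, then upsample and merge with a higher-resolution CNN feature map) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes random weights, a single attention head with no MLP or layer normalization, nearest-neighbour upsampling, and hypothetical helper names (`tokenize`, `self_attention`, `decode_step`); all shapes are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

def tokenize(feature_map, patch, w_embed, pos_embed):
    """Split a (C, H, W) CNN feature map into non-overlapping patch x patch tiles,
    flatten each tile, project it to a D-dim token, and add position embeddings."""
    c, h, w = feature_map.shape
    gh, gw = h // patch, w // patch
    tiles = feature_map.reshape(c, gh, patch, gw, patch)
    # (gh, gw, c, patch, patch) -> one flattened row per patch, row-major over the grid
    tiles = tiles.transpose(1, 3, 0, 2, 4).reshape(gh * gw, c * patch * patch)
    return tiles @ w_embed + pos_embed, (gh, gw)

def self_attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product self-attention: every token attends to all N tokens."""
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability for softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)       # each row sums to 1
    return tokens + attn @ v                       # residual connection

def decode_step(tokens, grid, skip):
    """Reshape tokens back to a (D, gh, gw) map, upsample 2x (nearest neighbour),
    and concatenate with a higher-resolution CNN skip feature along channels."""
    gh, gw = grid
    fmap = tokens.T.reshape(-1, gh, gw)
    up = fmap.repeat(2, axis=1).repeat(2, axis=2)
    return np.concatenate([up, skip], axis=0)

# Illustrative sizes (assumptions): C channels, H x W feature map, patch size P, token dim D.
C, H, W, P, D = 8, 8, 8, 2, 16
feat = rng.standard_normal((C, H, W))      # CNN feature map fed to the Transformer
skip = rng.standard_normal((4, H, W))      # CNN feature at 2x the token-grid resolution
n = (H // P) * (W // P)
tokens, grid = tokenize(feat, P, rng.standard_normal((C * P * P, D)),
                        rng.standard_normal((n, D)))
wq, wk, wv = (rng.standard_normal((D, D)) * 0.1 for _ in range(3))
encoded = self_attention(tokens, wq, wk, wv)
out = decode_step(encoded, grid, skip)
print(tokens.shape, encoded.shape, out.shape)  # (16, 16) (16, 16) (20, 8, 8)
```

A real decoder would stack several such upsample-and-concatenate stages (each followed by convolutions) until the input resolution is recovered; the sketch shows a single stage only to make the token-to-feature-map round trip explicit.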

updated: Mon Feb 08 2021 16:10:50 GMT+0000 (UTC)

published: Mon Feb 08 2021 16:10:50 GMT+0000 (UTC)

arXiv