UNETR: Transformers for 3D Medical Image Segmentation

Ali Hatamizadeh; Dong Yang; Holger Roth; Daguang Xu

Fully Convolutional Neural Networks (FCNNs) with contracting and expansive paths (e.g., an encoder and a decoder) have become prominent in various medical image segmentation applications in recent years. In these architectures, the encoder plays an integral role by learning global contextual representations, which the decoder then uses for semantic output prediction. Despite their success, the locality of convolutional layers, the main building block of FCNNs, limits the ability of such networks to learn long-range spatial dependencies. Inspired by the recent success of transformers in long-range sequence learning in Natural Language Processing (NLP), we reformulate the task of volumetric (3D) medical image segmentation as a sequence-to-sequence prediction problem. In particular, we introduce a novel architecture, dubbed UNEt TRansformers (UNETR), that uses a pure transformer as the encoder to learn sequence representations of the input volume and effectively capture global multi-scale information. The transformer encoder is directly connected to a decoder via skip connections at different resolutions to compute the final semantic segmentation output. We have extensively validated the performance of our proposed model across different imaging modalities (i.e., MR and CT) on volumetric brain tumour and spleen segmentation tasks using the Medical Segmentation Decathlon (MSD) dataset, and our results consistently demonstrate favorable benchmarks.
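
To make the sequence-to-sequence framing concrete, the following is a minimal sketch of one way a 3D volume could be turned into a sequence of tokens before it reaches a transformer encoder. The abstract does not say how the sequence is built, so the use of non-overlapping cubic patches, the patch size, and the function name `volume_to_patch_sequence` are all assumptions made for illustration, not the authors' implementation.

```python
from itertools import product


def volume_to_patch_sequence(volume, patch):
    """Split a 3D volume (nested lists indexed [z][y][x]) into a sequence
    of flattened, non-overlapping cubic patches.

    Hypothetical helper: the patching scheme is an assumption, not taken
    from the abstract.
    """
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    if depth % patch or height % patch or width % patch:
        raise ValueError("each volume dimension must be divisible by the patch size")

    sequence = []
    # Visit patch origins in z, y, x order; each patch becomes one token.
    for z0, y0, x0 in product(
        range(0, depth, patch), range(0, height, patch), range(0, width, patch)
    ):
        token = [
            volume[z][y][x]
            for z in range(z0, z0 + patch)
            for y in range(y0, y0 + patch)
            for x in range(x0, x0 + patch)
        ]
        sequence.append(token)
    return sequence


# Toy 4x4x4 volume whose voxel value encodes its own position.
toy = [[[z * 16 + y * 4 + x for x in range(4)] for y in range(4)] for z in range(4)]
tokens = volume_to_patch_sequence(toy, patch=2)
print(len(tokens), len(tokens[0]))  # 8 tokens, each holding 8 voxels
```

In this toy case a 4x4x4 volume with a patch size of 2 yields 8 tokens of 8 voxels each. In a real model each flattened patch would typically be projected to an embedding vector and given a positional encoding before entering the transformer; those steps are omitted here.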

updated: Thu Mar 18 2021 20:17:15 GMT+0000 (UTC)

published: Thu Mar 18 2021 20:17:15 GMT+0000 (UTC)

arXiv
