UNETR: Transformers for 3D Medical Image Segmentation

Ali Hatamizadeh; Yucheng Tang; Vishwesh Nath; Dong Yang; Andriy Myronenko; Bennett Landman; Holger Roth; Daguang Xu


Fully Convolutional Neural Networks (FCNNs) with contracting and expanding paths have been prominent in the majority of medical image segmentation applications over the past decade. In FCNNs, the encoder plays an integral role by learning both global and local features, along with contextual representations that the decoder can use for semantic output prediction. Despite their success, the locality of convolutional layers in FCNNs limits their capability to learn long-range spatial dependencies. Inspired by the recent success of transformers for Natural Language Processing (NLP) in long-range sequence learning, we reformulate the task of volumetric (3D) medical image segmentation as a sequence-to-sequence prediction problem. We introduce a novel architecture, dubbed UNEt TRansformers (UNETR), that uses a transformer as the encoder to learn sequence representations of the input volume and effectively capture global multi-scale information, while also following the successful "U-shaped" network design for the encoder and decoder. The transformer encoder is directly connected to a decoder via skip connections at different resolutions to compute the final semantic segmentation output. We have validated the performance of our method on the Multi Atlas Labeling Beyond The Cranial Vault (BTCV) dataset for multi-organ segmentation and the Medical Segmentation Decathlon (MSD) dataset for brain tumor and spleen segmentation tasks. Our benchmarks demonstrate new state-of-the-art performance on the BTCV leaderboard. Code: https://monai.io/research/unetr
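
The sequence reformulation mentioned in the abstract can be illustrated with a minimal sketch that turns a 3D volume into a sequence of flattened, non-overlapping cubic patches, the usual input form for a transformer encoder. This is not the authors' code (that is at the URL above): the function name `volume_to_patch_sequence`, the channels-last layout, the patch size of 16, and the toy volume shape are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import numpy as np


def volume_to_patch_sequence(volume, patch_size):
    """Split a (H, W, D, C) volume into flattened non-overlapping cubic patches.

    Hypothetical helper for illustration; returns an array of shape
    (N, patch_size**3 * C), where N is the number of patches.
    """
    h, w, d, c = volume.shape
    p = patch_size
    if h % p or w % p or d % p:
        raise ValueError("each spatial dimension must be divisible by patch_size")
    # Split every spatial axis into (number of patches, extent within a patch).
    blocks = volume.reshape(h // p, p, w // p, p, d // p, p, c)
    # Move the three patch-grid axes to the front, within-patch axes after them.
    blocks = blocks.transpose(0, 2, 4, 1, 3, 5, 6)
    # One row per patch: the sequence a transformer encoder would consume
    # (before any learned linear projection or positional embedding).
    return blocks.reshape(-1, p * p * p * c)


# Toy single-channel volume; the shape is an assumption, not from the paper.
volume = np.arange(32 * 32 * 16, dtype=np.float32).reshape(32, 32, 16, 1)
sequence = volume_to_patch_sequence(volume, patch_size=16)
print(sequence.shape)  # (4, 4096)
```

Each row of `sequence` corresponds to one patch, so a 32 x 32 x 16 volume with 16-voxel patches yields a sequence of length 4. A real model would additionally project each row to an embedding and add positional information before the transformer layers.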

updated: Sat Oct 09 2021 17:25:43 GMT+0000 (UTC)

published: Thu Mar 18 2021 20:17:15 GMT+0000 (UTC)

arXiv
