3D UX-Net: A Large Kernel Volumetric ConvNet Modernizing Hierarchical Transformer for Medical Image Segmentation

Ho Hin Lee; Shunxing Bao; Yuankai Huo; Bennett A. Landman

Recent 3D medical ViTs (e.g., SwinUNETR) achieve state-of-the-art performance on several 3D volumetric data benchmarks, including 3D medical image segmentation. Hierarchical transformers (e.g., Swin Transformers) reintroduced several ConvNet priors and further enhanced the practical viability of adapting volumetric segmentation to 3D medical datasets. The effectiveness of these hybrid approaches is largely credited to the large receptive field of non-local self-attention and the large number of model parameters. In this work, we propose a lightweight volumetric ConvNet, termed 3D UX-Net, which adapts the hierarchical transformer using ConvNet modules for robust volumetric segmentation. Specifically, inspired by Swin Transformer, we revisit volumetric depth-wise convolutions with large kernel sizes (e.g., starting from 7×7×7) to enable larger global receptive fields. We further substitute the multi-layer perceptron (MLP) in Swin Transformer blocks with pointwise depth convolutions and enhance model performance with fewer normalization and activation layers, thus reducing the number of model parameters. 3D UX-Net competes favorably with current SOTA transformers (e.g., SwinUNETR) on three challenging public datasets covering volumetric brain and abdominal imaging: 1) MICCAI Challenge 2021 FLARE, 2) MICCAI Challenge 2021 FeTA, and 3) MICCAI Challenge 2022 AMOS. 3D UX-Net consistently outperforms SwinUNETR, improving Dice from 0.929 to 0.938 (FLARE2021) and from 0.867 to 0.874 (FeTA2021). We further evaluate the transfer learning capability of 3D UX-Net on AMOS2022 and demonstrate a further improvement of 2.27% Dice (from 0.880 to 0.900). The source code for our proposed model is available at https://github.com/MASILab/3DUX-Net.
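To make the "lightweight" claim for large depth-wise kernels concrete, the sketch below counts the learnable parameters of a single 7×7×7 3D convolution in its dense form and in its depth-wise form (one filter per channel). It is plain arithmetic on the standard convolution weight shape, not code from the 3D UX-Net repository; the helper name `conv3d_params` and the channel width of 48 are assumptions chosen for illustration only.

```python
def conv3d_params(in_channels, out_channels, kernel_size, groups=1, bias=True):
    """Count learnable parameters of a 3D convolution with a cubic kernel.

    The weight tensor has shape (out_channels, in_channels // groups, k, k, k);
    the optional bias adds one value per output channel.
    """
    if in_channels % groups or out_channels % groups:
        raise ValueError("in_channels and out_channels must be divisible by groups")
    weights = out_channels * (in_channels // groups) * kernel_size ** 3
    return weights + (out_channels if bias else 0)


# Hypothetical channel width, chosen for illustration (an assumption, not from the abstract).
channels = 48
kernel = 7  # the 7x7x7 starting kernel size mentioned in the abstract

# Dense convolution: every output channel sees every input channel.
dense = conv3d_params(channels, channels, kernel)

# Depth-wise convolution: groups == channels, so each channel has its own filter.
depthwise = conv3d_params(channels, channels, kernel, groups=channels)

print(f"dense 7x7x7:      {dense:,} parameters")
print(f"depth-wise 7x7x7: {depthwise:,} parameters")
print(f"reduction factor: {dense / depthwise:.1f}x")
```

Under these assumptions the depth-wise layer needs roughly one parameter set per channel instead of one per channel pair, which is why a large kernel remains affordable in a volumetric network.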

updated: Thu Mar 02 2023 03:58:57 GMT+0000 (UTC)

published: Thu Sep 29 2022 19:54:13 GMT+0000 (UTC)

arXiv
