nnFormer: Interleaved Transformer for Volumetric Segmentation

Hong-Yu Zhou; Jiansen Guo; Yinghao Zhang; Lequan Yu; Liansheng Wang; Yizhou Yu

Transformers, the default model of choice in natural language processing, have drawn scant attention from the medical imaging community. Given their ability to exploit long-term dependencies, transformers are a promising way to help atypical convolutional neural networks (convnets) overcome their inherent shortcomings of spatial inductive bias. However, most recently proposed transformer-based segmentation approaches simply treat transformers as auxiliary modules that help encode global context into convolutional representations, without investigating how to optimally combine self-attention (i.e., the core of transformers) with convolution. To address this issue, we introduce nnFormer (i.e., Not-aNother transFormer), a powerful segmentation model with an interleaved architecture based on an empirical combination of self-attention and convolution. In practice, nnFormer learns volumetric representations from 3D local volumes. Compared to a naive voxel-level self-attention implementation, such volume-based operations reduce the computational complexity by approximately 98% and 99.5% on the Synapse and ACDC datasets, respectively. In comparison to prior-art network configurations, nnFormer achieves tremendous improvements over previous transformer-based methods on two commonly used datasets, Synapse and ACDC. For instance, nnFormer outperforms Swin-UNet by over 7 percent on Synapse. Even when compared to nnUNet, currently the best performing fully-convolutional medical segmentation network, nnFormer still provides slightly better performance on Synapse and ACDC.
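To give a rough sense of why restricting self-attention to 3D local volumes is so much cheaper than voxel-level attention, the sketch below counts query-key pairs in both cases. It is only an illustration under stated assumptions: the function names, the token grid of 16 x 16 x 16, and the local volume of 4 x 4 x 4 are hypothetical and are not taken from the paper, so the result is not meant to reproduce the 98% and 99.5% figures quoted above. It also counts only attention pairs and ignores projections and other layers.

```python
from math import prod


def attention_pairs_global(grid):
    """Query-key pairs when every token attends to every other token."""
    n = prod(grid)
    return n * n


def attention_pairs_local(grid, window):
    """Query-key pairs when attention is confined to non-overlapping local volumes."""
    if any(g % w for g, w in zip(grid, window)):
        raise ValueError("grid must be divisible by window along every axis")
    num_windows = prod(g // w for g, w in zip(grid, window))
    tokens_per_window = prod(window)
    # Each local volume runs its own full attention over its tokens.
    return num_windows * tokens_per_window ** 2


def relative_saving(grid, window):
    """Fraction of query-key pairs avoided by using local volumes."""
    return 1.0 - attention_pairs_local(grid, window) / attention_pairs_global(grid)


# Hypothetical sizes, chosen only for illustration (not the paper's configuration).
grid = (16, 16, 16)
window = (4, 4, 4)
print(f"global pairs: {attention_pairs_global(grid)}")
print(f"local pairs:  {attention_pairs_local(grid, window)}")
print(f"saving:       {relative_saving(grid, window):.2%}")
```

With N tokens in total and S tokens per local volume, the global count is N^2 and the local count is (N / S) * S^2 = N * S, so the saving is 1 - S / N. The saving therefore grows as the input volume becomes large relative to the local volume.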

updated: Tue Sep 07 2021 17:08:24 GMT+0000 (UTC)

published: Tue Sep 07 2021 17:08:24 GMT+0000 (UTC)

arXiv
