nnFormer: Interleaved Transformer for Volumetric Segmentation

Hong-Yu Zhou; Jiansen Guo; Yinghao Zhang; Lequan Yu; Liansheng Wang; Yizhou Yu

Transformers, the default model of choice in natural language processing, have drawn scant attention from the medical imaging community. Given their ability to exploit long-term dependencies, transformers promise to help atypical convolutional neural networks (convnets) overcome the inherent shortcomings of their spatial inductive bias. However, most recently proposed transformer-based segmentation approaches simply treat transformers as auxiliary modules that help encode global context into convolutional representations, without investigating how to optimally combine self-attention (i.e., the core of transformers) with convolution. To address this issue, we introduce nnFormer (i.e., Not-aNother transFormer), a powerful segmentation model with an interleaved architecture based on an empirical combination of self-attention and convolution. In practice, nnFormer learns volumetric representations from 3D local volumes. Compared to a naive voxel-level self-attention implementation, such volume-based operations reduce the computational complexity by approximately 98% and 99.5% on the Synapse and ACDC datasets, respectively. In comparison to prior-art network configurations, nnFormer achieves tremendous improvements over previous transformer-based methods on two commonly used datasets, Synapse and ACDC. For instance, nnFormer outperforms Swin-UNet by over 7 percent on Synapse. Even when compared to nnUNet, currently the best-performing fully convolutional medical segmentation network, nnFormer still provides slightly better performance on Synapse and ACDC.
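The complexity claim above can be made concrete with a rough count of the query-key pairs that self-attention has to score. The sketch below is only an illustration under stated assumptions: the function names, the token grid, and the local volume size are all hypothetical and are not taken from the paper, and the count ignores the feature dimension, projections, and any other layers. It compares global attention over every token with attention restricted to non-overlapping 3D local volumes.

```python
from math import prod


def attention_pair_count(grid, window=None):
    """Count query-key pairs scored by self-attention over a 3D token grid.

    grid:   (D, H, W) number of tokens along each axis.
    window: (d, h, w) local volume size, or None for global attention.
    """
    n_tokens = prod(grid)
    if window is None:
        # Global attention: every token attends to every other token.
        return n_tokens ** 2
    if any(g % w for g, w in zip(grid, window)):
        raise ValueError("window must evenly divide the grid on every axis")
    tokens_per_window = prod(window)
    n_windows = n_tokens // tokens_per_window
    # Local attention: tokens attend only within their own volume.
    return n_windows * tokens_per_window ** 2


def relative_saving(grid, window):
    """Fraction of pairwise attention work avoided by local volumes."""
    full = attention_pair_count(grid)
    local = attention_pair_count(grid, window)
    return 1.0 - local / full


# Hypothetical sizes chosen for illustration, NOT the paper's configuration.
grid = (16, 32, 32)
window = (4, 4, 4)
print(f"pairwise attention work avoided: {relative_saving(grid, window):.2%}")
```

Under these assumptions the saving reduces to 1 - M/N, where N is the total number of tokens and M is the number of tokens per local volume, so the reduction grows as the input volume gets larger relative to the local volume. That is consistent with the two datasets reporting different percentages, although the exact figures depend on settings not given here.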

updated: Thu Sep 09 2021 05:09:27 GMT+0000 (UTC)

published: Tue Sep 07 2021 17:08:24 GMT+0000 (UTC)

arXiv
