LeViT-UNet: Make Faster Encoders with Transformer for Medical Image Segmentation

Guoping Xu; Xingrong Wu; Xuan Zhang; Xinwei He

Medical image segmentation plays an essential role in developing computer-assisted diagnosis and therapy systems, yet it still faces many challenges. In the past few years, popular encoder-decoder architectures based on CNNs (e.g., U-Net) have been successfully applied to medical image segmentation. However, due to the locality of convolution operations, they are limited in learning global context and long-range spatial relations. Recently, several researchers have introduced transformers into both the encoder and decoder components with promising results, but efficiency requires further improvement because of the high computational complexity of transformers. In this paper, we propose LeViT-UNet, which integrates a LeViT Transformer module into the U-Net architecture for fast and accurate medical image segmentation. Specifically, we use LeViT as the encoder of LeViT-UNet, which offers a better trade-off between accuracy and efficiency for the Transformer block. Moreover, multi-scale feature maps from the transformer blocks and convolutional blocks of LeViT are passed into the decoder via skip connections, which effectively reuses the spatial information of the feature maps. Our experiments indicate that the proposed LeViT-UNet achieves better performance than various competing methods on several challenging medical image segmentation benchmarks, including Synapse and ACDC. Code and models will be publicly available at https://github.com/apple1986/LeViT_UNet.
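
The skip-connection idea in the abstract can be illustrated with a small shape-bookkeeping sketch. It is not the authors' implementation: the stage count, the channel widths (32, 64, 128, 256), the 224x224 input, the uniform 2x downsampling per stage, and the names `Shape`, `encoder_shapes`, and `decoder_steps` are all assumptions chosen for illustration. It only tracks how a generic U-Net-style decoder upsamples a deep feature map and concatenates it with the encoder feature map of the same resolution.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Shape:
    """Feature-map shape as (channels, height, width)."""
    channels: int
    height: int
    width: int


@dataclass(frozen=True)
class DecoderStep:
    """Shapes seen at one decoder level."""
    concatenated: Shape  # upsampled map joined with the skip feature map
    output: Shape        # after a fusing block (assumed to keep the skip's channels)


def encoder_shapes(image: Shape, stage_channels: List[int]) -> List[Shape]:
    """Assumed encoder: every stage halves the resolution and sets the channel count."""
    shapes = []
    current = image
    for channels in stage_channels:
        current = Shape(channels, current.height // 2, current.width // 2)
        shapes.append(current)
    return shapes


def decoder_steps(skips: List[Shape]) -> List[DecoderStep]:
    """Walk from the deepest map back up, reusing each shallower map as a skip."""
    current = skips[-1]  # deepest encoder output
    steps = []
    for skip in reversed(skips[:-1]):
        # 2x upsampling restores the resolution of the matching encoder stage.
        up = Shape(current.channels, current.height * 2, current.width * 2)
        if (up.height, up.width) != (skip.height, skip.width):
            raise ValueError("skip and upsampled maps must share a resolution")
        # Channel-wise concatenation is how the skip connection reuses spatial detail.
        concatenated = Shape(up.channels + skip.channels, skip.height, skip.width)
        # Assumption: a fusing block reduces the channels back to the skip's width.
        current = Shape(skip.channels, skip.height, skip.width)
        steps.append(DecoderStep(concatenated, current))
    return steps


# Hypothetical configuration, not taken from the paper.
features = encoder_shapes(Shape(3, 224, 224), [32, 64, 128, 256])
steps = decoder_steps(features)
for step in steps:
    print(step.concatenated, "->", step.output)
```

With these assumed settings, the deepest 256-channel 14x14 map is upsampled to 28x28 and concatenated with the 128-channel encoder map, giving 384 channels before fusion. The same pattern repeats at each shallower level.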

updated: Mon Jul 19 2021 05:48:51 GMT+0000 (UTC)

published: Mon Jul 19 2021 05:48:51 GMT+0000 (UTC)

arXiv
