CiT-Net: Convolutional Neural Networks Hand in Hand with Vision Transformers for Medical Image Segmentation

Tao Lei; Rui Sun; Xuan Wang; Yingbo Wang; Xi He; Asoke Nandi

Hybrid architectures that combine convolutional neural networks (CNNs) and Transformers are very popular for medical image segmentation. However, they face two challenges. First, although a CNN branch can capture local image features using vanilla convolution, it cannot achieve adaptive feature learning. Second, although a Transformer branch can capture global features, it ignores channel-wise and cross-dimensional self-attention, which results in low segmentation accuracy on images with complex content. To address these challenges, we propose a novel hybrid architecture of convolutional neural networks hand in hand with vision Transformers (CiT-Net) for medical image segmentation. Our network has two advantages. First, we design a dynamic deformable convolution and apply it to the CNN branch. It overcomes both the weak feature extraction ability caused by fixed-size convolution kernels and the rigid design of sharing kernel parameters among different inputs. Second, we design a shifted-window adaptive complementary attention module and a compact convolutional projection, and apply them to the Transformer branch to learn cross-dimensional long-range dependencies in medical images. Experimental results show that CiT-Net provides better medical image segmentation results than popular state-of-the-art (SOTA) methods. In addition, CiT-Net requires fewer parameters and lower computational cost, and does not rely on pre-training. The code is publicly available at https://github.com/SR0920/CiT-Net.
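The abstract does not say how the dynamic deformable convolution is computed. What follows is a minimal 1D sketch under our own assumptions, not the authors' implementation. It combines two known ideas. The first is dynamic convolution, where an input-dependent softmax gate mixes several candidate kernels, so the effective kernel changes with each input instead of being shared. The second is deformable sampling, where each kernel tap reads the input at a fractional offset through linear interpolation instead of at a fixed grid position. The gate (one hypothetical scalar weight per candidate kernel, applied to the globally averaged input) and the externally supplied offsets are illustrative simplifications. In a real network both would come from learned layers. The function names are hypothetical. For the actual design, see the official repository.

```python
import math
from typing import List


def softmax(xs: List[float]) -> List[float]:
    # Numerically stable softmax over a list of logits.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_kernels(signal: List[float], kernels: List[List[float]],
                      gate_weights: List[float]) -> List[float]:
    # Dynamic part (simplified assumption): global average pooling gives one
    # descriptor, each candidate kernel gets a logit, and the softmax weights
    # mix the candidates into a single input-dependent kernel.
    pooled = sum(signal) / len(signal)
    pi = softmax([w * pooled for w in gate_weights])
    size = len(kernels[0])
    return [sum(p * ker[j] for p, ker in zip(pi, kernels)) for j in range(size)]


def sample_linear(signal: List[float], pos: float) -> float:
    # Linear interpolation at a fractional position, zero outside the signal.
    lo = math.floor(pos)
    frac = pos - lo

    def at(i: int) -> float:
        return signal[i] if 0 <= i < len(signal) else 0.0

    return (1.0 - frac) * at(lo) + frac * at(lo + 1)


def dynamic_deformable_conv1d(signal: List[float], kernels: List[List[float]],
                              gate_weights: List[float],
                              offsets: List[List[float]]) -> List[float]:
    # Deformable part: tap j at output position t reads the input at
    # t + (j - half) + offsets[t][j] instead of the fixed grid position.
    kernel = aggregate_kernels(signal, kernels, gate_weights)
    size = len(kernel)
    half = size // 2
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for j in range(size):
            pos = t + (j - half) + offsets[t][j]
            acc += kernel[j] * sample_linear(signal, pos)
        out.append(acc)
    return out


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0]
    identity = [[0.0, 1.0, 0.0]]
    half_step = [[0.5, 0.5, 0.5] for _ in x]
    # Identity kernel with every tap shifted by +0.5 averages neighbours.
    print(dynamic_deformable_conv1d(x, identity, [1.0], half_step))
```

With zero offsets and a single candidate kernel, this reduces to an ordinary zero-padded convolution. The two mechanisms are independent: the gate adapts *which* weights are used per input, and the offsets adapt *where* each weight samples.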

updated: Tue Jun 06 2023 03:22:22 GMT+0000 (UTC)

published: Tue Jun 06 2023 03:22:22 GMT+0000 (UTC)

arXiv
