DIAMANT: Dual Image-Attention Map Encoders For Medical Image Segmentation

Yousef Yeganeh; Azade Farshad; Peter Weinberger; Seyed-Ahmad Ahmadi; Ehsan Adeli; Nassir Navab

Although purely transformer-based architectures have shown promising performance in many computer vision tasks, many hybrid models combining CNN and transformer blocks have been introduced to fit more specialized tasks. Nevertheless, despite the performance gains of both pure and hybrid transformer-based architectures over CNNs in medical image segmentation, their high training cost and complexity make them challenging to use in real-world scenarios. In this work, we propose simple architectures based purely on convolutional layers and show that, just by taking advantage of the attention map visualizations obtained from a self-supervised pretrained vision transformer network (e.g., DINO), one can outperform complex transformer-based networks at a much lower computational cost. The proposed architecture is composed of two encoder branches: one takes the original image as input, and the other takes the attention map visualizations of the same image, obtained from multiple self-attention heads of a pretrained DINO model and stacked as multiple channels. Experiments on two publicly available medical imaging datasets show that the proposed pipeline outperforms U-Net and state-of-the-art medical image segmentation models.
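To make the two-branch input concrete, the following is a minimal NumPy sketch of how per-head attention maps might be turned into a multi-channel input for the second encoder branch. It is an illustration under stated assumptions, not the authors' implementation: the function names `heads_to_channels` and `make_branch_inputs`, the use of CLS-to-patch attention, nearest-neighbour upsampling, per-head min-max normalisation, and the example sizes (6 heads, patch size 8, 64x64 image) are all hypothetical choices, and random values stand in for the output of a pretrained DINO model.

```python
import numpy as np


def heads_to_channels(cls_attention, image_hw, patch_size):
    """Turn per-head CLS-to-patch attention into an image-sized, multi-channel map.

    cls_attention: array of shape (num_heads, num_patches); a hypothetical
                   stand-in for attention taken from a pretrained ViT.
    image_hw:      (height, width) of the image, assumed divisible by patch_size.
    """
    height, width = image_hw
    grid_h, grid_w = height // patch_size, width // patch_size
    num_heads, num_patches = cls_attention.shape
    if num_patches != grid_h * grid_w:
        raise ValueError("attention length does not match the patch grid")

    # One low-resolution map per attention head.
    maps = cls_attention.reshape(num_heads, grid_h, grid_w)
    # Nearest-neighbour upsampling back to the image resolution (an assumption).
    maps = maps.repeat(patch_size, axis=1).repeat(patch_size, axis=2)
    # Min-max normalise each head independently to [0, 1] (an assumption).
    lo = maps.min(axis=(1, 2), keepdims=True)
    hi = maps.max(axis=(1, 2), keepdims=True)
    return (maps - lo) / np.maximum(hi - lo, 1e-8)


def make_branch_inputs(image, cls_attention, patch_size):
    """Return (image_input, attention_input) for the two encoder branches."""
    attention_input = heads_to_channels(cls_attention, image.shape[1:], patch_size)
    return image.astype(np.float32), attention_input.astype(np.float32)


rng = np.random.default_rng(0)
image = rng.random((3, 64, 64))          # channels-first RGB image
cls_attention = rng.random((6, 8 * 8))   # 6 heads over an 8x8 patch grid
image_input, attention_input = make_branch_inputs(image, cls_attention, patch_size=8)
print(image_input.shape, attention_input.shape)  # (3, 64, 64) (6, 64, 64)
```

Each branch then receives a tensor with the same spatial size but a different channel count (3 image channels versus one channel per attention head), which is what lets two ordinary convolutional encoders process them side by side. How the two branches are fused is not specified in the abstract and is deliberately left out of this sketch.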

updated: Fri Apr 28 2023 00:11:18 GMT+0000 (UTC)

published: Fri Apr 28 2023 00:11:18 GMT+0000 (UTC)

arXiv
