TransAttUnet: Multi-level Attention-guided U-Net with Transformer for Medical Image Segmentation

Bingzhi Chen; Yishu Liu; Zheng Zhang; Guangming Lu; Adams Wai Kin Kong

Accurate segmentation of organs or lesions from medical images is crucial for reliable diagnosis of diseases and organ morphometry. In recent years, convolutional encoder-decoder solutions have achieved substantial progress in the field of automatic medical image segmentation. Due to the inherent bias in the convolution operations, prior models mainly focus on local visual cues formed by the neighboring pixels, but fail to fully model the long-range contextual dependencies. In this paper, we propose a novel Transformer-based Attention Guided Network called TransAttUnet, in which the multi-level guided attention and multi-scale skip connection are designed to jointly enhance the performance of the semantic segmentation architecture. Inspired by Transformer, the self-aware attention (SAA) module with Transformer Self Attention (TSA) and Global Spatial Attention (GSA) is incorporated into TransAttUnet to effectively learn the non-local interactions among encoder features. Moreover, we use additional multi-scale skip connections between decoder blocks to aggregate the upsampled features with different semantic scales. In this way, the representation ability of multi-scale context information is strengthened to generate discriminative features. Benefiting from these complementary components, the proposed TransAttUnet can effectively alleviate the loss of fine details caused by the stacking of convolution layers and the consecutive sampling operations, ultimately improving the segmentation quality of medical images. Extensive experiments on multiple medical image segmentation datasets from different imaging modalities demonstrate that the proposed method consistently outperforms the state-of-the-art baselines. Our code and pre-trained models are available at: https://github.com/YishuLiu/TransAttUnet.
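The abstract names the two branches of the SAA module but gives no formulas. The following Python/NumPy sketch is only an illustration under stated assumptions, not the authors' implementation (the linked repository is the authoritative source): TSA is assumed to be standard multi-head scaled dot-product self-attention over the flattened spatial positions of an encoder feature map, with sinusoidal positional encoding; GSA is assumed to be a position-attention map computed over all pixel pairs; and the two branch outputs are assumed to be fused with the input feature by a weighted residual sum. The function names, the `params` dictionary, and the weights `lam_tsa` and `lam_gsa` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along one axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sinusoidal_pe(n, d):
    # Standard Transformer sinusoidal positional encoding, shape (n, d).
    pos = np.arange(n)[:, None]
    i = np.arange(d)[None, :]
    angle = pos / np.power(10000.0, (2 * (i // 2)) / d)
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))


def transformer_self_attention(x, wq, wk, wv, num_heads):
    # Assumed TSA branch: multi-head self-attention over N tokens of width C.
    n, c = x.shape
    assert c % num_heads == 0, "channel count must be divisible by num_heads"
    dh = c // num_heads
    x = x + sinusoidal_pe(n, c)
    # Project, then split into heads: (heads, N, dh).
    q = (x @ wq).reshape(n, num_heads, dh).transpose(1, 0, 2)
    k = (x @ wk).reshape(n, num_heads, dh).transpose(1, 0, 2)
    v = (x @ wv).reshape(n, num_heads, dh).transpose(1, 0, 2)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(dh))  # (heads, N, N)
    out = attn @ v  # (heads, N, dh)
    return out.transpose(1, 0, 2).reshape(n, c)


def global_spatial_attention(x, wm, wn):
    # Assumed GSA branch: each position attends over all N positions.
    s = softmax((x @ wm) @ (x @ wn).T)  # (N, N), rows sum to 1
    return s @ x  # (N, C)


def self_aware_attention(feat, params, num_heads=4, lam_tsa=1.0, lam_gsa=1.0):
    # feat: (C, H, W) encoder feature map; returns a tensor of the same shape.
    c, h, w = feat.shape
    x = feat.reshape(c, h * w).T  # flatten spatial grid into N = H*W tokens
    tsa = transformer_self_attention(
        x, params["wq"], params["wk"], params["wv"], num_heads)
    gsa = global_spatial_attention(x, params["wm"], params["wn"])
    out = x + lam_tsa * tsa + lam_gsa * gsa  # hypothetical weighted residual fusion
    return out.T.reshape(c, h, w)


# Usage with random weights (illustrative only).
rng = np.random.default_rng(0)
C, H, W = 16, 8, 8
params = {name: rng.standard_normal((C, C)) * 0.1 for name in ("wq", "wk", "wv")}
params["wm"] = rng.standard_normal((C, C // 2)) * 0.1
params["wn"] = rng.standard_normal((C, C // 2)) * 0.1
feat = rng.standard_normal((C, H, W))
out = self_aware_attention(feat, params)
print(out.shape)  # (16, 8, 8)
```

Because both branches attend over every spatial position, each output pixel can draw on context from the whole feature map rather than a local convolution window, which is the long-range dependency modeling the abstract motivates. The N x N attention maps make this quadratic in H*W, which is one reason such modules are typically applied at a low-resolution encoder stage.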

updated: Sat Jul 09 2022 03:28:13 GMT+0000 (UTC)

published: Mon Jul 12 2021 09:17:06 GMT+0000 (UTC)

arXiv
