Pyramid Medical Transformer for Medical Image Segmentation

Zhuangzhuang Zhang; Baozhou Sun; Weixiong Zhang

Deep neural networks have become a prevailing technique in medical image processing. However, the most popular convolutional neural network (CNN)-based methods for medical image segmentation are imperfect because they model long-range dependencies indirectly, by stacking layers or enlarging filters. Transformers and the self-attention mechanism have recently been proposed to learn long-range dependencies effectively by modeling attention between all pairs of words regardless of their positions. The idea has also been extended to computer vision by creating image patches and treating them as embeddings. Because self-attention over a whole image is computationally expensive, current transformer-based models settle for a rigid partitioning scheme that can lose informative relations. In addition, current medical transformers model global context on full-resolution images, which incurs unnecessary computation costs. To address these issues, we developed a novel method that integrates multi-scale attention and CNN feature extraction in a pyramidal network architecture, namely the Pyramid Medical Transformer (PMTrans). PMTrans captured multi-range relations by working on multi-resolution images. An adaptive partitioning scheme was implemented to retain informative relations and to access different receptive fields efficiently. Experimental results on three medical image datasets (gland segmentation, MoNuSeg, and HECKTOR) showed that PMTrans outperformed the latest CNN-based and transformer-based models for medical image segmentation.
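
The cost argument in the abstract (whole-image self-attention is expensive, partitioning reduces it, and coarser resolutions reach further for less) can be made concrete with a small sketch. This is not the PMTrans implementation: the abstract does not describe its adaptive partitioning scheme or CNN branch. The Python/NumPy code below is a hypothetical illustration that assumes plain scaled dot-product attention with no learned projections or heads, a fixed non-overlapping window partition (the kind of rigid scheme the abstract criticizes), and a 2x average-pooling image pyramid. All function names are made up for illustration.

```python
import numpy as np


def avg_pool2x(x):
    """Halve the spatial size of an (H, W, C) array by averaging 2x2 blocks (odd edges are cropped)."""
    h, w, c = x.shape
    x = x[: h - h % 2, : w - w % 2]
    return x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))


def window_self_attention(x, window):
    """Scaled dot-product self-attention inside non-overlapping window x window tiles.

    Simplification (assumption): queries, keys and values are the raw features,
    with no learned projections or multiple heads. H and W must be divisible by `window`.
    """
    h, w, c = x.shape
    nh, nw = h // window, w // window
    assert nh * window == h and nw * window == w, "H and W must be multiples of window"
    # Partition: (H, W, C) -> (nh, nw, window*window, C), one token sequence per tile.
    t = x.reshape(nh, window, nw, window, c).transpose(0, 2, 1, 3, 4)
    t = t.reshape(nh, nw, window * window, c)
    # Attention scores only between tokens of the same tile: (nh, nw, n, n).
    scores = t @ t.transpose(0, 1, 3, 2) / np.sqrt(c)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over keys
    out = weights @ t  # (nh, nw, n, C)
    # Undo the partition: back to (H, W, C).
    out = out.reshape(nh, nw, window, window, c).transpose(0, 2, 1, 3, 4)
    return out.reshape(h, w, c)


def attention_pairs(h, w, window=None):
    """Query-key scores computed: (h*w)**2 globally, h*w*window**2 with window partitioning."""
    n = h * w
    return n * n if window is None else n * window * window


def pyramid_window_attention(x, window, levels):
    """Hypothetical pyramid: windowed attention on x, then on 2x, 4x, ... downsampled copies.

    A fixed window at a coarser level covers a larger region of the original image.
    """
    outputs = []
    for _ in range(levels):
        outputs.append(window_self_attention(x, window))
        x = avg_pool2x(x)
    return outputs


rng = np.random.default_rng(0)
feat = rng.standard_normal((64, 64, 16))  # stand-in feature map, not real image data
outs = pyramid_window_attention(feat, window=8, levels=3)
print([o.shape for o in outs])  # [(64, 64, 16), (32, 32, 16), (16, 16, 16)]
print(attention_pairs(64, 64))  # 16777216 (global attention)
print(sum(attention_pairs(s, s, window=8) for s in (64, 32, 16)))  # 344064 (3-level pyramid)
```

For a 64x64 map with 8x8 windows, global attention needs 16,777,216 query-key scores while a single windowed level needs 262,144; adding the 32x32 and 16x16 levels raises the total only to 344,064, yet an 8x8 window at the 16x16 level spans a 32x32 region of the original map. What a fixed partition still cannot do is relate tokens that fall on opposite sides of a tile boundary at the same level, which is the loss of informative relations that the abstract's adaptive partitioning is meant to address.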

updated: Mon Sep 13 2021 04:38:42 GMT+0000 (UTC)

published: Thu Apr 29 2021 23:57:20 GMT+0000 (UTC)

arXiv