PHTrans: Parallelly Aggregating Global and Local Representations for Medical Image Segmentation

Wentao Liu; Tong Tian; Weijin Xu; Huihua Yang; Xipeng Pan; Songlin Yan; Lemeng Wang

The success of the Transformer in computer vision has attracted increasing attention in the medical imaging community. For medical image segmentation in particular, many hybrid architectures based on convolutional neural networks (CNNs) and Transformers have been presented and achieve impressive performance. However, most of these methods embed modular Transformers into CNNs and struggle to reach their full potential. In this paper, we propose PHTrans, a novel hybrid architecture for medical image segmentation that hybridizes Transformer and CNN in parallel in its main building blocks. It produces hierarchical representations from global and local features and adaptively aggregates them, aiming to fully exploit the strengths of both to obtain better segmentation performance. Specifically, PHTrans follows the U-shaped encoder-decoder design and introduces the parallel hybrid module in deep stages, where convolution blocks and the modified 3D Swin Transformer learn local features and global dependencies separately; a sequence-to-volume operation then unifies the dimensions of the outputs to achieve feature aggregation. Extensive experimental results on both the Multi-Atlas Labeling Beyond the Cranial Vault and Automated Cardiac Diagnosis Challenge datasets corroborate its effectiveness, with PHTrans consistently outperforming state-of-the-art methods. The code is available at: https://github.com/lseventeen/PHTrans.
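The data flow of the parallel hybrid module described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation (see the repository linked above for that). All function names are hypothetical. The local branch is a toy neighbour average standing in for the convolution blocks, the global branch is plain unwindowed self-attention without learned weights standing in for the modified 3D Swin Transformer, and aggregation by element-wise summation is an assumption, since the abstract does not specify how the two outputs are combined. The sketch only shows how a token sequence is mapped back to a volume so that both branches yield tensors of the same shape.

```python
import numpy as np


def volume_to_sequence(vol):
    """Flatten a (C, D, H, W) feature volume into a (D*H*W, C) token sequence."""
    channels = vol.shape[0]
    return vol.reshape(channels, -1).T


def sequence_to_volume(seq, spatial_shape):
    """Map a (D*H*W, C) token sequence back to a (C, D, H, W) feature volume."""
    d, h, w = spatial_shape
    return seq.T.reshape(seq.shape[1], d, h, w)


def toy_global_branch(seq):
    """Stand-in for the Transformer branch (assumption): single-head
    self-attention over all tokens, with no windows and no learned weights."""
    scores = seq @ seq.T / np.sqrt(seq.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ seq


def toy_local_branch(vol):
    """Stand-in for the convolution branch (assumption): average each voxel
    with its two neighbours along the last axis, using edge padding."""
    padded = np.pad(vol, ((0, 0), (0, 0), (0, 0), (1, 1)), mode="edge")
    return (padded[..., :-2] + padded[..., 1:-1] + padded[..., 2:]) / 3.0


def parallel_hybrid_block(vol):
    """Run both branches on the same input and aggregate their outputs."""
    local = toy_local_branch(vol)
    tokens = volume_to_sequence(vol)
    global_vol = sequence_to_volume(toy_global_branch(tokens), vol.shape[1:])
    # Assumption: the two same-shaped outputs are fused by element-wise sum.
    return local + global_vol


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    features = rng.standard_normal((4, 2, 3, 3))  # (C, D, H, W)
    print(parallel_hybrid_block(features).shape)
```

Because the sequence-to-volume step exactly inverts the flattening, the global branch output has the same (C, D, H, W) layout as the local branch output, which is what makes an element-wise fusion possible.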

updated: Sat Jul 23 2022 13:04:23 GMT+0000 (UTC)

published: Wed Mar 09 2022 08:06:56 GMT+0000 (UTC)

arXiv
