Toward Unpaired Multi-modal Medical Image Segmentation via Learning Structured Semantic Consistency

Jie Yang; Ruimao Zhang; Chaoqun Wang; Zhen Li; Xiang Wan; Lingyan Zhang

Integrating multi-modal data to improve medical image analysis has recently received great attention. However, because of the discrepancy between modalities, how to process data from multiple modalities with a single model remains an open problem. In this paper, we propose a novel scheme for better pixel-level segmentation of unpaired multi-modal medical images. Previous methods adopt both modality-specific and modality-shared modules to accommodate the appearance variance of different modalities while extracting the common semantic information. In contrast, our method is based on a single Transformer with a carefully designed External Attention Module (EAM) that learns the structured semantic consistency (i.e., semantic class representations and their correlations) between modalities during the training phase. In practice, this structured semantic consistency across modalities is achieved progressively by applying consistency regularization at the modality level and at the image level. The proposed EAMs learn the semantic consistency for representations at different scales and can be discarded once the model is optimized. Therefore, during the testing phase, only one Transformer needs to be maintained for predictions on all modalities, which balances the model's ease of use and simplicity. To demonstrate the effectiveness of the proposed method, we conduct experiments on two medical image segmentation scenarios: (1) cardiac structure segmentation, and (2) abdominal multi-organ segmentation. Extensive results show that the proposed method outperforms state-of-the-art methods by a wide margin and achieves competitive performance even with extremely limited training samples (e.g., 1 or 3 annotated CT or MRI images) for one specific modality.
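The abstract does not give the form of the consistency regularization, so the following is only a minimal sketch of what a penalty on "semantic class representations and their correlations" could look like. It assumes per-pixel feature vectors and labels are already available for a CT batch and an MRI batch. The function names, the masked-average prototypes, the cosine-similarity correlation matrix, and the mean-squared-error terms are all illustrative assumptions, not the paper's EAM or its actual loss.

```python
import numpy as np


def class_prototypes(features, labels, num_classes):
    """Average feature vector per class (masked average pooling).

    features: (N, D) array of pixel features; labels: (N,) integer class ids.
    Classes absent from `labels` keep a zero prototype.
    """
    protos = np.zeros((num_classes, features.shape[1]))
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            protos[c] = features[mask].mean(axis=0)
    return protos


def class_correlation(protos, eps=1e-8):
    """Cosine similarity between every pair of class prototypes."""
    norms = np.linalg.norm(protos, axis=1, keepdims=True)
    unit = protos / np.maximum(norms, eps)
    return unit @ unit.T


def structured_consistency(feat_a, lab_a, feat_b, lab_b, num_classes):
    """Hypothetical penalty (an assumption, not the paper's loss):
    prototype mismatch plus correlation mismatch between two modalities."""
    p_a = class_prototypes(feat_a, lab_a, num_classes)
    p_b = class_prototypes(feat_b, lab_b, num_classes)
    proto_term = np.mean((p_a - p_b) ** 2)
    corr_term = np.mean((class_correlation(p_a) - class_correlation(p_b)) ** 2)
    return proto_term + corr_term
```

Because the images are unpaired, the two batches here need not show the same patient or contain the same number of pixels; only the class-level summaries are compared. The penalty is zero when both modalities yield identical prototypes and grows as the class representations or their pairwise similarities diverge.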

updated: Tue Jun 21 2022 17:50:29 GMT+0000 (UTC)

published: Tue Jun 21 2022 17:50:29 GMT+0000 (UTC)

arXiv
