ResViT: Residual vision transformers for multi-modal medical image synthesis

Onat Dalmaz; Mahmut Yurt; Tolga Çukur

Generative adversarial models with convolutional neural network (CNN) backbones have recently been established as state-of-the-art in numerous medical image synthesis tasks. However, CNNs are designed to perform local processing with compact filters, and this inductive bias compromises learning of contextual features. Here, we propose a novel generative adversarial approach for medical image synthesis, ResViT, that leverages the contextual sensitivity of vision transformers along with the precision of convolution operators and realism of adversarial learning. ResViT's generator employs a central bottleneck comprising novel aggregated residual transformer (ART) blocks that synergistically combine residual convolutional and transformer modules. Residual connections in ART blocks promote diversity in captured representations, while a channel compression module distills task-relevant information. A weight sharing strategy is introduced among ART blocks to mitigate computational burden. A unified implementation is introduced to avoid the need to rebuild separate synthesis models for varying source-target modality configurations. Comprehensive demonstrations are performed for synthesizing missing sequences in multi-contrast MRI, and CT images from MRI. Our results indicate superiority of ResViT over competing CNN- and transformer-based methods in terms of qualitative observations and quantitative metrics.
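
The weight sharing idea mentioned in the abstract can be illustrated with a rough parameter count. The sketch below is not the authors' implementation: it assumes each ART block contains a stack of standard transformer encoder layers, and the function names (`encoder_layer_params`, `transformer_params`) and all sizes (`num_blocks`, `layers_per_block`, `d_model`, `d_ff`) are hypothetical values chosen for illustration. It counts learnable parameters only, so it says nothing about the per-forward-pass compute, which sharing alone does not reduce.

```python
def encoder_layer_params(d_model: int, d_ff: int) -> int:
    """Learnable parameters in one standard transformer encoder layer.

    Assumes multi-head self-attention with biased Q/K/V/output projections,
    a two-layer MLP, and two layer norms (scale and shift each).
    """
    attention = 4 * (d_model * d_model + d_model)
    mlp = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
    layer_norms = 2 * (2 * d_model)
    return attention + mlp + layer_norms


def transformer_params(num_blocks: int, layers_per_block: int,
                       d_model: int, d_ff: int, shared: bool) -> int:
    """Total transformer parameters across all blocks.

    With shared=True, every block reuses one set of transformer weights,
    so the total does not grow with num_blocks.
    """
    per_block = layers_per_block * encoder_layer_params(d_model, d_ff)
    return per_block if shared else num_blocks * per_block


if __name__ == "__main__":
    # Hypothetical configuration, not taken from the paper.
    cfg = dict(num_blocks=2, layers_per_block=12, d_model=768, d_ff=3072)
    independent = transformer_params(shared=False, **cfg)
    tied = transformer_params(shared=True, **cfg)
    print(f"independent weights: {independent:,} parameters")
    print(f"shared weights:      {tied:,} parameters")
```

Under these assumptions the transformer parameter count with tied weights is independent of the number of blocks, while convolutional and channel compression parameters (not counted here) would still be separate per block.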

updated: Sun Mar 06 2022 11:07:38 GMT+0000 (UTC)

published: Wed Jun 30 2021 12:57:37 GMT+0000 (UTC)

arXiv
