Disruptive Autoencoders: Leveraging Low-level features for 3D Medical Image Pre-training

Jeya Maria Jose Valanarasu; Yucheng Tang; Dong Yang; Ziyue Xu; Can Zhao; Wenqi Li; Vishal M. Patel; Bennett Landman; Daguang Xu; Yufan He; Vishwesh Nath

Harnessing the power of pre-training on large-scale datasets like ImageNet forms a fundamental building block for the progress of representation learning-driven solutions in computer vision. Medical images are inherently different from natural images: they are acquired in many modalities (CT, MR, PET, ultrasound, etc.) and contain fine-grained information such as tissues, lesions, and organs. These characteristics call for learning features that represent local context. In this work, we focus on designing an effective pre-training framework for 3D radiology images. First, we propose a new masking strategy called local masking, where masking is performed across channel embeddings instead of tokens to improve the learning of local feature representations. We combine this with classical low-level perturbations such as adding noise and downsampling to further enable low-level representation learning. To this end, we introduce Disruptive Autoencoders, a pre-training framework that attempts to reconstruct the original image from disruptions created by a combination of local masking and low-level perturbations. Additionally, we devise a cross-modal contrastive loss (CMCL) to accommodate the pre-training of multiple modalities in a single framework. We curate a large-scale dataset to enable pre-training of 3D medical radiology images (MRI and CT). The proposed pre-training framework is tested across multiple downstream tasks and achieves state-of-the-art performance. Notably, our proposed method tops the public test leaderboard of the BTCV multi-organ segmentation challenge.
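The abstract names the disruptions but not their exact operators, so the NumPy sketch below is only one plausible reading of how a disrupted input could be built, not the authors' implementation. Assumptions not taken from the paper: a linear patch embedding with random weights `W`, local masking read as zeroing a random subset of channel entries independently inside each token, Gaussian noise followed by 2x average-pool-then-upsample as the two low-level perturbations (in that order), mean squared error as the reconstruction loss, and all hyperparameter values. The helper names (`local_masking`, `token_masking`, `disrupt`, `reconstruction_loss`) are hypothetical. The sketch contrasts MAE-style token masking, which drops whole tokens, with channel-wise masking, which leaves every token partially visible.

```python
import numpy as np


def add_gaussian_noise(vol, sigma, rng):
    """Low-level perturbation: additive Gaussian noise (sigma is an assumed value)."""
    return vol + rng.normal(0.0, sigma, size=vol.shape)


def downsample_upsample(vol, factor):
    """Low-level perturbation: average-pool by `factor` on every axis, then
    nearest-neighbour upsample back, so the output keeps the input shape."""
    d, h, w = vol.shape
    assert d % factor == 0 and h % factor == 0 and w % factor == 0
    pooled = vol.reshape(d // factor, factor, h // factor, factor,
                         w // factor, factor).mean(axis=(1, 3, 5))
    return pooled.repeat(factor, axis=0).repeat(factor, axis=1).repeat(factor, axis=2)


def patchify(vol, p):
    """Split a (D, H, W) volume into non-overlapping p^3 patches: (num_tokens, p**3)."""
    d, h, w = vol.shape
    x = vol.reshape(d // p, p, h // p, p, w // p, p).transpose(0, 2, 4, 1, 3, 5)
    return x.reshape(-1, p ** 3)


def token_masking(emb, ratio, rng):
    """MAE-style baseline: zero out entire tokens (rows of `emb`)."""
    keep = rng.random(emb.shape[0]) >= ratio
    return emb * keep[:, None]


def local_masking(emb, ratio, rng):
    """Hypothetical local masking: zero a random subset of channel entries
    independently inside every token, so no token is dropped entirely."""
    keep = rng.random(emb.shape) >= ratio
    return emb * keep


def disrupt(vol, W, rng, p=4, sigma=0.1, factor=2, mask_ratio=0.6):
    """Hypothetical disruption pipeline: noise, then downsampling, then a linear
    patch embedding W, then local masking. Order and values are assumed."""
    x = add_gaussian_noise(vol, sigma, rng)
    x = downsample_upsample(x, factor)
    emb = patchify(x, p) @ W
    return local_masking(emb, mask_ratio, rng)


def reconstruction_loss(pred, target):
    """Assumed MSE objective against the clean (undisrupted) volume."""
    return float(np.mean((pred - target) ** 2))


rng = np.random.default_rng(0)
vol = rng.random((16, 16, 16))           # toy stand-in for a CT/MR crop
W = rng.normal(size=(4 ** 3, 32)) / 8.0  # hypothetical patch-embedding weights
z = disrupt(vol, W, rng)
print(z.shape)  # (64, 32): 64 tokens, 32 channels each

# Error of the low-level-perturbed volume against the clean target;
# a trained encoder-decoder (not shown) should reconstruct `vol` more closely.
perturbed = downsample_upsample(add_gaussian_noise(vol, 0.1, rng), 2)
print(reconstruction_loss(perturbed, vol))
```

Under this reading, each token keeps roughly 40% of its channels, so every spatial location still carries partial information. That matches the abstract's motivation that channel-wise masking favors local feature learning. Token masking instead removes about 60% of locations outright. A decoder, omitted here, would map `z` back to a volume and be trained against the clean `vol`.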

updated: Mon Jul 31 2023 17:59:42 GMT+0000 (UTC)

published: Mon Jul 31 2023 17:59:42 GMT+0000 (UTC)

arXiv
