Efficiently Training Vision Transformers on Structural MRI Scans for Alzheimer's Disease Detection

Nikhil J. Dhinagar; Sophia I. Thomopoulos; Emily Laltoo; Paul M. Thompson

Neuroimaging of large populations helps identify factors that promote or resist brain disease, and assists diagnosis, subtyping, and prognosis. Data-driven models such as convolutional neural networks (CNNs) are increasingly applied to brain images to perform diagnostic and prognostic tasks by learning robust features. Vision transformers (ViTs), a new class of deep learning architectures, have emerged in recent years as an alternative to CNNs for several computer vision applications. Here we tested variants of the ViT architecture on neuroimaging downstream tasks of differing difficulty, in this case sex classification and Alzheimer's disease (AD) classification from 3D brain MRI. In our experiments, two vision transformer architecture variants achieved an AUC of 0.987 for sex classification and 0.892 for AD classification, respectively. We independently evaluated our models on data from two benchmark AD datasets. Fine-tuning vision transformer models pre-trained on synthetic MRI scans (generated by a latent diffusion model) and on real MRI scans boosted performance by 5% and 9-10%, respectively. Our main contributions include testing the effects of different ViT training strategies, including pre-training, data augmentation, and learning rate warm-up followed by annealing, as they pertain to the neuroimaging domain. These techniques are essential for training ViT-like models for neuroimaging applications, where training data is usually limited. We also analyzed how the amount of training data affects the test-time performance of the ViT via data-model scaling curves.
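
As a rough illustration of "learning rate warm-up followed by annealing", the sketch below computes a per-step learning rate. The abstract does not state which schedule shapes were used, so the linear warm-up, the cosine annealing curve, the function name `warmup_cosine_lr`, and every parameter value here are assumptions chosen for illustration only.

```python
import math


def warmup_cosine_lr(step, total_steps, warmup_steps, base_lr, min_lr=0.0):
    """Return the learning rate for a given 0-indexed training step.

    Assumed schedule (not taken from the paper): linear warm-up to
    base_lr over warmup_steps, then cosine annealing down to min_lr.
    """
    if step < warmup_steps:
        # Linear ramp: reaches base_lr on the last warm-up step.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the annealing phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    # Cosine curve from base_lr (progress = 0) to min_lr (progress = 1).
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Hypothetical settings: 100 steps in total, 10 of them warm-up.
    for s in (0, 5, 9, 10, 55, 100):
        print(s, warmup_cosine_lr(s, total_steps=100, warmup_steps=10, base_lr=1e-3))
```

The small initial learning rate is commonly motivated by the instability of transformer training in the first few updates, which is plausibly more pronounced when training data is limited; the later decay lets the model settle as training ends.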

updated: Tue Mar 14 2023 20:18:12 GMT+0000 (UTC)

published: Tue Mar 14 2023 20:18:12 GMT+0000 (UTC)

arXiv
