SPCXR: Self-supervised Pretraining using Chest X-rays Towards a Domain Specific Foundation Model

Syed Muhammad Anwar; Abhijeet Parida; Sara Atito; Muhammad Awais; Gustavo Nino; Josef Kittler; Marius George Linguraru

Chest X-rays (CXRs) are a widely used imaging modality for the diagnosis and prognosis of lung disease. The image analysis tasks vary; examples include pathology detection and lung segmentation. A large body of work develops machine learning algorithms for specific tasks, a significant recent example being coronavirus disease (COVID-19) detection from CXR data. However, traditional diagnostic tool design based on supervised learning is burdened by the need for training data annotation, which must be of good quality to yield better clinical outcomes. Here, we propose an alternative: a new self-supervised paradigm in which a general representation of CXRs is learned using a group-masked self-supervised framework. The pre-trained model is then fine-tuned for domain-specific tasks such as COVID-19 detection, pneumonia detection, and general health screening. We show that the same pre-training can also be used for the lung segmentation task. The proposed paradigm shows robust performance across multiple downstream tasks, which demonstrates the success of the pre-training. Moreover, the performance of the pre-trained models on data with significant drift at test time shows that a better generic representation has been learned. The methods are further validated by COVID-19 detection on a unique small-scale pediatric data set. The gain in accuracy (~25%) is significant when compared to a supervised transformer-based method. This adds credence to the strength and reliability of our proposed framework and pre-training strategy.
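The abstract does not describe how the group masking is carried out. As a rough illustration only, the sketch below assumes that "group masking" means hiding connected rectangular blocks of image patches rather than isolated ones, so that a model must reconstruct larger regions from context. The function names `group_mask` and `apply_mask`, and the parameters `mask_ratio` and `max_block`, are hypothetical and are not taken from the paper.

```python
import numpy as np

def group_mask(grid_h, grid_w, mask_ratio=0.5, max_block=4, rng=None):
    """Return a boolean (grid_h, grid_w) array; True marks a masked patch.

    Assumption: patches are masked in random rectangular groups until at
    least `mask_ratio` of the grid is covered (the final ratio may overshoot).
    """
    rng = np.random.default_rng() if rng is None else rng
    mask = np.zeros((grid_h, grid_w), dtype=bool)
    target = int(round(mask_ratio * grid_h * grid_w))
    while mask.sum() < target:
        # Sample a block size, clamped to the grid dimensions.
        bh = int(rng.integers(1, min(max_block, grid_h) + 1))
        bw = int(rng.integers(1, min(max_block, grid_w) + 1))
        # Sample the top-left corner so the block fits inside the grid.
        top = int(rng.integers(0, grid_h - bh + 1))
        left = int(rng.integers(0, grid_w - bw + 1))
        mask[top:top + bh, left:left + bw] = True
    return mask

def apply_mask(image, mask, patch):
    """Zero out the masked patches of a 2-D image (hypothetical corruption)."""
    # Expand the patch-level mask to pixel resolution.
    full = np.repeat(np.repeat(mask, patch, axis=0), patch, axis=1)
    out = image.copy()
    out[full] = 0
    return out
```

For a 224x224 image split into 16x16 patches, `group_mask(14, 14)` would produce the patch-level mask and `apply_mask(image, mask, 16)` the corrupted input. Whether the original framework zeroes the masked regions, adds noise, or substitutes other content is not stated in the abstract.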

updated: Thu May 18 2023 08:59:07 GMT+0000 (UTC)

published: Wed Nov 23 2022 13:38:16 GMT+0000 (UTC)

arXiv
