Label-Efficient Self-Supervised Federated Learning for Tackling Data Heterogeneity in Medical Imaging

Rui Yan; Liangqiong Qu; Qingyue Wei; Shih-Cheng Huang; Liyue Shen; Daniel Rubin; Lei Xing; Yuyin Zhou

The collection and curation of large-scale medical datasets from multiple institutions is essential for training accurate deep learning models, but privacy concerns often hinder data sharing. Federated learning (FL) is a promising solution that enables privacy-preserving collaborative learning among different institutions, but it generally suffers from performance deterioration due to heterogeneous data distributions and a lack of quality labeled data. In this paper, we present a robust and label-efficient self-supervised FL framework for medical image analysis. Our method introduces a novel Transformer-based self-supervised pre-training paradigm that pre-trains models directly on decentralized target task datasets using masked image modeling, to facilitate more robust representation learning on heterogeneous data and effective knowledge transfer to downstream models. Extensive empirical results on simulated and real-world medical imaging non-IID federated datasets show that masked image modeling with Transformers significantly improves the robustness of models against various degrees of data heterogeneity. Notably, under severe data heterogeneity, our method, without relying on any additional pre-training data, achieves improvements of 5.06%, 1.53% and 4.58% in test accuracy on retinal, dermatology and chest X-ray classification, respectively, compared to the supervised baseline with ImageNet pre-training. In addition, we show that our federated self-supervised pre-training methods yield models that generalize better to out-of-distribution data and perform more effectively when fine-tuning with limited labeled data, compared to existing FL algorithms. The code is available at https://github.com/rui-yan/SSL-FL.
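
As a rough illustration of the two ingredients the abstract names, masked image modeling and federated training, the following minimal Python sketch shows (1) choosing a random subset of image patches to hide from the encoder and (2) combining per-client parameters on a server. It is not the authors' implementation. The function names, the 196-patch grid, the 0.6 mask ratio, and the dataset-size-weighted averaging (in the style of FedAvg) are all assumptions made for illustration; the abstract does not specify them.

```python
import random


def random_patch_mask(num_patches, mask_ratio, seed=None):
    """Return a list of booleans; True marks a patch hidden from the encoder.

    Hypothetical helper: the patch count and mask ratio are illustrative.
    """
    rng = random.Random(seed)
    num_masked = int(round(num_patches * mask_ratio))
    masked = set(rng.sample(range(num_patches), num_masked))
    return [i in masked for i in range(num_patches)]


def weighted_average(client_weights, client_sizes):
    """Average per-client parameter vectors, weighted by local dataset size.

    Assumed FedAvg-style aggregation; the paper's exact rule may differ.
    """
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
        for j in range(dim)
    ]


if __name__ == "__main__":
    # One client's masking step for a 14 x 14 patch grid (assumed size).
    mask = random_patch_mask(num_patches=196, mask_ratio=0.6, seed=0)
    print(sum(mask), "of", len(mask), "patches masked")

    # Server-side aggregation of two toy clients with unequal data sizes.
    clients = [[1.0, 2.0], [3.0, 4.0]]
    sizes = [100, 300]
    print(weighted_average(clients, sizes))
```

In a real system each client would train a Transformer to reconstruct the masked patches on its own unlabeled images and send only model parameters to the server, so raw images never leave the institution. The repository linked above is the reference for the actual method.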

updated: Wed Jan 11 2023 07:30:24 GMT+0000 (UTC)

published: Tue May 17 2022 18:33:43 GMT+0000 (UTC)

arXiv
