An Experimental Study of Data Heterogeneity in Federated Learning Methods for Medical Imaging

Liangqiong Qu; Niranjan Balachandar; Daniel L Rubin

Federated learning enables multiple institutions to collaboratively train machine learning models on their local data in a privacy-preserving way. However, its distributed nature often leads to significant heterogeneity in data distributions across institutions. In this paper, we investigate how a taxonomy of data heterogeneity regimes, including quantity skew, label distribution skew, and imaging acquisition skew, degrades the performance of federated learning methods. We show that performance degrades as the degree of data heterogeneity increases. We present several mitigation strategies to overcome the performance drops caused by data heterogeneity: weighted averaging for data quantity skew, and weighted loss and batch normalization averaging for label distribution skew. The proposed optimizations improve the ability of federated learning methods to handle heterogeneity across institutions and provide valuable guidance for deploying federated learning in real clinical applications.
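The abstract names the mitigation strategies without giving their details. As a rough illustration only, and not the authors' implementation, the Python sketch below shows two common forms such strategies can take: sample-count-weighted parameter averaging (FedAvg-style) for quantity skew, and inverse-frequency ("balanced") class weights used in a cross-entropy loss for label distribution skew. The dictionary-of-lists parameter representation, the function names, and the N / (C * n_c) weighting formula are all assumptions made for this example. Batch normalization averaging is not shown.

```python
import math
from typing import Dict, List

# Hypothetical model representation for this sketch: parameter name -> flat list of floats.
Params = Dict[str, List[float]]


def weighted_average(client_params: List[Params], num_samples: List[int]) -> Params:
    """Aggregate client models with weights n_k / sum(n), a FedAvg-style remedy for quantity skew."""
    if not client_params or len(client_params) != len(num_samples):
        raise ValueError("expected one sample count per client model")
    total = sum(num_samples)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    averaged: Params = {}
    for name, first in client_params[0].items():
        averaged[name] = [
            # Each coordinate is the sample-weighted mean of that coordinate across clients.
            sum(params[name][i] * n for params, n in zip(client_params, num_samples)) / total
            for i in range(len(first))
        ]
    return averaged


def balanced_class_weights(label_counts: List[int]) -> List[float]:
    """Assumed weighting: N / (C * n_c) over the C classes a client actually has; absent classes get 0."""
    present = [c for c in label_counts if c > 0]
    total = sum(present)
    return [total / (len(present) * c) if c > 0 else 0.0 for c in label_counts]


def weighted_cross_entropy(
    probs: List[List[float]], labels: List[int], class_weights: List[float]
) -> float:
    """Scale each sample's -log p(y) by the weight of its true class, then divide by the total weight."""
    weighted_nll = 0.0
    total_weight = 0.0
    for p, y in zip(probs, labels):
        w = class_weights[y]
        weighted_nll += -w * math.log(p[y])
        total_weight += w
    return weighted_nll / total_weight
```

For instance, `weighted_average([{"w": [1.0, 2.0]}, {"w": [3.0, 6.0]}], [1, 3])` gives `{"w": [2.5, 5.0]}`, whereas an unweighted mean would give `[2.0, 4.0]` and let the small client count as much as the large one. Likewise, `balanced_class_weights([10, 30, 0])` upweights the rarer class to 2.0 and downweights the common class to about 0.67.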

updated: Sun Jul 18 2021 05:47:48 GMT+0000 (UTC)

published: Sun Jul 18 2021 05:47:48 GMT+0000 (UTC)

arXiv