Surgical Aggregation: A Federated Learning Framework for Harmonizing Distributed Datasets with Diverse Tasks

Pranav Kulkarni; Adway Kanhere; Paul H. Yi; Vishwa S. Parekh

AI-assisted characterization of chest X-rays (CXR) has the potential to provide substantial benefits across many clinical applications. Many large-scale public CXR datasets have been curated for detecting abnormalities with deep learning. However, each of these datasets focuses on detecting a subset of the disease labels that could be present in a CXR, which limits its clinical utility. Furthermore, the distributed nature of these datasets, along with data sharing regulations, makes it difficult to share them and build a complete representation of disease labels. We propose surgical aggregation, a federated learning framework for aggregating knowledge from distributed datasets with different disease labels into a 'global' deep learning model. We randomly divided the NIH Chest X-Ray 14 dataset into training (70%), validation (10%), and test (20%) splits with no patient overlap and conducted two experiments. In the first experiment, we pruned the disease labels to create two 'toy' datasets containing 11 and 8 labels respectively, with 4 overlapping labels. In the second experiment, we pruned the disease labels to create two disjoint 'toy' datasets with 7 labels each. We observed that the surgically aggregated 'global' model performed comparably to a 'baseline' model trained on the complete set of disease labels in both experiments: it reached an AUROC of 0.87 in the overlapping experiment and 0.86 in the disjoint experiment, compared to the baseline AUROC of 0.87. We also used surgical aggregation to harmonize the NIH Chest X-Ray 14 and CheXpert datasets into a 'global' model with AUROCs of 0.85 and 0.83, respectively. Our results show that surgical aggregation could be used to develop clinically useful deep learning models by aggregating knowledge from distributed datasets with diverse tasks, a step towards bridging the gap from bench to bedside.
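The 70/10/20 split with no patient overlap can be illustrated with a minimal sketch. This is not the authors' code: the function name `split_by_patient`, the `(image_id, patient_id)` record layout, and the choice to apply the fractions to patients rather than to images are all assumptions made for illustration.

```python
import random


def split_by_patient(records, fractions=(0.7, 0.1, 0.2), seed=0):
    """Split image records into train/val/test with no patient overlap.

    `records` is a sequence of (image_id, patient_id) pairs (a hypothetical
    layout). The fractions are applied to the set of patients, so the
    per-image proportions are only approximate when patients have
    different numbers of images.
    """
    records = list(records)

    # Shuffle the unique patients reproducibly.
    patients = sorted({patient_id for _, patient_id in records})
    rng = random.Random(seed)
    rng.shuffle(patients)

    # Cut the shuffled patient list into three contiguous groups.
    n_train = round(fractions[0] * len(patients))
    n_val = round(fractions[1] * len(patients))
    assignment = {}
    for index, patient_id in enumerate(patients):
        if index < n_train:
            assignment[patient_id] = "train"
        elif index < n_train + n_val:
            assignment[patient_id] = "val"
        else:
            assignment[patient_id] = "test"

    # Every image follows its patient, so no patient spans two splits.
    splits = {"train": [], "val": [], "test": []}
    for image_id, patient_id in records:
        splits[assignment[patient_id]].append(image_id)
    return splits


if __name__ == "__main__":
    # Synthetic example: 100 patients with 3 images each.
    demo = [(f"img_{p}_{k}", f"patient_{p}") for p in range(100) for k in range(3)]
    result = split_by_patient(demo, seed=42)
    print({name: len(ids) for name, ids in result.items()})
```

Assigning whole patients to a split, rather than individual images, is what prevents images of the same patient from appearing in both training and test data.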

updated: Tue Jan 17 2023 03:53:29 GMT+0000 (UTC)

published: Tue Jan 17 2023 03:53:29 GMT+0000 (UTC)

arXiv
