FedCorr: Multi-Stage Federated Learning for Label Noise Correction

Jingyi Xu; Zihan Chen; Tony Q. S. Quek; Kai Fong Ernest Chong

Federated learning (FL) is a privacy-preserving distributed learning paradigm that enables clients to jointly train a global model. In real-world FL implementations, client data could have label noise, and different clients could have vastly different label noise levels. Although there exist methods in centralized learning for tackling label noise, such methods do not perform well on heterogeneous label noise in FL settings, due to the typically smaller sizes of client datasets and data privacy requirements in FL. In this paper, we propose FedCorr, a general multi-stage framework to tackle heterogeneous label noise in FL, without making any assumptions on the noise models of local clients, while still maintaining client data privacy. In particular, (1) FedCorr dynamically identifies noisy clients by exploiting the dimensionalities of the model prediction subspaces independently measured on all clients, and then identifies incorrect labels on noisy clients based on per-sample losses. To deal with data heterogeneity and to increase training stability, we propose an adaptive local proximal regularization term that is based on estimated local noise levels. (2) We further finetune the global model on identified clean clients and correct the noisy labels for the remaining noisy clients after finetuning. (3) Finally, we apply the usual training on all clients to make full use of all local data. Experiments conducted on CIFAR-10/100 with federated synthetic label noise, and on a real-world noisy dataset, Clothing1M, demonstrate that FedCorr is robust to label noise and substantially outperforms the state-of-the-art methods at multiple noise levels.
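Two ideas from step (1) can be sketched in a few lines: flagging suspect labels from per-sample losses, and scaling a proximal penalty by an estimated local noise level. The sketch below is illustrative only and is not the authors' implementation. The abstract does not say how losses are separated or how the regularizer is defined, so the one-dimensional two-means split, the penalty form `beta * noise_level * ||w_local - w_global||^2`, and all function and parameter names (`flag_high_loss_samples`, `proximal_penalty`, `beta`) are assumptions made for this example.

```python
from typing import List, Sequence


def flag_high_loss_samples(losses: Sequence[float], iters: int = 50) -> List[bool]:
    """Split per-sample losses into a low and a high group.

    Assumption: a simple 1-D two-means split stands in for whatever
    separation rule the paper actually uses. True = suspected noisy label.
    """
    lo, hi = min(losses), max(losses)
    if lo == hi:
        # All losses are identical, so nothing can be separated.
        return [False] * len(losses)
    for _ in range(iters):
        threshold = (lo + hi) / 2
        low = [x for x in losses if x <= threshold]
        high = [x for x in losses if x > threshold]
        new_lo = sum(low) / len(low)
        new_hi = sum(high) / len(high)
        if new_lo == lo and new_hi == hi:
            break  # group means stopped moving
        lo, hi = new_lo, new_hi
    threshold = (lo + hi) / 2
    return [x > threshold for x in losses]


def estimate_noise_level(flags: Sequence[bool]) -> float:
    """Fraction of a client's samples flagged as suspected noisy."""
    return sum(flags) / len(flags)


def proximal_penalty(
    local_w: Sequence[float],
    global_w: Sequence[float],
    noise_level: float,
    beta: float = 1.0,
) -> float:
    """Hypothetical adaptive proximal term.

    Assumed form: beta * noise_level * ||local_w - global_w||^2, so a
    client with a higher estimated noise level is pulled more strongly
    toward the global model.
    """
    squared_distance = sum((a - b) ** 2 for a, b in zip(local_w, global_w))
    return beta * noise_level * squared_distance


# Toy per-sample losses for one client: three small, two large.
client_losses = [0.1, 0.2, 0.15, 2.0, 2.5]
flags = flag_high_loss_samples(client_losses)
noise_level = estimate_noise_level(flags)
penalty = proximal_penalty([1.0, 2.0], [0.0, 0.0], noise_level)
```

In this toy setup the penalty would be added to the client's ordinary training loss, and a client with no flagged samples contributes no penalty at all. The sketch leaves out the client-level identification step based on prediction-subspace dimensionality, which the abstract names but does not detail.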

updated: Sun Apr 10 2022 12:51:18 GMT+0000 (UTC)

published: Sun Apr 10 2022 12:51:18 GMT+0000 (UTC)

arXiv
