Preservation of the Global Knowledge by Not-True Self Knowledge Distillation in Federated Learning

Gihun Lee; Yongjin Shin; Minchan Jeong; Se-Young Yun


In Federated Learning (FL), a strong global model is collaboratively learned by aggregating the clients' locally trained models. Although this removes the need to access clients' data directly, the global model's convergence often suffers from data heterogeneity. This paper suggests that forgetting could be the bottleneck of global convergence. We observe that fitting to a biased local distribution shifts the features on the global distribution and results in forgetting of global knowledge. We view this phenomenon as analogous to Continual Learning, which also faces catastrophic forgetting when a model is fitted to a new task distribution. Based on our findings, we hypothesize that tackling forgetting in local training relieves the data heterogeneity problem. To this end, we propose a simple yet effective framework, Federated Local Self-Distillation (FedLSD), which utilizes the global knowledge on locally available data. By following the global perspective on local data, FedLSD encourages the learned features to preserve global knowledge and to have consistent views across local models, thus improving convergence without compromising data privacy. Under our framework, we further extend FedLSD to FedLS-NTD, which considers only the not-true class signals to compensate for noisy predictions of the global model. We validate that both FedLSD and FedLS-NTD significantly improve performance on standard FL benchmarks in various setups, especially in cases of extreme data heterogeneity.
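The abstract does not spell out the loss function, so the following is only a minimal sketch of how a not-true self-distillation objective might look for a single sample. It assumes the local loss is a cross-entropy term plus a KL term between the global (teacher) and local (student) softmax distributions computed over the not-true classes only. The function names, the temperature `tau`, and the weight `beta` are illustrative assumptions, not taken from the abstract.

```python
import math

def softmax(logits, tau=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / tau for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def not_true_probs(logits, true_label, tau=1.0):
    """Softmax over all classes except the true one (assumed form)."""
    not_true = [z for i, z in enumerate(logits) if i != true_label]
    return softmax(not_true, tau)

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def cross_entropy(logits, true_label):
    """Standard cross-entropy of the local prediction against the label."""
    return -math.log(softmax(logits)[true_label])

def not_true_distillation_loss(local_logits, global_logits, true_label,
                               tau=1.0, beta=1.0):
    """Hypothetical local objective: CE + beta * KL on not-true classes.

    global_logits come from the received global model (the teacher),
    local_logits from the model being trained on the client (the student).
    """
    ce = cross_entropy(local_logits, true_label)
    teacher = not_true_probs(global_logits, true_label, tau)
    student = not_true_probs(local_logits, true_label, tau)
    return ce + beta * kl_divergence(teacher, student)
```

Under these assumptions, the global model's logit for the true class never enters the distillation term, so a noisy global prediction on the true class cannot pull the local model away from the label; only the relative ranking of the remaining classes is transferred.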

updated: Sun Jun 06 2021 11:51:47 GMT+0000 (UTC)

published: Sun Jun 06 2021 11:51:47 GMT+0000 (UTC)

arXiv

