On the Importance and Applicability of Pre-Training for Federated Learning

Hong-You Chen; Cheng-Hao Tu; Ziwei Li; Han-Wei Shen; Wei-Lun Chao

Pre-training is prevalent in today's deep learning as a way to improve the learned model's performance. However, in the literature on federated learning (FL), neural networks are mostly initialized with random weights. This discrepancy motivates us to conduct a systematic study of pre-training for FL. Across multiple visual recognition benchmarks, we found that pre-training can not only improve FL but also close its accuracy gap to the centralized-learning counterpart, especially in the challenging case of non-IID client data. To make our findings applicable to situations where pre-trained models are not directly available, we explore pre-training with synthetic data, or even with clients' data in a decentralized manner, and found that both can already improve FL notably. Interestingly, many of the techniques we explore are complementary and can be combined to further boost performance; we view this as a critical result toward scaling up deep FL for real-world applications. We conclude the paper with an attempt to understand the effect of pre-training on FL. We found that pre-training enables the global models learned under different client data conditions to converge to the same loss basin and makes global aggregation in FL more stable. Nevertheless, pre-training does not seem to alleviate local model drifting, a fundamental problem in FL under non-IID data.
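
To make concrete where the initialization enters an FL run, here is a minimal toy sketch. It is not the authors' implementation: the abstract does not name an aggregation rule, so sample-count-weighted averaging of local models is an assumption, and the one-parameter model, the client data, and all function names (`local_update`, `aggregate`, `federated_train`) are hypothetical. The value passed as `w_init` merely stands in for a random versus a pre-trained starting point; the sketch does not reproduce any result from the paper.

```python
from typing import List, Sequence, Tuple

# Hypothetical toy setup: each client holds (x, y) pairs and fits y ≈ w * x.
Client = Sequence[Tuple[float, float]]


def local_update(w: float, data: Client, lr: float = 0.1, steps: int = 5) -> float:
    """Run a few gradient-descent steps on one client's mean squared error."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def aggregate(local_ws: List[float], sizes: List[int]) -> float:
    """Average local weights, weighted by client sample count (assumed rule)."""
    total = sum(sizes)
    return sum(w * n for w, n in zip(local_ws, sizes)) / total


def federated_train(w_init: float, clients: List[Client], rounds: int = 20) -> float:
    """Alternate local training and global aggregation, starting from w_init."""
    w = w_init
    for _ in range(rounds):
        local_ws = [local_update(w, c) for c in clients]
        w = aggregate(local_ws, [len(c) for c in clients])
    return w


# Illustrative client data, all generated from y = 2x.
clients = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(1.0, 2.0), (3.0, 6.0), (0.5, 1.0)],
]

w_from_scratch = federated_train(0.0, clients)     # stand-in for random init
w_from_pretrained = federated_train(1.8, clients)  # stand-in for pre-trained init
```

In this toy problem both starting points reach the same solution; the only difference is how far the global model is from it after the early rounds. Questions such as non-IID client data and local model drift, which the abstract discusses, are outside what a one-parameter example can show.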

updated: Thu Mar 23 2023 03:27:40 GMT+0000 (UTC)

published: Thu Jun 23 2022 06:02:33 GMT+0000 (UTC)

arXiv
