Deep Incubation: Training Large Models by Divide-and-Conquering

Zanlin Ni; Yulin Wang; Jiangwei Yu; Haojun Jiang; Yue Cao; Gao Huang

Recent years have witnessed the remarkable success of large deep learning models. However, training these models is challenging due to high computational cost, painfully slow convergence, and overfitting. In this paper, we present Deep Incubation, a novel approach that enables efficient and effective training of large models by dividing them into smaller sub-modules that can be trained separately and assembled seamlessly. A key challenge in implementing this idea is ensuring the compatibility of the independently trained sub-modules. To address this issue, we first introduce a global, shared meta model, which implicitly links all the modules together and can be designed as an extremely small network with negligible computational overhead. We then propose a module incubation algorithm, which trains each sub-module to replace the corresponding component of the meta model and accomplish a given learning task. Despite its simplicity, our approach effectively encourages each sub-module to be aware of its role in the target large model, so that the trained sub-modules collaborate smoothly once assembled. Empirically, our method outperforms end-to-end (E2E) training in both final accuracy and training efficiency. For example, when applied to ViT-Huge, it improves accuracy by 2.7% on ImageNet or achieves similar performance with 4x less training time. Notably, the gains are also significant on downstream tasks (e.g., object detection and image segmentation on COCO and ADE20K). Code is available at https://github.com/LeapLabTHU/Deep-Incubation.
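The incubation loop described in the abstract can be illustrated with a deliberately tiny, pure-Python sketch. This is not the authors' implementation (see the linked repository for that); it is one plausible reading of the abstract under strong simplifying assumptions. The toy task (fit y = 8x), the scalar "layers", the hand-set meta model weights in `META`, and the helper names `incubate`, `hybrid_forward`, and `assemble_forward` are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# Toy task (assumption): fit y = 8 * x on a few scalar inputs.
DATA = [(x, 8.0 * x) for x in (0.5, 1.0, 1.5)]

# Hypothetical meta model, assumed to be already trained: one tiny scalar
# stage per sub-module. The product of the stages is 8, so it solves the task.
META: List[float] = [2.0, 1.0, 4.0]

# A "larger" sub-module in this toy: two stacked scalar layers (a, b).
Module = Tuple[float, float]


def module_out(module: Module, h: float) -> float:
    a, b = module
    return a * b * h


def hybrid_forward(i: int, module: Module, x: float) -> float:
    """Meta model with stage i replaced by the sub-module being trained."""
    h = x
    for k, w in enumerate(META):
        h = module_out(module, h) if k == i else w * h
    return h


def incubate(i: int, steps: int = 500, lr: float = 0.005) -> Module:
    """Train sub-module i inside the frozen meta model on the task loss."""
    a, b = 1.0, 1.0
    # Combined gain of the frozen meta stages surrounding sub-module i.
    c = 1.0
    for k, w in enumerate(META):
        if k != i:
            c *= w
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in DATA:
            err = hybrid_forward(i, (a, b), x) - y
            # Gradient of the mean squared error; only (a, b) are updated.
            grad_a += 2.0 * err * c * b * x / len(DATA)
            grad_b += 2.0 * err * c * a * x / len(DATA)
        a, b = a - lr * grad_a, b - lr * grad_b
    return (a, b)


def assemble_forward(modules: List[Module], x: float) -> float:
    """Chain the separately trained sub-modules into the final model."""
    h = x
    for m in modules:
        h = module_out(m, h)
    return h


# Each call is independent of the others, so the calls could run in parallel.
modules = [incubate(i) for i in range(len(META))]
print(assemble_forward(modules, 1.0))  # should be close to 8.0
```

In this sketch each sub-module only ever sees the frozen meta model, never the other sub-modules, yet the assembled chain still fits the task because every sub-module learned to fill the role of its meta stage. Real networks are not guaranteed to compose this cleanly, so the toy should be read as intuition rather than evidence.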

updated: Thu Mar 16 2023 09:47:53 GMT+0000 (UTC)

published: Thu Dec 08 2022 08:04:06 GMT+0000 (UTC)

arXiv
