Deep Model Assembling

Zanlin Ni; Yulin Wang; Jiangwei Yu; Haojun Jiang; Yue Cao; Gao Huang

Large deep learning models have achieved remarkable success in many scenarios. However, training them is usually challenging because of the high computational cost, the unstable and painfully slow optimization procedure, and the vulnerability to overfitting. To alleviate these problems, this work studies a divide-and-conquer strategy: dividing a large model into smaller modules, training them independently, and reassembling the trained modules to obtain the target model. This approach is promising because it avoids training large models directly from scratch. Nevertheless, implementing the idea is non-trivial, as it is difficult to ensure that independently trained modules are compatible. In this paper, we present an elegant solution: we introduce a global, shared meta model that implicitly links all the modules together. This enables us to train highly compatible modules that collaborate effectively once assembled. We further propose a module incubation mechanism that allows the meta model to be designed as an extremely shallow network, so the additional overhead it introduces is minimized. Though conceptually simple, our method significantly outperforms end-to-end (E2E) training in both final accuracy and training efficiency. For example, on top of ViT-Huge, it improves accuracy by 2.7% over the E2E baseline on ImageNet-1K while reducing the training cost by 43%. Code is available at https://github.com/LeapLabTHU/Model-Assembling.
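
The divide, incubate, and assemble steps described in the abstract can be illustrated structurally with a minimal sketch. It is not the authors' implementation (see the linked repository for that). It assumes, beyond what the abstract states, that each module is incubated by swapping it in for one layer of the shallow meta model. Layers are toy scalar functions, training is omitted entirely, and the names `compose`, `build_hybrid`, and `assemble` are hypothetical.

```python
from typing import Callable, List

Layer = Callable[[float], float]


def compose(layers: List[Layer]) -> Layer:
    """Chain layers left to right into a single function."""
    def forward(x: float) -> float:
        for layer in layers:
            x = layer(x)
        return x
    return forward


def build_hybrid(meta: List[Layer], module: List[Layer], index: int) -> Layer:
    """Swap meta layer `index` for a deeper target module.

    Assumption: this is how a module is "incubated" inside the shared
    meta model; the abstract does not spell out the mechanism.
    """
    return compose(meta[:index] + module + meta[index + 1:])


def assemble(modules: List[List[Layer]]) -> Layer:
    """Concatenate all modules, in order, to form the target model."""
    return compose([layer for module in modules for layer in module])


# Hypothetical toy setup: a 3-layer meta model and a 6-layer target
# model split into 3 modules of 2 layers each.
meta_model: List[Layer] = [lambda x: x + 2.0, lambda x: x * 4.0, lambda x: x - 3.0]
target_modules: List[List[Layer]] = [
    [lambda x: x + 1.0, lambda x: x + 1.0],  # stands in for meta layer 0
    [lambda x: x * 2.0, lambda x: x * 2.0],  # stands in for meta layer 1
    [lambda x: x - 1.0, lambda x: x - 2.0],  # stands in for meta layer 2
]

# One hybrid network per module; in a real setup each would be trained
# independently, which is skipped here.
hybrids = [build_hybrid(meta_model, m, i) for i, m in enumerate(target_modules)]

# The final model is built from the modules alone; the meta model is dropped.
assembled = assemble(target_modules)
```

In this toy setup every module computes exactly the same function as the meta layer it replaces, so each hybrid and the assembled model agree with the meta model on any input. That agreement is the compatibility property the shared meta model is meant to encourage; in practice it would hold only approximately.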

updated: Thu Dec 08 2022 08:04:06 GMT+0000 (UTC)

published: Thu Dec 08 2022 08:04:06 GMT+0000 (UTC)

arXiv
