SplitNet: Divide and Co-training

Shuai Zhao; Liguang Zhou; Wenxiao Wang; Deng Cai; Tin Lun Lam; Yangsheng Xu

The width of a neural network matters, since increasing the width necessarily increases model capacity. However, performance does not improve linearly with width and soon saturates. To tackle this problem, we propose increasing the number of networks rather than purely scaling up the width. To demonstrate this, we divide one large network into several small ones, each with a fraction of the original network's parameters. We then train these small networks together and show them different views of the same data so that they learn different and complementary knowledge. During this co-training process, the networks can also learn from each other. As a result, the small networks can achieve better ensemble performance than the large one with few or no extra parameters or FLOPs. This reveals that the number of networks is a new dimension of effective model scaling, besides depth, width, and resolution. The small networks can also achieve faster inference than the large one by running concurrently on different devices. We validate this idea with different network architectures on common benchmarks through extensive experiments. The code is available at https://github.com/mzhaoshuai/SplitNet-Divide-and-Co-training.
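The divide-and-ensemble idea can be illustrated with a small sketch. It is not the authors' implementation. It assumes a plain fully connected network, and it assumes each hidden width is shrunk by the square root of the number of networks, a rule the abstract does not state but which follows from the fact that the weights between two hidden layers grow quadratically with width. All function names are hypothetical.

```python
import math


def dense_params(widths):
    """Count weights plus biases of a fully connected net with these layer widths."""
    return sum(w_in * w_out + w_out for w_in, w_out in zip(widths, widths[1:]))


def split_hidden_widths(hidden, num_networks):
    """Shrink each hidden width by sqrt(num_networks).

    Assumed rule (not taken from the abstract): hidden-to-hidden weights scale
    quadratically with width, so this keeps their total roughly constant.
    """
    scale = math.sqrt(num_networks)
    return [max(1, round(h / scale)) for h in hidden]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_probs(per_network_logits):
    """Average the softmax outputs of the small networks to get the ensemble prediction."""
    probs = [softmax(logits) for logits in per_network_logits]
    n = len(probs)
    return [sum(p[i] for p in probs) / n for i in range(len(probs[0]))]


if __name__ == "__main__":
    in_dim, hidden, out_dim, num_networks = 256, [1024, 1024], 10, 4
    large = dense_params([in_dim] + hidden + [out_dim])
    small_hidden = split_hidden_widths(hidden, num_networks)
    small = dense_params([in_dim] + small_hidden + [out_dim])
    print("large network parameters:", large)
    print("total over small networks:", num_networks * small)
    print("ensemble:", ensemble_probs([[2.0, 0.5, 0.1], [1.5, 1.0, 0.2]]))
```

In this toy setting the hidden-to-hidden weights of the four small networks sum exactly to those of the large one (4 × 512² = 1024²). The input and output layers scale only linearly with width, so they add some overhead. The co-training loss and the different data views per network are deliberately left out of the sketch.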

updated: Tue Dec 29 2020 02:41:21 GMT+0000 (UTC)

published: Mon Nov 30 2020 10:03:34 GMT+0000 (UTC)

arXiv
