Towards Better Accuracy-efficiency Trade-offs: Divide and Co-training

Shuai Zhao; Liguang Zhou; Wenxiao Wang; Deng Cai; Tin Lun Lam; Yangsheng Xu

The width of a neural network matters, since increasing the width necessarily increases the model capacity. However, the performance of a network does not improve linearly with its width and soon saturates. In this case, we argue that increasing the number of networks (ensemble) can achieve better accuracy-efficiency trade-offs than purely increasing the width. To demonstrate this, one large network is divided into several small ones with respect to its parameters and regularization components. Each of these small networks has a fraction of the original network's parameters. We then train these small networks together and make them see various views of the same data to increase their diversity. During this co-training process, the networks can also learn from each other. As a result, the small networks can achieve better ensemble performance than the large one with few or no extra parameters or FLOPs, i.e., they achieve better accuracy-efficiency trade-offs. The small networks can also achieve faster inference than the large one by running concurrently. Together, these results show that the number of networks is a new dimension of model scaling. We validate our argument with 8 different neural architectures on common benchmarks through extensive experiments. The code is available at https://github.com/FreeformRobotics/Divide-and-Co-training.
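
The parameter budget behind the "divide" step can be illustrated with a little arithmetic. The following is a minimal sketch, not the authors' implementation: it assumes a plain k x k convolution whose weight count is k * k * C_in * C_out, and it assumes the width of each small network is obtained by dividing the original width by the square root of the number of networks. The function names `conv_params`, `divide_width`, and `ensemble_predict` are hypothetical, and averaging softmax outputs is only one common way to form an ensemble prediction.

```python
import math


def conv_params(c_in, c_out, k=3):
    """Number of weights in a k x k convolution without bias."""
    return k * k * c_in * c_out


def divide_width(width, num_networks):
    """Assumed division rule: shrink the channel width by sqrt(S), so that
    each small network keeps roughly 1/S of the original weights."""
    return max(1, round(width / math.sqrt(num_networks)))


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_predict(logits_per_network):
    """Assumed ensemble rule: average the softmax outputs of all networks."""
    probs = [softmax(logits) for logits in logits_per_network]
    n = len(probs)
    return [sum(p[i] for p in probs) / n for i in range(len(probs[0]))]


if __name__ == "__main__":
    large_width, num_networks = 256, 4
    small_width = divide_width(large_width, num_networks)
    large = conv_params(large_width, large_width)
    small_total = num_networks * conv_params(small_width, small_width)
    print(small_width, large, small_total)
    print(ensemble_predict([[2.0, 0.5, 0.1], [1.5, 1.0, 0.2]]))
```

Under these assumptions, four networks of width 128 hold the same number of convolution weights as one network of width 256, which is the sense in which the ensemble adds few or no extra parameters. Because the small networks are independent of each other at inference time, they can also be evaluated concurrently.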

updated: Tue Sep 06 2022 03:15:23 GMT+0000 (UTC)

published: Mon Nov 30 2020 10:03:34 GMT+0000 (UTC)

arXiv
