All at Once Network Quantization via Collaborative Knowledge Transfer

Ximeng Sun; Rameswar Panda; Chun-Fu Chen; Naigang Wang; Bowen Pan; Kailash Gopalakrishnan; Aude Oliva; Rogerio Feris; Kate Saenko

Network quantization has rapidly become one of the most widely used methods to compress and accelerate deep neural networks on edge devices. While existing approaches offer impressive results on common benchmark datasets, they generally repeat the quantization process and retrain the low-precision network from scratch, producing a separate network for each resource constraint. This limits scalable deployment of deep networks in real-world applications, where dynamic changes in bit-width are often desired. All-at-once quantization addresses this problem by flexibly adjusting the bit-width of a single deep network during inference, so that it can adapt instantly to different scenarios without retraining or additional memory to store separate models. In this paper, we develop a novel collaborative knowledge transfer approach for efficiently training the all-at-once quantization network. Specifically, we propose an adaptive selection strategy that chooses a high-precision teacher for transferring knowledge to the low-precision student while jointly optimizing the model with all bit-widths. Furthermore, to transfer knowledge effectively, we develop a dynamic block swapping method that randomly replaces blocks in the lower-precision student network with the corresponding blocks in the higher-precision teacher network. Extensive experiments on several challenging and diverse datasets for both image and video classification demonstrate the efficacy of our proposed approach over state-of-the-art methods.
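
The dynamic block swapping idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the quantizer, the scalar "blocks", the independent per-block swap probability, and the names `quantize`, `make_block`, `swapped_bitwidths`, and `forward` are all assumptions made for illustration. It also assumes that student and teacher share weights, so that swapping in a teacher block amounts to running that block at the teacher's bit-width.

```python
import random


def quantize(x, bits):
    """Uniformly quantize x, clipped to [0, 1], onto 2**bits levels (toy quantizer)."""
    levels = 2 ** bits - 1
    x = min(max(x, 0.0), 1.0)
    return round(x * levels) / levels


def make_block(weight):
    """Build a toy block: scale the input by a weight quantized at the given bit-width."""
    def block(x, bits):
        return quantize(x * quantize(weight, bits), bits)
    return block


def swapped_bitwidths(num_blocks, student_bits, teacher_bits, swap_prob, rng):
    """Pick a bit-width per block for the student pass.

    Each block is independently replaced by its teacher-precision
    counterpart with probability swap_prob (an assumed swapping rule).
    """
    return [
        teacher_bits if rng.random() < swap_prob else student_bits
        for _ in range(num_blocks)
    ]


def forward(blocks, x, bitwidths):
    """Run the input through every block at its assigned bit-width."""
    for block, bits in zip(blocks, bitwidths):
        x = block(x, bits)
    return x


rng = random.Random(0)
blocks = [make_block(w) for w in (0.9, 0.8, 0.7, 0.95)]

# Mixed-precision pass: some blocks run at 2 bits (student), others at 8 bits (teacher).
plan = swapped_bitwidths(len(blocks), student_bits=2, teacher_bits=8,
                         swap_prob=0.5, rng=rng)
mixed_output = forward(blocks, 0.6, plan)
```

In an actual training loop the plan would be resampled at every step, and the mixed-precision output would feed the student's loss; both details are left out here because the abstract does not specify them.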

updated: Tue Mar 02 2021 03:09:03 GMT+0000 (UTC)

published: Tue Mar 02 2021 03:09:03 GMT+0000 (UTC)

arXiv
