DS-Net++: Dynamic Weight Slicing for Efficient Inference in CNNs and Transformers

Changlin Li; Guangrun Wang; Bing Wang; Xiaodan Liang; Zhihui Li; Xiaojun Chang

Dynamic networks have shown a promising capability to reduce theoretical computational complexity by adapting their architectures to the input during inference. However, their practical runtime usually lags behind the theoretical acceleration because sparse computation is inefficient on hardware. Here, we explore a hardware-efficient dynamic inference regime, named dynamic weight slicing, which adaptively slices a part of the network parameters for inputs with diverse difficulty levels, while keeping the parameters stored statically and contiguously in hardware to avoid the extra burden of sparse computation. Based on this scheme, we present the dynamic slimmable network (DS-Net), which adjusts the filter numbers of CNNs in an input-dependent way, and the dynamic slice-able network (DS-Net++), which adjusts multiple dimensions in both CNNs and transformers. To ensure sub-network generality and routing fairness, we propose a disentangled two-stage optimization scheme that trains the supernet and the gate separately, using training techniques such as in-place bootstrapping (IB), multi-view consistency (MvCo) and sandwich gate sparsification (SGS). Extensive experiments on 4 datasets and 3 different network architectures demonstrate that our method consistently outperforms state-of-the-art static and dynamic model compression methods by a large margin (up to 6.6%). Typically, DS-Net++ achieves 2-4x computation reduction and 1.62x real-world acceleration over MobileNet, ResNet-50 and Vision Transformer, with minimal accuracy drops (0.1-0.3%) on ImageNet. Code release: https://github.com/changlin31/DS-Net
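The idea of slicing a contiguous prefix of the weights, rather than masking scattered entries, can be illustrated with a minimal sketch. This is not the DS-Net implementation: the names `sliced_linear` and `toy_gate`, the candidate ratios, and the magnitude-based difficulty score are all hypothetical stand-ins chosen for illustration, and a plain dense layer stands in for a convolution.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]  # one row per output unit ("filter")


def sliced_linear(x: Sequence[float], weight: Matrix, bias: Vector, ratio: float) -> Vector:
    """Dense layer that evaluates only the first `ratio` fraction of output units.

    The full weight matrix stays stored as-is; a narrower sub-network is
    obtained by taking a leading slice of rows, with no sparse indexing.
    (Python list slicing copies row references; in a tensor library the
    analogous slice of a contiguous buffer is typically a view.)
    """
    k = max(1, int(len(weight) * ratio))  # number of active output units
    return [
        sum(w * v for w, v in zip(row, x)) + b
        for row, b in zip(weight[:k], bias[:k])
    ]


def toy_gate(x: Sequence[float], ratios: Sequence[float] = (0.25, 0.5, 1.0)) -> float:
    """Hypothetical gate: maps a crude 'difficulty' score to a slice ratio.

    Assumption for illustration only: mean absolute input value is used as
    the difficulty proxy. The paper instead trains a gate network.
    """
    score = sum(abs(v) for v in x) / len(x)
    if score < 0.5:
        return ratios[0]
    if score < 1.0:
        return ratios[1]
    return ratios[2]


# A fixed 8x4 weight matrix shared by every sub-network.
weight = [[0.1 * (i + 1) * (j + 1) for j in range(4)] for i in range(8)]
bias = [0.01 * i for i in range(8)]

easy_input = [0.1, 0.1, 0.1, 0.1]
hard_input = [2.0, 2.0, 2.0, 2.0]

easy_out = sliced_linear(easy_input, weight, bias, toy_gate(easy_input))
hard_out = sliced_linear(hard_input, weight, bias, toy_gate(hard_input))
full_easy_out = sliced_linear(easy_input, weight, bias, 1.0)
```

Here the "easy" input is routed to a quarter-width slice and the "hard" input to the full layer, and the narrow output is simply a prefix of what the full layer would compute, which is why no separate weights or sparse kernels are needed in this sketch.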

updated: Tue Sep 21 2021 09:57:21 GMT+0000 (UTC)

published: Tue Sep 21 2021 09:57:21 GMT+0000 (UTC)

arXiv
