Long-Range Zero-Shot Generative Deep Network Quantization

Yan Luo; Yangcheng Gao; Zhao Zhang; Haijun Zhang; Mingliang Xu; Meng Wang

Quantization approximates a floating-point deep network model with a low-bit-width one in order to accelerate inference and reduce computation. Zero-shot quantization quantizes a model without access to the original data by synthesizing data that fits the real data distribution. However, zero-shot quantization performs worse than post-training quantization with real data. We find two reasons: 1) an ordinary generator struggles to produce highly diverse synthetic data, since it lacks the long-range information needed to attend to global features; 2) the synthetic images aim to simulate the statistics of real data, which leads to weak intra-class heterogeneity and limited feature richness. To overcome these problems, we propose a novel deep network quantizer, dubbed Long-Range Zero-Shot Generative Deep Network Quantization (LRQ). Technically, we propose a long-range generator that learns long-range information instead of simple local features. So that the synthetic data contain more global features, long-range attention using large-kernel convolution is incorporated into the generator. In addition, we present an Adversarial Margin Add (AMA) module that forces intra-class angular enlargement between the feature vector and the class center. Because AMA increases the convergence difficulty of the loss function, which is opposite to the training objective of the original loss function, it forms an adversarial process. Furthermore, to transfer knowledge from the full-precision network, we utilize decoupled knowledge distillation. Extensive experiments demonstrate that LRQ outperforms competing methods.
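To make the opening definition concrete, the following is a minimal sketch of uniform quantization, which maps floating-point values to low-bit-width integer codes and back. It is not the LRQ quantizer: the function names `quantize_uniform` and `dequantize`, the min/max range choice, and the default bit width are assumptions made only for illustration.

```python
def quantize_uniform(values, num_bits=4):
    """Map a list of floats to integer codes in [0, 2**num_bits - 1].

    Hypothetical helper for illustration; the range is taken from the
    minimum and maximum of the input values.
    """
    levels = 2 ** num_bits - 1
    lo, hi = min(values), max(values)
    # Guard against a constant input, where the range would be zero.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = []
    for v in values:
        code = round((v - lo) / scale)
        # Clamp to the representable range in case of floating-point drift.
        codes.append(min(levels, max(0, code)))
    return codes, scale, lo


def dequantize(codes, scale, lo):
    """Recover approximate float values from integer codes."""
    return [lo + c * scale for c in codes]


if __name__ == "__main__":
    weights = [-1.0, -0.5, 0.0, 0.5, 1.0]
    codes, scale, lo = quantize_uniform(weights, num_bits=4)
    restored = dequantize(codes, scale, lo)
    print(codes)
    print(restored)
```

In this sketch the reconstruction error of each value is at most half the step size `scale`, so a smaller bit width trades accuracy for a more compact representation.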

updated: Sun Nov 13 2022 04:43:52 GMT+0000 (UTC)

published: Sun Nov 13 2022 04:43:52 GMT+0000 (UTC)

arXiv
