Post-Training Quantization for Object Detection

Lin Niu; Jiawei Liu; Zhihang Yuan; Dawei Yang; Xinggang Wang; Wenyu Liu

Efficient inference for object detection networks is a major challenge on edge devices. Post-Training Quantization (PTQ), which directly converts a full-precision model to a low bit-width one, is an effective and convenient way to reduce inference complexity. However, it suffers a severe accuracy drop when applied to complex tasks such as object detection. PTQ methods optimize the quantization parameters using various metrics to minimize the perturbation caused by quantization. The p-norm distance between feature maps before and after quantization, Lp, is widely used as the metric for evaluating this perturbation. Owing to the particular characteristics of object detection networks, we observe that the parameter p in the Lp metric significantly influences quantization performance. We show that using a fixed hyper-parameter p does not achieve optimal quantization performance. To mitigate this problem, we propose a framework, DetPTQ, that assigns different p values to different layers during quantization, guided by an Object Detection Output Loss (ODOL), which represents the task loss of object detection. DetPTQ employs the ODOL-based adaptive Lp metric to select the optimal quantization parameters. Experiments show that DetPTQ outperforms state-of-the-art PTQ methods by a significant margin on both 2D and 3D object detectors. For example, we achieve 31.1/31.7 (quantized/full-precision) mAP on RetinaNet-ResNet18 with 4-bit weights and 4-bit activations.
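To illustrate why the choice of p can matter, here is a minimal sketch, not the authors' DetPTQ implementation. It grid-searches a uniform quantization scale that minimizes the Lp distance between a synthetic "feature map" and its quantized version, for several values of p. The helper names (`quantize`, `lp_distance`, `search_scale`), the symmetric 4-bit quantizer, the grid search, and the synthetic data are all assumptions made for illustration. It also quantizes the values directly rather than comparing the outputs of a real layer.

```python
import random


def quantize(x, scale, bits=4):
    """Uniform symmetric fake-quantization: round to the integer grid, clamp, rescale."""
    qmax = 2 ** (bits - 1) - 1
    qmin = -(2 ** (bits - 1))
    q = max(qmin, min(qmax, round(x / scale)))
    return q * scale


def lp_distance(a, b, p):
    """p-norm distance between two equally sized sequences of values."""
    return sum(abs(u - v) ** p for u, v in zip(a, b)) ** (1.0 / p)


def search_scale(values, p, bits=4, num_candidates=100):
    """Grid-search the scale that minimizes the Lp distance to the unquantized values."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    best_scale, best_dist = None, float("inf")
    for i in range(1, num_candidates + 1):
        # Candidate clipping range is a fraction i/num_candidates of the max magnitude.
        scale = (max_abs * i / num_candidates) / qmax
        quantized = [quantize(v, scale, bits) for v in values]
        dist = lp_distance(values, quantized, p)
        if dist < best_dist:
            best_scale, best_dist = scale, dist
    return best_scale, best_dist


# Synthetic stand-in for a feature map: mostly small values plus two outliers.
random.seed(0)
feature_map = [random.gauss(0.0, 1.0) for _ in range(2000)] + [8.0, -9.0]

scales = {}
for p in (1.0, 2.0, 4.0):
    scale, dist = search_scale(feature_map, p)
    scales[p] = scale
    print(f"p={p}: selected scale={scale:.4f}, Lp distance={dist:.4f}")
```

In this toy setting a larger p penalizes the clipping error on the outliers more heavily, so the search tends to pick a larger scale (wider range, coarser steps), while a smaller p favors fine resolution for the bulk of the values. The abstract's observation is that no single fixed p is best for every layer of a detector.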

updated: Wed Apr 19 2023 16:11:21 GMT+0000 (UTC)

published: Wed Apr 19 2023 16:11:21 GMT+0000 (UTC)

arXiv
