Group channel pruning and spatial attention distilling for object detection

Yun Chu; Pu Li; Yong Bai; Zhuhua Hu; Yongqing Chen; Jiafeng Lu

Over-parameterization of neural networks has motivated many model compression methods based on pruning and quantization, which substantially reduce model size, parameter count, and computational complexity. However, most models compressed by such methods require special hardware and software support, which increases deployment cost. Moreover, these methods are mainly applied to classification tasks and are rarely applied directly to detection tasks. To address these issues, we introduce a three-stage model compression method for object detection networks: dynamic sparse training, group channel pruning, and spatial attention distilling. First, to identify the unimportant channels in the network while maintaining a good balance between sparsity and accuracy, we propose a dynamic sparse training method that introduces a variable sparse rate, which changes over the course of training. Second, to reduce the effect of pruning on network accuracy, we propose a novel pruning method called group channel pruning: we divide the network into multiple groups according to the scales of the feature layers and the similarity of module structures in the network, and then prune the channels in each group with a different pruning threshold. Finally, to recover the accuracy of the pruned network, we apply an improved knowledge distillation method: we extract spatial attention information from the feature maps of specific scales in each group and use it as the knowledge for distillation. In the experiments, we use YOLOv4 as the object detection network and PASCAL VOC as the training dataset. Our method reduces the model's parameters by 64.7% and its computation by 34.9%.
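The three stages can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation, and the abstract does not give the formulas, so several parts are assumptions: the cosine shape of the sparse-rate schedule, the use of per-channel scale factors (as in batch-normalization-based pruning) as the channel-importance measure, and the attention map defined as the channel-wise sum of squared activations. All function, parameter, and group names are hypothetical.

```python
import math


def sparse_rate(step, total_steps, high=1e-3, low=1e-5):
    """Stage 1 (assumed schedule): sparsity-penalty weight that changes with
    training progress, here a cosine decay from `high` to `low`."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return low + 0.5 * (high - low) * (1.0 + math.cos(math.pi * progress))


def group_channel_masks(scales_by_group, prune_ratio_by_group):
    """Stage 2: derive a separate pruning threshold for each group and keep
    only the channels whose importance exceeds that group's threshold.

    scales_by_group: {group name: [per-channel scale factor, ...]} (assumed
    importance measure). prune_ratio_by_group: {group name: fraction to drop}.
    """
    masks = {}
    for name, scales in scales_by_group.items():
        mags = sorted(abs(s) for s in scales)
        # Number of channels to drop, always leaving at least one channel.
        k = min(int(len(mags) * prune_ratio_by_group[name]), len(mags) - 1)
        threshold = mags[k - 1] if k > 0 else float("-inf")
        masks[name] = [abs(s) > threshold for s in scales]
    return masks


def spatial_attention(feature_map):
    """Collapse a C x H x W feature map (nested lists) into a flattened,
    L2-normalised H*W attention map: sum over channels of squared activations
    (assumed definition)."""
    height, width = len(feature_map[0]), len(feature_map[0][0])
    att = [
        sum(channel[i][j] ** 2 for channel in feature_map)
        for i in range(height)
        for j in range(width)
    ]
    norm = math.sqrt(sum(a * a for a in att)) or 1.0
    return [a / norm for a in att]


def attention_distill_loss(teacher_fm, student_fm):
    """Stage 3: mean squared difference between teacher and student attention
    maps. Channel counts may differ; spatial sizes must match."""
    t, s = spatial_attention(teacher_fm), spatial_attention(student_fm)
    return sum((a - b) ** 2 for a, b in zip(t, s)) / len(t)
```

Because the attention map sums over the channel dimension, the unpruned teacher and the pruned student can be compared even though they have different channel counts, which is one reason a spatial formulation fits a pruned network. In practice the loss would be computed on tensors from a deep learning framework and added to the detection loss; the nested lists here only keep the sketch dependency-free.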

updated: Fri Jun 02 2023 13:26:23 GMT+0000 (UTC)

published: Fri Jun 02 2023 13:26:23 GMT+0000 (UTC)

arXiv
