Improving Network Slimming with Nonconvex Regularization

Kevin Bui; Fredrick Park; Shuai Zhang; Yingyong Qi; Jack Xin

Convolutional neural networks (CNNs) have become powerful models for a range of computer vision tasks, from object detection to semantic segmentation. However, most state-of-the-art CNNs cannot be deployed directly on edge devices such as smartphones and drones, which need low latency under limited power and memory bandwidth. One popular, straightforward approach to compressing CNNs is network slimming, which imposes ℓ_1 regularization on the channel-associated scaling factors via the batch normalization layers during training. Network slimming thereby identifies insignificant channels that can be pruned for inference. In this paper, we propose replacing the ℓ_1 penalty with an alternative nonconvex, sparsity-inducing penalty in order to yield a more compressed and/or more accurate CNN architecture. We investigate ℓ_p (0 < p < 1), transformed ℓ_1 (Tℓ_1), minimax concave penalty (MCP), and smoothly clipped absolute deviation (SCAD) because of their recent successes and popularity in solving sparse optimization problems such as compressed sensing and variable selection. We demonstrate the effectiveness of network slimming with nonconvex penalties on three neural network architectures (VGG-19, DenseNet-40, and ResNet-164) on standard image classification datasets. Based on the numerical experiments, Tℓ_1 preserves model accuracy against channel pruning; ℓ_{1/2} and ℓ_{3/4} yield more compressed models with accuracies similar to ℓ_1 after retraining; and MCP and SCAD provide more accurate models after retraining with compression similar to ℓ_1. Network slimming with Tℓ_1 regularization also outperforms the latest Bayesian modification of network slimming in compressing a CNN architecture in terms of memory storage while preserving its model accuracy after channel pruning.
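
For readers unfamiliar with the four nonconvex penalties named in the abstract, the following is a minimal sketch of their commonly cited textbook forms, each summed over a list of scaling factors. The function names, the parameter names (`p`, `a`, `lam`, `b`), and the default values are assumptions made for illustration. The abstract does not state the paper's exact parameterization, so this is not a reproduction of the authors' implementation.

```python
# Illustrative sketch only: standard forms of the penalties, applied to a
# plain list of scaling factors. All names and defaults are hypothetical.

def lp_penalty(gammas, p=0.5):
    """l_p penalty with 0 < p < 1: sum of |g|^p."""
    return sum(abs(g) ** p for g in gammas)


def tl1_penalty(gammas, a=1.0):
    """Transformed l_1 with a > 0: sum of (a + 1)|g| / (a + |g|)."""
    return sum((a + 1.0) * abs(g) / (a + abs(g)) for g in gammas)


def mcp_penalty(gammas, lam=1.0, b=2.0):
    """Minimax concave penalty with lam > 0 and b > 1."""
    total = 0.0
    for g in gammas:
        t = abs(g)
        if t <= b * lam:
            total += lam * t - t * t / (2.0 * b)  # concave quadratic region
        else:
            total += 0.5 * b * lam * lam          # constant beyond b * lam
    return total


def scad_penalty(gammas, lam=1.0, a=3.7):
    """Smoothly clipped absolute deviation with lam > 0 and a > 2."""
    total = 0.0
    for g in gammas:
        t = abs(g)
        if t <= lam:
            total += lam * t                      # behaves like l_1 near zero
        elif t <= a * lam:
            total += (2.0 * a * lam * t - t * t - lam * lam) / (2.0 * (a - 1.0))
        else:
            total += 0.5 * lam * lam * (a + 1.0)  # constant beyond a * lam
    return total
```

In this sketch, MCP and SCAD become constant for large magnitudes and Tℓ_1 is bounded by a + 1 per entry, so large scaling factors are penalized less heavily than under ℓ_1, whose penalty grows linearly. In a network slimming setup, one of these sums would stand in for the ℓ_1 term on the batch normalization scaling factors.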

updated: Wed Aug 18 2021 23:51:15 GMT+0000 (UTC)

published: Sat Oct 03 2020 01:04:02 GMT+0000 (UTC)

arXiv
