Modern deep neural networks rely on overparameterization to achieve state-of-the-art generalization, but overparameterized models are computationally expensive. Network pruning is often employed to obtain less demanding models for deployment. Fine-grained pruning removes individual weights in parameter tensors and can achieve a high model compression ratio with little accuracy degradation. However, it introduces irregularity into the computing dataflow and often does not improve inference efficiency in practice. Coarse-grained pruning removes network weights in groups, e.g., an entire filter; it yields satisfactory inference speedup but often leads to significant accuracy degradation. This work introduces the cross-channel intragroup (CCI) sparsity structure, which avoids the inference inefficiency of fine-grained pruning while maintaining outstanding model performance. We then present a novel training algorithm designed to perform well under the constraint imposed by CCI-Sparsity. Through a series of comparative experiments, we show that the proposed CCI-Sparsity structure and the corresponding pruning algorithm outperform prior art in inference efficiency by a substantial margin, provided that suitable hardware acceleration becomes available.
updated: Fri Jun 12 2020 05:29:47 GMT+0000 (UTC)
published: Sat Oct 26 2019 01:03:01 GMT+0000 (UTC)