Feature Map Transform Coding for Energy-Efficient CNN Inference

Brian Chmiel; Chaim Baskin; Ron Banner; Evgenii Zheltonozhskii; Yevgeny Yermolin; Alex Karbachevsky; Alex M. Bronstein; Avi Mendelson

Convolutional neural networks (CNNs) achieve state-of-the-art accuracy in a variety of tasks in computer vision and beyond. One of the major obstacles hindering the ubiquitous use of CNNs for inference on low-power edge devices is their high computational complexity and memory bandwidth requirements. The latter often dominates the energy footprint on modern hardware. In this paper, we introduce a lossy transform coding approach, inspired by image and video compression, designed to reduce the memory bandwidth consumed by storing intermediate activation results. Our method does not require fine-tuning the network weights and halves the volume of data transferred to main memory by compressing the highly correlated feature maps with variable-length coding. Our method outperforms previous approaches in terms of the number of bits per value, with minor accuracy degradation on ResNet-34 and MobileNetV2. We analyze the performance of our approach on a variety of CNN architectures and demonstrate that an FPGA implementation of ResNet-18 with our approach reduces the memory energy footprint by around 40% compared to a quantized network, with negligible impact on accuracy. When accuracy degradation of up to 2% is allowed, a reduction of 60% is achieved. A reference implementation is available at https://github.com/CompressTeam/TransformCodingInference
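The abstract describes the pipeline only at a high level (decorrelate the feature maps, quantize, then apply variable-length coding), so the following is a minimal sketch of that general idea, not the authors' implementation. It makes several assumptions that are not in the abstract: two synthetic, strongly correlated channels stand in for real feature maps; a fixed orthonormal sum/difference rotation stands in for whatever transform the paper uses; and the empirical Shannon entropy is used as a proxy for the bits per value an ideal variable-length code would need. All function names are hypothetical.

```python
import math
import random
from collections import Counter


def entropy_bits(symbols):
    """Empirical Shannon entropy in bits per symbol (proxy for an ideal variable-length code)."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def make_correlated_channels(n, seed=0):
    """Assumption: two synthetic channels sharing a common signal plus small independent noise."""
    rng = random.Random(seed)
    base = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ch_a = [b + rng.gauss(0.0, 0.1) for b in base]
    ch_b = [b + rng.gauss(0.0, 0.1) for b in base]
    return ch_a, ch_b


def rotate(ch_a, ch_b):
    """Orthonormal sum/difference rotation; applying it twice returns the input."""
    s = 1.0 / math.sqrt(2.0)
    sums = [(a + b) * s for a, b in zip(ch_a, ch_b)]
    diffs = [(a - b) * s for a, b in zip(ch_a, ch_b)]
    return sums, diffs


def quantize(values, step):
    """Uniform quantization: map each value to the index of its nearest multiple of step."""
    return [round(v / step) for v in values]


def dequantize(indices, step):
    """Map quantization indices back to reconstructed values."""
    return [i * step for i in indices]


def bits_per_value(ch_a, ch_b, step, use_transform):
    """Average entropy of the quantized channels, with or without the rotation."""
    if use_transform:
        ch_a, ch_b = rotate(ch_a, ch_b)
    q_a, q_b = quantize(ch_a, step), quantize(ch_b, step)
    return (entropy_bits(q_a) + entropy_bits(q_b)) / 2.0


def reconstruct(ch_a, ch_b, step):
    """Lossy round trip: rotate, quantize, dequantize, rotate back."""
    t_a, t_b = rotate(ch_a, ch_b)
    r_a, r_b = dequantize(quantize(t_a, step), step), dequantize(quantize(t_b, step), step)
    return rotate(r_a, r_b)


if __name__ == "__main__":
    a, b = make_correlated_channels(4000)
    step = 0.25
    print("bits/value, direct quantization:", bits_per_value(a, b, step, False))
    print("bits/value, after rotation:     ", bits_per_value(a, b, step, True))
    rec_a, _ = reconstruct(a, b, step)
    print("max reconstruction error (channel a):",
          max(abs(x - y) for x, y in zip(a, rec_a)))
```

On this synthetic data the rotation concentrates most of the signal in the sum channel and leaves a low-variance difference channel, which is why the average entropy after quantization drops. How closely this mirrors the savings on real activations depends on how correlated the actual feature maps are and on the transform and coder the paper uses.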

updated: Thu Sep 26 2019 14:43:47 GMT+0000 (UTC)

published: Sun May 26 2019 16:29:47 GMT+0000 (UTC)

arXiv
