A MAC-less Neural Inference Processor Supporting Compressed, Variable Precision Weights

Vincenzo Liguori

This paper introduces two architectures for the inference of convolutional neural networks (CNNs). Both architectures exploit weight sparsity and compression to reduce computational complexity and bandwidth. The first architecture uses multiply-accumulators (MACs) but avoids unnecessary multiplications by skipping zero weights. The second architecture exploits sparsity at the level of the weights' bit representation by replacing resource-intensive MACs with much smaller Bit Layer Multiply Accumulators (BLMACs). The use of BLMACs also allows variable-precision weights, represented as variable-size integers or even floating-point values. Some implementation details of the second architecture are given. Weight compression with arithmetic coding is also discussed, as are the bandwidth implications. Finally, some implementation results for a pathfinder design in various technologies are presented.
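The difference between the two approaches can be illustrated with a small software sketch. This is only one plausible reading of the abstract, not the paper's hardware design: the function names, the sign-magnitude weight encoding, and the `n_bits` parameter are all assumptions made for illustration. The first function skips zero weights in an ordinary MAC loop; the second computes the same dot product one bit layer at a time, using only additions, subtractions and shifts, so that work is spent only on the non-zero bits of the weights.

```python
def dot_mac_skip_zeros(weights, inputs):
    """Dot product with ordinary multiply-accumulates, skipping zero weights."""
    acc = 0
    for w, x in zip(weights, inputs):
        if w != 0:  # a zero weight contributes nothing, so skip the multiply
            acc += w * x
    return acc


def dot_bit_layers(weights, inputs, n_bits):
    """Same dot product using only add, subtract and shift (no multiplier).

    Assumption: weights are sign-magnitude integers with abs(w) < 2**n_bits.
    """
    acc = 0
    for bit in range(n_bits - 1, -1, -1):  # most significant bit layer first
        acc <<= 1  # doubling the accumulator moves on to the next layer
        for w, x in zip(weights, inputs):
            if (abs(w) >> bit) & 1:  # only non-zero bits cost an operation
                acc += x if w > 0 else -x
    return acc


def count_nonzero_bits(weights):
    """Number of add/subtract operations the bit-layer loop performs."""
    return sum(bin(abs(w)).count("1") for w in weights)


weights = [3, 0, -5, 1]
inputs = [2, 7, -1, 4]
print(dot_mac_skip_zeros(weights, inputs))
print(dot_bit_layers(weights, inputs, n_bits=3))
print(count_nonzero_bits(weights))
```

In this sketch, precision is simply the number of bit layers processed, which is how a bit-layer scheme can accommodate weights of different sizes without changing the arithmetic unit.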

updated: Thu Dec 10 2020 23:13:17 GMT+0000 (UTC)

published: Thu Dec 10 2020 23:13:17 GMT+0000 (UTC)

arXiv
