Ax-BxP: Approximate Blocked Computation for Precision-Reconfigurable Deep Neural Network Acceleration

Reena Elangovan; Shubham Jain; Anand Raghunathan

Precision scaling has emerged as a popular technique to optimize the compute and storage requirements of Deep Neural Networks (DNNs). Efforts toward creating ultra-low-precision (sub-8-bit) DNNs suggest that the minimum precision required to achieve a given network-level accuracy varies considerably across networks, and even across layers within a network, requiring support for variable precision in DNN hardware. Previous proposals such as bit-serial hardware incur high overheads, significantly diminishing the benefits of lower precision. To efficiently support precision re-configurability in DNN accelerators, we introduce an approximate computing method wherein DNN computations are performed block-wise (a block is a group of bits) and re-configurability is supported at the granularity of blocks. Results of block-wise computations are composed in an approximate manner to enable efficient re-configurability. We design a DNN accelerator that embodies approximate blocked computation and propose a method to determine a suitable approximation configuration for a given DNN. By varying the approximation configurations across DNNs, we achieve 1.11x-1.34x and 1.29x-1.6x improvement in system energy and performance, respectively, over an 8-bit fixed-point (FxP8) baseline, with negligible loss in classification accuracy. Further, by varying the approximation configurations across layers and data structures within DNNs, we achieve 1.14x-1.67x and 1.31x-1.93x improvement in system energy and performance, respectively, with negligible accuracy loss.
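To make the idea of block-wise computation with approximate composition concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the accelerator design or the configuration-selection method from the paper: it assumes unsigned 8-bit operands split into four 2-bit blocks, and it approximates by keeping only the most significant blocks of each operand. The names `split_blocks`, `blocked_multiply`, `keep_a`, and `keep_b` are hypothetical.

```python
def split_blocks(x, num_blocks=4, block_bits=2):
    """Split an unsigned integer into blocks of bits, least significant first."""
    mask = (1 << block_bits) - 1
    return [(x >> (i * block_bits)) & mask for i in range(num_blocks)]


def blocked_multiply(a, b, num_blocks=4, block_bits=2, keep_a=None, keep_b=None):
    """Multiply two unsigned integers as a sum of block-wise partial products.

    keep_a / keep_b (hypothetical knobs) give the number of most significant
    blocks of each operand that take part in the product. Keeping every block
    yields the exact result; keeping fewer skips partial products and
    therefore approximates the result.
    """
    keep_a = num_blocks if keep_a is None else keep_a
    keep_b = num_blocks if keep_b is None else keep_b
    a_blocks = split_blocks(a, num_blocks, block_bits)
    b_blocks = split_blocks(b, num_blocks, block_bits)

    total = 0
    for i in range(num_blocks - keep_a, num_blocks):
        for j in range(num_blocks - keep_b, num_blocks):
            # Each small block product is shifted to its bit position.
            total += (a_blocks[i] * b_blocks[j]) << ((i + j) * block_bits)
    return total


exact = blocked_multiply(173, 94)                       # all 16 partial products
approx = blocked_multiply(173, 94, keep_a=2, keep_b=2)  # only 4 partial products
```

In this sketch, reducing `keep_a` and `keep_b` from 4 to 2 cuts the number of block products from 16 to 4, which is the kind of work reduction that block-granularity re-configurability aims to exploit, at the cost of an approximation error that grows as more low-order blocks are dropped.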

updated: Wed Nov 25 2020 20:00:38 GMT+0000 (UTC)

published: Wed Nov 25 2020 20:00:38 GMT+0000 (UTC)

arXiv
