High Performance Convolution Using Sparsity and Patterns for Inference in Deep Convolutional Neural Networks

Hossam Amer; Ahmed H. Salamah; Ahmad Sajedi; En-hui Yang

Deploying deep Convolutional Neural Networks (CNNs) is constrained by their memory footprint and speed requirements, both of which stem mainly from convolution. The widely used convolution algorithms im2col and MEC produce a lowered matrix from an activation map by redundantly storing the map elements that fall within horizontal and/or vertical kernel overlaps, without considering the sparsity of the map. Exploiting this sparsity, this paper proposes two new convolution algorithms, Compressed Pattern Overlap (CPO) and Compressed Pattern Sets (CPS), that simultaneously decrease the memory footprint and increase inference speed while preserving accuracy. CPO recognizes non-zero elements (NZEs) at horizontal and vertical overlaps in the activation maps. CPS further improves on the memory savings of CPO by compressing the index positions of neighboring NZEs. In both algorithms, channels and regions of the activation maps that are entirely zero are skipped. CPO and CPS then perform convolution via Sparse Matrix-Vector Multiplication (SpMv) on their sparse representations. Experiments on CPUs show average per-layer time savings of up to 63% and a Compression Ratio (CR) of up to 26x with respect to im2col. In some layers, the average per-layer time savings of CPO/CPS are 28% better, and the CR 9.2x better, than those of the parallel implementation of MEC. For a given CNN's inference, we select offline, for each convolution layer, the best algorithm in terms of time, choosing between im2col and either CPO or CPS. Our algorithms were selected for up to 56% of the non-pointwise convolutional layers. These offline selections yield CNN inference time savings of up to 9% and a CR of up to 10x.
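To make the contrast in the abstract concrete, the following is a minimal sketch, not the authors' CPO or CPS formats. It compares an im2col-style lowering, which copies every overlapped element into a dense matrix, with a convolution that visits only the non-zero elements of the activation map and skips all-zero maps. The function names (`im2col`, `conv_im2col`, `conv_skip_zeros`) and the setting (single channel, stride 1, no padding, NumPy on CPU) are assumptions chosen for illustration.

```python
import numpy as np

def im2col(x, k):
    """Lower an H x W map into a matrix whose rows are flattened k x k patches."""
    H, W = x.shape
    oh, ow = H - k + 1, W - k + 1
    cols = np.empty((oh * ow, k * k), dtype=x.dtype)
    for i in range(oh):
        for j in range(ow):
            # Overlapping patches copy the same map elements many times.
            cols[i * ow + j] = x[i:i + k, j:j + k].ravel()
    return cols

def conv_im2col(x, w):
    """Convolution (cross-correlation) as lowered matrix times flattened kernel."""
    k = w.shape[0]
    oh, ow = x.shape[0] - k + 1, x.shape[1] - k + 1
    return (im2col(x, k) @ w.ravel()).reshape(oh, ow)

def conv_skip_zeros(x, w):
    """Same result, but only non-zero elements of x are visited (illustrative only)."""
    k = w.shape[0]
    oh, ow = x.shape[0] - k + 1, x.shape[1] - k + 1
    out = np.zeros((oh, ow), dtype=np.result_type(x, w))
    if not x.any():
        return out  # an all-zero map is skipped entirely
    rows, cols = np.nonzero(x)
    for r, c in zip(rows, cols):
        v = x[r, c]
        # x[r, c] contributes to out[r - a, c - b] through kernel weight w[a, b].
        for a in range(k):
            i = r - a
            if i < 0 or i >= oh:
                continue
            for b in range(k):
                j = c - b
                if 0 <= j < ow:
                    out[i, j] += v * w[a, b]
    return out

rng = np.random.default_rng(0)
x = np.maximum(rng.standard_normal((6, 6)), 0.0)  # ReLU-like sparse activation map
w = rng.standard_normal((3, 3))

dense_stored = im2col(x, 3).size        # elements held by the lowered matrix
sparse_stored = np.count_nonzero(x)     # values a sparse representation would hold
print(dense_stored, sparse_stored)
print(np.allclose(conv_im2col(x, w), conv_skip_zeros(x, w)))
```

For a 6 x 6 map and a 3 x 3 kernel, the lowered matrix holds 16 x 9 = 144 entries, while the map itself has at most 36 values, fewer once zeros are dropped. A real sparse format must also store index information, which is the overhead the abstract says CPS compresses.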

updated: Fri Apr 16 2021 18:55:32 GMT+0000 (UTC)

published: Fri Apr 16 2021 18:55:32 GMT+0000 (UTC)

arXiv
