Sparseloop: An Analytical Approach To Sparse Tensor Accelerator Modeling

Yannan Nellie Wu; Po-An Tsai; Angshuman Parashar; Vivienne Sze; Joel S. Emer

In recent years, many accelerators have been proposed to efficiently process sparse tensor algebra applications (e.g., sparse neural networks). However, these proposals are single points in a large and diverse design space. The lack of systematic description and modeling support for these sparse tensor accelerators impedes hardware designers from efficient and effective design space exploration. This paper first presents a unified taxonomy to systematically describe the diverse sparse tensor accelerator design space. Based on the proposed taxonomy, it then introduces Sparseloop, the first fast, accurate, and flexible analytical modeling framework to enable early-stage evaluation and exploration of sparse tensor accelerators. Sparseloop comprehends a large set of architecture specifications, including various dataflows and sparse acceleration features (e.g., elimination of zero-based compute). Using these specifications, Sparseloop evaluates a design's processing speed and energy efficiency while accounting for data movement and compute incurred by the employed dataflow as well as the savings and overhead introduced by the sparse acceleration features using stochastic tensor density models. Across representative accelerators and workloads, Sparseloop achieves over 2000 times faster modeling speed than cycle-level simulations, maintains relative performance trends, and achieves 0.1% to 8% average error. With a case study, we demonstrate Sparseloop's ability to help reveal important insights for designing sparse tensor accelerators (e.g., it is important to co-design orthogonal design aspects).
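The abstract's idea of estimating sparse-acceleration savings analytically from a stochastic density model, rather than by simulating every cycle, can be illustrated with a minimal sketch. It assumes the simplest possible model, in which every tensor element is nonzero independently with a fixed probability (its density). The function names and the uniform-independence assumption are illustrative only; they are not Sparseloop's API and may not match the density models the framework actually uses.

```python
def expected_effectual_macs(total_macs: int, density_a: float, density_b: float) -> float:
    """Expected number of multiply-accumulates whose two operands are both nonzero.

    Assumption (hypothetical model): each operand is nonzero independently
    with probability equal to its tensor's density.
    """
    return total_macs * density_a * density_b


def prob_tile_empty(tile_size: int, density: float) -> float:
    """Probability that a tile of `tile_size` elements holds only zeros,
    under the same independent-element assumption."""
    return (1.0 - density) ** tile_size


if __name__ == "__main__":
    dense_macs = 64 * 64 * 64  # dense MAC count for a 64x64x64 matrix multiply
    effectual = expected_effectual_macs(dense_macs, density_a=0.5, density_b=0.25)
    print(f"dense MACs: {dense_macs}, expected effectual MACs: {effectual:.0f}")
    print(f"P(4-element tile is all zeros at density 0.5): {prob_tile_empty(4, 0.5)}")
```

Closed-form estimates of this kind cost a few arithmetic operations per design point, which suggests why an analytical model can run far faster than cycle-level simulation, at the price of some error whenever real data deviates from the assumed distribution.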

updated: Thu May 12 2022 01:28:03 GMT+0000 (UTC)

published: Thu May 12 2022 01:28:03 GMT+0000 (UTC)

arXiv
