MIME: Adapting a Single Neural Network for Multi-task Inference with Memory-efficient Dynamic Pruning

Abhiroop Bhattacharjee; Yeshwanth Venkatesha; Abhishek Moitra; Priyadarshini Panda

Recent years have seen a paradigm shift towards multi-task learning. This calls for memory- and energy-efficient solutions for inference in multi-task scenarios. We propose an algorithm-hardware co-design approach called MIME. MIME reuses the weight parameters of a trained parent task and learns task-specific threshold parameters for inference on multiple child tasks. We find that MIME results in highly memory-efficient DRAM storage of neural-network parameters for multiple tasks compared to conventional multi-task inference. In addition, MIME results in input-dependent dynamic neuronal pruning, thereby enabling energy-efficient inference with higher throughput on systolic-array hardware. Our experiments with benchmark datasets (child tasks), namely CIFAR10, CIFAR100, and Fashion-MNIST, show that MIME achieves ~3.48x memory efficiency and ~2.4-3.1x energy savings compared to conventional multi-task inference in Pipelined task mode.
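The mechanism described above (frozen parent weights shared across tasks, with only per-task thresholds learned) can be illustrated with the following minimal Python sketch. It assumes one threshold per neuron and a strict "pre-activation exceeds threshold" firing rule. The helper names, layer shape, threshold values, and parameter sizes are all hypothetical and are not taken from the paper; this is not the authors' implementation.

```python
from typing import List, Sequence, Tuple

# Hypothetical helpers for illustration only; not the MIME authors' code.


def dense(x: Sequence[float], weights: Sequence[Sequence[float]]) -> List[float]:
    """Pre-activations of a bias-free fully connected layer (one weight row per neuron)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def threshold_activation(pre_acts: Sequence[float], thresholds: Sequence[float]) -> List[float]:
    """Assumed rule: a neuron passes its value only if it exceeds its task-specific
    threshold; otherwise its output is set to 0 (the neuron is pruned for this input)."""
    return [a if a > t else 0.0 for a, t in zip(pre_acts, thresholds)]


def pruned_count(acts: Sequence[float]) -> int:
    """Number of zeroed outputs for this input, i.e. downstream work that could be skipped."""
    return sum(1 for a in acts if a == 0.0)


def stored_params(num_weights: int, num_neurons: int, num_tasks: int) -> Tuple[int, int]:
    """Parameter counts: a full weight copy per task vs. shared weights plus per-task thresholds."""
    conventional = num_tasks * num_weights
    shared = num_weights + num_tasks * num_neurons
    return conventional, shared


# Frozen "parent" weights reused by every child task (3 neurons, 2 inputs).
PARENT_W = [[1.0, -1.0], [0.5, 0.5], [2.0, 0.0]]

# Hypothetical learned thresholds for two child tasks; only these differ per task.
THRESHOLDS = {"task_a": [0.0, 0.0, 0.0], "task_b": [0.6, 0.8, 1.0]}

if __name__ == "__main__":
    # The same weights yield different, input-dependent pruning patterns per task.
    for x in ([1.0, 0.5], [0.25, 1.0]):
        pre = dense(x, PARENT_W)
        for task, t in THRESHOLDS.items():
            acts = threshold_activation(pre, t)
            print(x, task, acts, "pruned:", pruned_count(acts))

    # Storage comparison with made-up sizes: 1M weights, 10k neurons, 3 child tasks.
    conv, shared = stored_params(1_000_000, 10_000, 3)
    print(conv, shared, round(conv / shared, 2))
```

With these made-up sizes the shared layout stores roughly 2.91x fewer parameters. That ratio is purely illustrative and is unrelated to the ~3.48x figure reported above, which comes from the authors' own experiments.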

updated: Mon Apr 11 2022 17:25:54 GMT+0000 (UTC)

published: Mon Apr 11 2022 17:25:54 GMT+0000 (UTC)

arXiv