A Survey of Techniques for Optimizing Transformer Inference

Krishna Teja Chitty-Venkata; Sparsh Mittal; Murali Emani; Venkatram Vishwanath; Arun K. Somani

Recent years have seen a phenomenal rise in the performance and applications of transformer neural networks. The family of transformer networks, including Bidirectional Encoder Representations from Transformers (BERT), Generative Pretrained Transformer (GPT) and Vision Transformer (ViT), has shown its effectiveness across the Natural Language Processing (NLP) and Computer Vision (CV) domains. Transformer-based networks such as ChatGPT have impacted the lives of ordinary people. However, the quest for high predictive performance has led to an exponential increase in the memory and compute footprint of transformers. Researchers have proposed techniques to optimize transformer inference at all levels of abstraction. This paper presents a comprehensive survey of techniques for optimizing the inference phase of transformer networks. At the algorithmic level, we survey techniques such as knowledge distillation, pruning, quantization, neural architecture search and lightweight network design. We further review hardware-level optimization techniques and the design of novel hardware accelerators for transformers. We summarize quantitative results on the number of parameters/FLOPs and the accuracy of several models/techniques to show the tradeoffs they make. We also outline future directions in this rapidly evolving field of research. We believe that this survey will educate both novice and seasoned researchers and spark a plethora of research efforts in this field.

updated: Sun Jul 16 2023 08:50:50 UTC

published: Sun Jul 16 2023 08:50:50 UTC

arXiv
