Faster Attention Is What You Need: A Fast Self-Attention Neural Network Backbone Architecture for the Edge via Double-Condensing Attention Condensers

Alexander Wong; Mohammad Javad Shafiee; Saad Abbasi; Saeejith Nair; Mahmoud Famouri

With the growing adoption of deep learning for on-device TinyML applications, there has been an ever-increasing demand for more efficient neural network backbones optimized for the edge. Recently, the introduction of attention condenser networks has resulted in low-footprint, highly efficient self-attention neural networks that strike a strong balance between accuracy and speed. In this study, we introduce a new, faster attention condenser design called the double-condensing attention condenser, which enables more condensed feature embeddings. We further employ a machine-driven design exploration strategy that imposes best-practice design constraints for greater efficiency and robustness to produce the macro-micro architecture constructs of the backbone. The resulting backbone (which we name AttendNeXt) achieves significantly higher inference throughput on an embedded ARM processor when compared to several other state-of-the-art efficient backbones (>10X faster than FB-Net C at higher accuracy and speed, and >10X faster than MobileOne-S1 at smaller size) while having a small model size (>1.37X smaller than MobileNetv3-L at higher accuracy and speed) and strong accuracy (1.1% higher top-1 accuracy than MobileViT XS on ImageNet at higher speed). These promising results demonstrate that exploring different efficient architecture designs and self-attention mechanisms can lead to interesting new building blocks for TinyML applications.

updated: Mon Aug 22 2022 03:13:01 GMT+0000 (UTC)

published: Mon Aug 15 2022 02:47:33 GMT+0000 (UTC)

arXiv
