Restructurable Activation Networks

Kartikeya Bhardwaj; James Ward; Caleb Tung; Dibakar Gope; Lingchuan Meng; Igor Fedorov; Alex Chalfin; Paul Whatmough; Danny Loh

Is it possible to restructure the non-linear activation functions in a deep network to create hardware-efficient models? To address this question, we propose a new paradigm called Restructurable Activation Networks (RANs) that manipulates the amount of non-linearity in models to improve their hardware-awareness and efficiency. First, we propose RAN-explicit (RAN-e), a new hardware-aware search space and a semi-automatic search algorithm, to replace inefficient blocks with hardware-aware blocks. Next, we propose a training-free model scaling method called RAN-implicit (RAN-i), for which we theoretically prove the link between network topology and its expressivity in terms of the number of non-linear units. We demonstrate that our networks achieve state-of-the-art results on ImageNet at different scales and for several types of hardware. For example, compared to EfficientNet-Lite-B0, RAN-e achieves similar accuracy while improving Frames-Per-Second (FPS) by 1.5x on Arm micro-NPUs. RAN-i, in turn, demonstrates up to a 2x reduction in #MACs over ConvNexts with similar or better accuracy. We also show that RAN-i achieves nearly 40% higher FPS than ConvNext on Arm-based datacenter CPUs. Finally, RAN-i-based object detection networks achieve a similar or higher mAP and up to 33% higher FPS on datacenter CPUs compared to ConvNext-based models. The code to train and evaluate RANs, along with the pretrained networks, is available at https://github.com/ARM-software/ML-restructurable-activation-networks.

updated: Wed Sep 07 2022 19:42:25 GMT+0000 (UTC)

published: Wed Aug 17 2022 22:43:08 GMT+0000 (UTC)

arXiv

