Once-for-All: Train One Network and Specialize it for Efficient Deployment

Han Cai; Chuang Gan; Tianzhe Wang; Zhekai Zhang; Song Han

We address the challenging problem of efficient inference across many devices and resource constraints, especially on edge devices. Conventional approaches either manually design a specialized neural network or use neural architecture search (NAS) to find one, then train it from scratch for each case, which is computationally prohibitive (causing CO_2 emissions comparable to the lifetime emissions of 5 cars) and thus unscalable. In this work, we propose to train a once-for-all (OFA) network that supports diverse architectural settings by decoupling training and search, to reduce the cost. We can quickly get a specialized sub-network by selecting from the OFA network without additional training. To efficiently train OFA networks, we also propose a novel progressive shrinking algorithm, a generalized pruning method that reduces the model size across many more dimensions than pruning (depth, width, kernel size, and resolution). It can obtain a surprisingly large number of sub-networks (> 10^19) that can fit different hardware platforms and latency constraints while maintaining the same level of accuracy as training independently. On diverse edge devices, OFA consistently outperforms state-of-the-art (SOTA) NAS methods (up to 4.0% ImageNet top-1 accuracy improvement over MobileNetV3, or the same accuracy but 1.5x faster than MobileNetV3 and 2.6x faster than EfficientNet in terms of measured latency) while reducing GPU hours and CO_2 emissions by many orders of magnitude. In particular, OFA achieves a new SOTA 80.0% ImageNet top-1 accuracy under the mobile setting (<600M MACs). OFA is the winning solution for the 3rd Low Power Computer Vision Challenge (LPCVC) in the DSP classification track, and for the 4th LPCVC in both the classification track and the detection track. Code and 50 pre-trained models (for many devices and many latency constraints) are released at https://github.com/mit-han-lab/once-for-all.
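
To make the "> 10^19 sub-networks" figure concrete, the following is a minimal Python sketch of how such a count can arise and of what "selecting a sub-network" might look like as a configuration choice. The search-space values (5 units, depth in {2, 3, 4}, kernel size in {3, 5, 7}, width expansion ratio in {3, 4, 6}) are an assumption taken from the full paper, not from the abstract above, and the function names `count_subnets` and `sample_subnet` are hypothetical; this is not the API of the released repository.

```python
import random

# Assumed search space (from the full paper, not stated in the abstract).
NUM_UNITS = 5                # number of sequential units in the network
DEPTHS = (2, 3, 4)           # allowed number of layers per unit
KERNEL_SIZES = (3, 5, 7)     # allowed kernel size per layer
EXPAND_RATIOS = (3, 4, 6)    # allowed width expansion ratio per layer


def count_subnets():
    """Count distinct architectures in the assumed search space."""
    # Each layer independently picks a kernel size and an expansion ratio.
    per_layer = len(KERNEL_SIZES) * len(EXPAND_RATIOS)
    # A unit with depth d has per_layer ** d possible layer settings.
    per_unit = sum(per_layer ** d for d in DEPTHS)
    # Units are chosen independently of each other.
    return per_unit ** NUM_UNITS


def sample_subnet(rng):
    """Draw one random sub-network configuration (hypothetical format)."""
    config = []
    for _ in range(NUM_UNITS):
        depth = rng.choice(DEPTHS)
        layers = [
            {
                "kernel_size": rng.choice(KERNEL_SIZES),
                "expand_ratio": rng.choice(EXPAND_RATIOS),
            }
            for _ in range(depth)
        ]
        config.append(layers)
    return config


if __name__ == "__main__":
    print(f"{count_subnets():.3e} architectures")
    print(sample_subnet(random.Random(0)))
```

Under these assumptions the count is ((3 x 3)^2 + (3 x 3)^3 + (3 x 3)^4)^5 = 7371^5, which is roughly 2 x 10^19. Input resolution is a further dimension that the sketch leaves out.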

updated: Wed Apr 29 2020 20:49:05 GMT+0000 (UTC)

published: Mon Aug 26 2019 16:46:23 GMT+0000 (UTC)

arXiv
