Generalized Latency Performance Estimation for Once-For-All Neural Architecture Search

Muhtadyuzzaman Syed; Arvind Akpuram Srinivasan

Neural Architecture Search (NAS) has made automated machine learning possible by replacing the manual development of deep neural network architectures with three defined components: a search space, a search strategy, and a performance estimation strategy. To address the need for multi-platform deployment of Convolutional Neural Network (CNN) models, Once-For-All (OFA) proposed decoupling training from search, delivering a one-shot model whose sub-networks satisfy various accuracy-latency tradeoffs. We find that the performance estimation strategy in OFA's search generalizes poorly across hardware deployment platforms, because it relies on single-hardware latency lookup tables that require a significant amount of time and manual effort to build beforehand. In this work, we demonstrate a framework for building latency predictors for neural network architectures that addresses the need for heterogeneous hardware support and removes the overhead of lookup tables altogether. We introduce two generalizability strategies: fine-tuning, which starts from a base model trained on a specific hardware platform and NAS search space, and GPU-generalization, which trains a model on GPU hardware parameters such as number of cores, RAM size, and memory bandwidth. With this, we provide a family of latency prediction models that achieve over 50% lower RMSE loss than ProxylessNAS. We also show that using these latency predictors matches the NAS performance of the lookup-table baseline approach, and in certain cases exceeds it.
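The contrast the abstract draws can be illustrated with a small sketch. It is not the authors' implementation: the function names, operator names, feature layout, and all numbers below are hypothetical and chosen only for illustration. It shows a single-device lookup-table estimate (summing per-layer latencies), a feature vector that appends the GPU parameters named in the abstract to an architecture encoding (the kind of input a learned predictor might consume), and the RMSE metric used to compare predictors.

```python
import math


def lookup_table_latency(arch, table):
    """Estimate latency by summing per-layer entries measured on ONE device.

    `arch` is a list of operator names; `table` maps operator name -> latency (ms).
    A new table has to be measured for every new device.
    """
    return sum(table[op] for op in arch)


def encode_features(arch, gpu, vocab):
    """Build a predictor input: operator counts plus GPU hardware parameters.

    The key names in `gpu` are assumptions made for this sketch.
    """
    counts = [arch.count(op) for op in vocab]
    return counts + [gpu["cores"], gpu["ram_gb"], gpu["mem_bw_gbps"]]


def rmse(predicted, measured):
    """Root-mean-square error between predicted and measured latencies."""
    if len(predicted) != len(measured) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared) / len(squared))


# Toy values, purely illustrative (not measurements from the paper).
vocab = ["conv3x3", "conv5x5"]
arch = ["conv3x3", "conv5x5", "conv3x3"]
table = {"conv3x3": 1.5, "conv5x5": 2.0}
gpu = {"cores": 2560, "ram_gb": 8, "mem_bw_gbps": 320}

table_estimate = lookup_table_latency(arch, table)
features = encode_features(arch, gpu, vocab)
error = rmse([3.0, 5.0], [1.0, 5.0])
```

Because the hardware parameters are part of the input vector, one trained model could in principle be queried for several GPUs, whereas the lookup table is tied to the device it was measured on.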

updated: Mon Jan 04 2021 00:48:09 GMT+0000 (UTC)

published: Mon Jan 04 2021 00:48:09 GMT+0000 (UTC)

arXiv
