FBNetV5: Neural Architecture Search for Multiple Tasks in One Run

Bichen Wu; Chaojian Li; Hang Zhang; Xiaoliang Dai; Peizhao Zhang; Matthew Yu; Jialiang Wang; Yingyan Lin; Peter Vajda

Neural Architecture Search (NAS) has been widely adopted to design accurate and efficient image classification models. However, applying NAS to a new computer vision task still requires a huge amount of effort. This is because 1) previous NAS research has focused overwhelmingly on image classification while largely ignoring other tasks; 2) many NAS works focus on optimizing task-specific components that do not transfer well to other tasks; and 3) existing NAS methods are typically designed to be "proxyless" and require significant effort to integrate with each new task's training pipeline. To tackle these challenges, we propose FBNetV5, a NAS framework that can search for neural architectures for a variety of vision tasks with much lower computational cost and human effort. Specifically, we design 1) a search space that is simple yet inclusive and transferable; 2) a multitask search process that is disentangled from the target tasks' training pipelines; and 3) an algorithm that searches for architectures for multiple tasks simultaneously, with a computational cost that is agnostic to the number of tasks. We evaluate FBNetV5 on three fundamental vision tasks: image classification, object detection, and semantic segmentation. Models found by FBNetV5 in a single search run outperform the previous state of the art on all three tasks: image classification (e.g., +1.3% ImageNet top-1 accuracy at the same FLOPs as FBNetV3), semantic segmentation (e.g., +1.8% ADE20K val mIoU over SegFormer with 3.6x fewer FLOPs), and object detection (e.g., +1.1% COCO val mAP over YOLOX with 1.2x fewer FLOPs).

updated: Tue Nov 30 2021 03:32:17 GMT+0000 (UTC)

published: Fri Nov 19 2021 02:07:34 GMT+0000 (UTC)

arXiv