Cream of the Crop: Distilling Prioritized Paths For One-Shot Neural Architecture Search

Houwen Peng; Hao Du; Hongyuan Yu; Qi Li; Jing Liao; Jianlong Fu

One-shot weight-sharing methods have recently drawn great attention in neural architecture search due to their high efficiency and competitive performance. However, weight sharing across models has an inherent deficiency: subnetworks in the hypernetwork are insufficiently trained. To alleviate this problem, we present a simple yet effective architecture distillation method. The central idea is that subnetworks can learn collaboratively and teach each other throughout the training process, which boosts the convergence of individual models. We introduce the concept of prioritized paths, which are the architecture candidates that exhibit superior performance during training. Distilling knowledge from the prioritized paths boosts the training of subnetworks. Since the prioritized paths are updated on the fly according to their performance and complexity, the paths finally obtained are the cream of the crop. We directly select the most promising of the prioritized paths as the final architecture, without using other complex search methods such as reinforcement learning or evolutionary algorithms. Experiments on ImageNet verify that this path distillation method improves the convergence ratio and performance of the hypernetwork and boosts the training of subnetworks. The discovered architectures achieve superior performance compared to the recent MobileNetV3 and EfficientNet families under aligned settings. Moreover, experiments on object detection and on a more challenging search space show the generality and robustness of the proposed method. Code and models are available at https://github.com/microsoft/cream.git.
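As a rough illustration of the "prioritized paths" idea in the abstract, the sketch below keeps a small board of the best-scoring paths seen so far, updates it on the fly, and reads the final architecture directly off the board. It is not the authors' implementation: the names `Candidate` and `PrioritizedBoard`, the use of a FLOPs budget as the complexity measure, and the fixed board capacity are all assumptions made for this example. Teacher selection and the distillation loss itself are omitted.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical encoding: a path is one operator index per layer.
Path = Tuple[int, ...]


@dataclass(frozen=True)
class Candidate:
    path: Path
    score: float  # assumed: validation accuracy of the subnetwork (higher is better)
    flops: float  # assumed: complexity proxy, in MFLOPs


@dataclass
class PrioritizedBoard:
    capacity: int     # assumed: maximum number of prioritized paths kept
    max_flops: float  # assumed: complexity budget a path must satisfy
    entries: List[Candidate] = field(default_factory=list)

    def update(self, cand: Candidate) -> bool:
        """Try to place a candidate on the board; return True if it stays."""
        if cand.flops > self.max_flops:
            return False  # violates the complexity constraint
        # Drop any stale entry for the same path, then re-rank by score.
        kept = [c for c in self.entries if c.path != cand.path]
        kept.append(cand)
        kept.sort(key=lambda c: c.score, reverse=True)
        self.entries = kept[: self.capacity]
        return cand in self.entries

    def best(self) -> Candidate:
        """Most promising path, chosen without any further search."""
        return self.entries[0]


if __name__ == "__main__":
    board = PrioritizedBoard(capacity=2, max_flops=600.0)
    for cand in [
        Candidate((0, 1, 2), 0.70, 500.0),
        Candidate((1, 1, 2), 0.75, 650.0),  # rejected: over the budget
        Candidate((2, 0, 1), 0.72, 550.0),
        Candidate((1, 2, 2), 0.74, 580.0),
    ]:
        board.update(cand)
    print(board.best().path)
```

In a full training loop under these assumptions, each sampled subnetwork would be scored, offered to the board via `update`, and trained with an extra distillation term from one of the paths currently on the board.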

updated: Mon Apr 12 2021 06:30:36 GMT+0000 (UTC)

published: Thu Oct 29 2020 17:55:05 GMT+0000 (UTC)

arXiv
