PV-NAS: Practical Neural Architecture Search for Video Recognition

Zihao Wang; Chen Lin; Lu Sheng; Junjie Yan; Jing Shao

Recently, deep learning has been applied to video recognition because of its strong representation ability. Deep neural networks for video tasks are highly customized, and designing them requires domain expertise and costly trial-and-error testing. Recent advances in network architecture search have boosted image recognition performance by a large margin. However, the automatic design of video recognition networks is less explored. In this study, we propose a practical solution, namely Practical Video Neural Architecture Search (PV-NAS). PV-NAS can efficiently search across a very large number of architectures in a novel spatial-temporal network search space using gradient-based search methods. To avoid getting stuck in sub-optimal solutions, we propose a novel learning rate scheduler that encourages sufficient network diversity among the searched models. Extensive empirical evaluations show that the proposed PV-NAS achieves state-of-the-art performance with far fewer computational resources. 1) Among light-weight models, our PV-NAS-L achieves 78.7% and 62.5% Top-1 accuracy on Kinetics-400 and Something-Something V2, outperforming the previous state-of-the-art method (i.e., TSM) by a large margin (4.6% and 3.4% on the two datasets, respectively), and 2) among medium-weight models, our PV-NAS-M achieves the best performance (also a new record) on the Something-Something V2 dataset.
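To make the phrase "gradient-based search" concrete, the following is a minimal sketch of the general idea behind differentiable architecture search: each candidate operation receives a softmax weight, and the architecture logits are updated by gradient descent. It is not the PV-NAS search space or its learning rate scheduler, neither of which is specified in this abstract. The names `CANDIDATE_OPS`, `search`, and `derive_architecture`, the scalar stand-in operations, and the toy data are all assumptions made for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical candidate operations: scalar stand-ins for what, in a real
# video network, might be different spatial or temporal layers.
CANDIDATE_OPS = [
    lambda x: 0.0 * x,  # "zero" operation
    lambda x: x,        # identity
    lambda x: 2.0 * x,  # stand-in for a more expressive operation
]

def search(data, steps=200, lr=0.5):
    """Gradient descent on architecture logits (alpha) for a squared-error loss."""
    alpha = [0.0] * len(CANDIDATE_OPS)
    for _ in range(steps):
        weights = softmax(alpha)
        grad = [0.0] * len(alpha)
        for x, y in data:
            outs = [op(x) for op in CANDIDATE_OPS]
            # Mixed output: softmax-weighted sum of all candidate outputs.
            pred = sum(w * o for w, o in zip(weights, outs))
            err = pred - y
            # d(pred)/d(alpha_k) = w_k * (out_k - pred) for a softmax mixture.
            for k in range(len(alpha)):
                grad[k] += 2.0 * err * weights[k] * (outs[k] - pred) / len(data)
        alpha = [a - lr * g for a, g in zip(alpha, grad)]
    return alpha

def derive_architecture(alpha):
    """Pick the operation with the largest architecture logit."""
    return max(range(len(alpha)), key=lambda k: alpha[k])

if __name__ == "__main__":
    toy_data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # targets follow y = 2x
    alpha = search(toy_data)
    print("selected op index:", derive_architecture(alpha))
```

In this toy setting the search gradually concentrates weight on the operation that best fits the data. A fixed step size is used here purely for simplicity; the abstract's point is that the choice of learning rate schedule affects how diverse the searched models are.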

updated: Tue Nov 03 2020 02:33:49 GMT+0000 (UTC)

published: Mon Nov 02 2020 08:50:23 GMT+0000 (UTC)

arXiv
