Revisiting ResNets: Improved Training and Scaling Strategies

Irwan Bello; William Fedus; Xianzhi Du; Ekin D. Cubuk; Aravind Srinivas; Tsung-Yi Lin; Jonathon Shlens; Barret Zoph

Novel computer vision architectures monopolize the spotlight, but the impact of the model architecture is often conflated with simultaneous changes to training methodology and scaling strategies. Our work revisits the canonical ResNet (He et al., 2015) and studies these three aspects in an effort to disentangle them. Perhaps surprisingly, we find that training and scaling strategies may matter more than architectural changes, and further, that the resulting ResNets match recent state-of-the-art models. We show that the best performing scaling strategy depends on the training regime and offer two new scaling strategies: (1) scale model depth in regimes where overfitting can occur (width scaling is preferable otherwise); (2) increase image resolution more slowly than previously recommended (Tan & Le, 2019). Using improved training and scaling strategies, we design a family of ResNet architectures, ResNet-RS, which are 1.7x - 2.7x faster than EfficientNets on TPUs, while achieving similar accuracies on ImageNet. In a large-scale semi-supervised learning setup, ResNet-RS achieves 86.2% top-1 ImageNet accuracy, while being 4.7x faster than EfficientNet NoisyStudent. The training techniques improve transfer performance on a suite of downstream tasks (rivaling state-of-the-art self-supervised algorithms) and extend to video classification on Kinetics-400. We recommend practitioners use these simple revised ResNets as baselines for future research.
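
As a rough illustration of the two scaling recommendations stated in the abstract, the following sketch encodes them as a small decision helper. The names `ScalingAdvice` and `suggest_scaling` are hypothetical and do not come from the paper or any released code. The abstract gives no numeric schedule, so none is assumed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingAdvice:
    """Hypothetical container for the abstract's two recommendations."""
    primary_dimension: str   # "depth" or "width"
    resolution_policy: str   # qualitative guidance only


def suggest_scaling(overfitting_likely: bool) -> ScalingAdvice:
    """Return the scaling guidance described in the abstract.

    Assumption: the caller decides whether the training regime is one
    where overfitting can occur; the abstract does not define a test.
    """
    # Rule (1): scale depth where overfitting can occur, width otherwise.
    primary = "depth" if overfitting_likely else "width"
    # Rule (2): grow resolution more slowly than previously recommended.
    policy = "increase image resolution more slowly than Tan & Le (2019)"
    return ScalingAdvice(primary_dimension=primary, resolution_policy=policy)


if __name__ == "__main__":
    print(suggest_scaling(overfitting_likely=True))
    print(suggest_scaling(overfitting_likely=False))
```

The helper returns only qualitative guidance; concrete depth, width, and resolution values would have to come from the paper itself.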

updated: Sat Mar 13 2021 00:18:19 GMT+0000 (UTC)

published: Sat Mar 13 2021 00:18:19 GMT+0000 (UTC)

arXiv
