Rethinking Training from Scratch for Object Detection

Yang Li; Hong Zhang; Yu Zhang

ImageNet pre-training initialization is the de facto standard for object detection. He et al. found that a detector can be trained from scratch (random initialization), provided that a longer training schedule and a proper normalization technique are used. In this paper, we explore pre-training directly on the target dataset for object detection. In this setting, we find that the widely adopted large-resizing strategy, e.g. resizing images to (1333, 800), is important for fine-tuning but not necessary for pre-training. Specifically, we propose a new training pipeline for object detection that follows the "pre-training and fine-tuning" paradigm: low-resolution images from the target dataset are used to pre-train the detector, which is then loaded and fine-tuned on high-resolution images. With this strategy, we can use batch normalization (BN) with a large batch size during pre-training. It is also memory efficient, so it can be applied on machines with very limited GPU memory (11G). We call this direct detection pre-training, or direct pre-training for short. Experimental results show that direct pre-training accelerates the pre-training phase by more than 11x on the COCO dataset while achieving +1.8 mAP compared to ImageNet pre-training. In addition, we find that direct pre-training is also applicable to transformer-based backbones, e.g. Swin Transformer. Code will be made available.
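To illustrate why low-resolution pre-training leaves room for a larger batch, here is a rough sketch in Python. It is not the authors' code. It assumes that activation memory grows roughly linearly with the number of pixels per image. The low-resolution size (640, 640), the baseline of 2 images per GPU, and all names (`Phase`, `scaled_images_per_gpu`) are hypothetical and chosen only for illustration; only the (1333, 800) size comes from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    """One stage of a two-stage 'pre-train then fine-tune' schedule."""
    name: str
    image_size: tuple      # (width, height) in pixels
    images_per_gpu: int


def pixels(size):
    """Number of pixels in an image of the given (width, height)."""
    return size[0] * size[1]


def scaled_images_per_gpu(base, new_size):
    """Estimate images per GPU at new_size for the same memory budget.

    Assumption (not from the paper): activation memory is proportional
    to the pixel count, so the batch scales with the pixel ratio.
    """
    ratio = pixels(base.image_size) / pixels(new_size)
    return max(1, int(base.images_per_gpu * ratio))


# High-resolution fine-tuning at the size quoted in the abstract.
# images_per_gpu=2 is a hypothetical baseline, not a reported setting.
fine_tune = Phase("fine-tune", (1333, 800), 2)

# Hypothetical low-resolution size for the pre-training stage.
low_res = (640, 640)
pre_train = Phase("pre-train", low_res,
                  scaled_images_per_gpu(fine_tune, low_res))

for phase in (pre_train, fine_tune):
    print(phase.name, phase.image_size, phase.images_per_gpu)
```

Under these assumptions, the pre-training stage fits about 2.6 times as many images per GPU as the fine-tuning stage. A larger per-GPU batch is what makes batch normalization statistics more reliable during pre-training.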

updated: Sun Jun 06 2021 13:05:57 GMT+0000 (UTC)

published: Sun Jun 06 2021 13:05:57 GMT+0000 (UTC)

arXiv
