Boosting Long-tailed Object Detection via Step-wise Learning on Smooth-tail Data

Na Dong; Yongqiang Zhang; Mingli Ding; Gim Hee Lee

Real-world data tends to follow a long-tailed distribution, where class imbalance causes the head classes to dominate training. In this paper, we propose a frustratingly simple but effective step-wise learning framework that gradually enhances the model's ability to detect all categories of long-tailed datasets. Specifically, we build smooth-tail data, in which the long-tailed distribution of categories decays smoothly, to correct the bias towards head classes. We first pre-train a model on the whole long-tailed data to preserve discriminability between all categories. We then fine-tune the class-agnostic modules of the pre-trained model on the head-class-dominant replay data to obtain a head class expert model with improved decision boundaries across all categories. Finally, we train a unified model on the tail-class-dominant replay data while transferring knowledge from the head class expert model to ensure accurate detection of all categories. Extensive experiments on the long-tailed datasets LVIS v0.5 and LVIS v1.0 demonstrate the superior performance of our method: with a ResNet-50 backbone, we improve overall AP from 27.0% to 30.3%, and AP on the rare categories from 15.5% to 24.9%. Our best model, using a ResNet-101 backbone, achieves 30.7% AP, which surpasses all existing detectors using the same backbone.
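
The head-class-dominant and tail-class-dominant replay data mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how categories are split or how replay samples are chosen, so the frequency threshold, the per-class replay count, and the function names `split_head_tail` and `build_replay_subset` are all assumptions made for illustration. The idea shown is only that each subset keeps every instance of its dominant group plus a few replayed exemplars from the other group, so neither subset has as steep a tail as the full data.

```python
from collections import Counter
import random


def split_head_tail(labels, head_threshold):
    """Split categories into head and tail by instance count.

    head_threshold is a hypothetical cut-off, not a value from the paper.
    """
    counts = Counter(labels)
    head = {c for c, n in counts.items() if n >= head_threshold}
    tail = set(counts) - head
    return head, tail


def build_replay_subset(labels, dominant, replay_per_class, seed=0):
    """Keep every instance of the dominant classes and at most
    replay_per_class randomly chosen exemplars of each other class."""
    rng = random.Random(seed)
    by_class = {}
    for idx, c in enumerate(labels):
        by_class.setdefault(c, []).append(idx)

    chosen = []
    for c, idxs in by_class.items():
        if c in dominant:
            chosen.extend(idxs)  # dominant classes are kept in full
        else:
            k = min(replay_per_class, len(idxs))
            chosen.extend(rng.sample(idxs, k))  # small replay set
    return sorted(chosen)


# Toy long-tailed label list (one label per object instance).
labels = ["person"] * 100 + ["car"] * 60 + ["kite"] * 8 + ["unicycle"] * 2

head, tail = split_head_tail(labels, head_threshold=50)
head_dominant = build_replay_subset(labels, head, replay_per_class=2)
tail_dominant = build_replay_subset(labels, tail, replay_per_class=5)

head_counts = Counter(labels[i] for i in head_dominant)
tail_counts = Counter(labels[i] for i in tail_dominant)
print(dict(head_counts))
print(dict(tail_counts))
```

In this toy setting the tail-dominant subset holds 5 instances each of the two head classes next to 8 and 2 instances of the tail classes, so its class counts are far closer together than the 100-to-2 ratio of the full list. A real detector pipeline would sample images rather than individual instances, which this sketch does not model.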

updated: Mon May 22 2023 08:53:50 GMT+0000 (UTC)

published: Mon May 22 2023 08:53:50 GMT+0000 (UTC)

arXiv
