CBNetV2: A Composite Backbone Network Architecture for Object Detection

Tingting Liang; Xiaojie Chu; Yudong Liu; Yongtao Wang; Zhi Tang; Wei Chu; Jingdong Chen; Haibin Ling

Modern top-performing object detectors depend heavily on backbone networks, and advances in backbone design, driven by the search for more effective network structures, bring consistent performance gains. In this paper, we propose a novel and flexible backbone framework, CBNetV2, to better train existing open-source pre-trained backbones under the pre-training and fine-tuning protocol. The CBNetV2 architecture groups multiple identical backbones and links them through composite connections. Specifically, it integrates the high- and low-level features of the multiple backbone networks and gradually expands the receptive field to perform object detection more efficiently. We also propose a better training strategy with assistant supervision for CBNet-based detectors. CBNetV2 generalizes well across different backbones and detector head designs. Without additional pre-training, CBNetV2 can be adapted to a variety of backbones, including manually designed and NAS-based ones as well as CNN-based and Transformer-based ones. Experiments provide strong evidence that composite backbones are more efficient, effective, and resource-friendly than wider and deeper networks. CBNetV2 is compatible with the head designs of most mainstream detectors, including one-stage and two-stage detectors as well as anchor-based and anchor-free ones, and improves their performance by more than 3.0% AP over the baseline on COCO. In particular, under the single-model, single-scale testing protocol, our Dual-Swin-L achieves 59.4% box AP and 51.6% mask AP on COCO test-dev, which is significantly better than the previous state-of-the-art result (57.7% box AP and 50.2% mask AP). Code is available at https://github.com/VDIGPKU/CBNetV2.
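The abstract describes the composite backbone only at a high level, so the following is a minimal toy sketch of the idea rather than the authors' implementation. It assumes that each later backbone fuses the same-level feature from the preceding (assisting) backbone into the input of its own stage by elementwise addition. The names `make_stage`, `run_backbone`, and `run_composite` are hypothetical, the "stages" are simple affine maps on lists of floats, and the sketch omits everything a real detector would need to reconcile channel counts and spatial sizes between features.

```python
from typing import Callable, List

Feature = List[float]
Stage = Callable[[Feature], Feature]


def make_stage(scale: float, shift: float) -> Stage:
    """Toy stand-in for a backbone stage: an elementwise affine map."""
    def stage(x: Feature) -> Feature:
        return [scale * v + shift for v in x]
    return stage


def run_backbone(stages: List[Stage], x: Feature) -> List[Feature]:
    """Run a single backbone and return the output of every stage."""
    outputs = []
    for stage in stages:
        x = stage(x)
        outputs.append(x)
    return outputs


def run_composite(stages: List[Stage], x: Feature, num_backbones: int = 2) -> List[Feature]:
    """Run `num_backbones` identical backbones chained by composite connections.

    Assumption (not taken from the abstract): the input of stage i in each
    later backbone is its own previous output plus the stage-i output of the
    preceding backbone.
    """
    prev_outputs = run_backbone(stages, x)  # first assisting backbone
    for _ in range(num_backbones - 1):
        current = x  # every backbone sees the same input
        outputs = []
        for stage, assist_feat in zip(stages, prev_outputs):
            # Hypothetical composite connection: elementwise addition.
            fused = [c + a for c, a in zip(current, assist_feat)]
            current = stage(fused)
            outputs.append(current)
        prev_outputs = outputs
    # Features of the last (lead) backbone, which would feed the detector head.
    return prev_outputs


# Two toy stages shared by all backbones ("identical backbones").
stages = [make_stage(2.0, 0.0), make_stage(1.0, 1.0)]
x = [1.0, 2.0]
single_features = run_backbone(stages, x)
composite_features = run_composite(stages, x, num_backbones=2)
```

With `num_backbones=1` the function reduces to a plain single backbone, which makes it easy to compare the two settings on the same input. The assistant supervision mentioned in the abstract is not modeled here.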

updated: Mon Jul 12 2021 09:12:05 GMT+0000 (UTC)

published: Thu Jul 01 2021 13:05:11 GMT+0000 (UTC)

arXiv
