End-to-End Object Detection with Fully Convolutional Network

Jianfeng Wang; Lin Song; Zeming Li; Hongbin Sun; Jian Sun; Nanning Zheng

Mainstream object detectors based on fully convolutional networks have achieved impressive performance. However, most of them still require hand-designed non-maximum suppression (NMS) post-processing, which impedes fully end-to-end training. In this paper, we analyze what happens when NMS is discarded, and the results reveal that proper label assignment plays a crucial role. To this end, for fully convolutional detectors, we introduce a Prediction-aware One-To-One (POTO) label assignment for classification that enables end-to-end detection and achieves performance comparable to that obtained with NMS. In addition, we propose a simple 3D Max Filtering (3DMF) to exploit multi-scale features and improve the discriminability of convolutions within local regions. With these techniques, our end-to-end framework achieves competitive performance against many state-of-the-art detectors that use NMS on the COCO and CrowdHuman datasets. The code is available at https://github.com/Megvii-BaseDetection/DeFCN.
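The abstract names POTO but does not spell out how it assigns labels. What follows is a minimal, non-authoritative sketch built on assumptions that do not appear in this abstract. Each ground-truth box i is scored against each candidate location j with a quality Q = 1[j in Ω_i] · p_j(c_i)^(1−α) · IoU(b_i, b̂_j)^α. Here the spatial prior Ω_i is simplified to "the location's anchor point lies inside the box", and α = 0.8 is an assumed default. The one-to-one assignment is then the matching that maximizes total quality. For readability the code checks every injective assignment with `itertools.permutations`, which only works for tiny inputs. A real detector would need a polynomial-time solver such as the Hungarian algorithm (for example SciPy's `linear_sum_assignment`). The helper names (`quality_matrix`, `poto_assign`, `classification_targets`) are hypothetical and are not taken from the DeFCN code.

```python
from itertools import permutations
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def inside(point: Tuple[float, float], box: Box) -> bool:
    """Simplified spatial prior (assumption): anchor point lies inside the GT box."""
    x, y = point
    return box[0] <= x <= box[2] and box[1] <= y <= box[3]


def quality_matrix(gt_boxes, gt_labels, anchors, pred_boxes, pred_scores, alpha=0.8):
    """q[i][j] = prior(i, j) * p_j(c_i)^(1 - alpha) * IoU(b_i, b_hat_j)^alpha."""
    q = []
    for box, label in zip(gt_boxes, gt_labels):
        row = []
        for point, pbox, scores in zip(anchors, pred_boxes, pred_scores):
            if inside(point, box):
                # Blend the predicted class score with localization quality.
                row.append(scores[label] ** (1 - alpha) * iou(box, pbox) ** alpha)
            else:
                row.append(0.0)  # outside the spatial prior: zero quality
        q.append(row)
    return q


def poto_assign(q: List[List[float]]) -> List[int]:
    """For each GT i, return the location j that maximizes the total quality
    under a one-to-one constraint. Exhaustive search: tiny inputs only."""
    n = len(q)
    m = len(q[0]) if n else 0
    assert n <= m, "need at least as many locations as objects"
    best_total, best_perm = -1.0, ()
    for perm in permutations(range(m), n):  # every injective GT -> location map
        total = sum(q[i][j] for i, j in enumerate(perm))
        if total > best_total:
            best_total, best_perm = total, perm
    return list(best_perm)


def classification_targets(assignment, gt_labels, num_locations, background=-1):
    """Assigned locations get their GT class; every other location is background."""
    targets = [background] * num_locations
    for gt_idx, loc in enumerate(assignment):
        targets[loc] = gt_labels[gt_idx]
    return targets


# Toy example (made-up numbers): two objects, four candidate locations.
gt_boxes = [(0, 0, 10, 10), (20, 0, 30, 10)]
gt_labels = [0, 1]
anchors = [(5, 5), (8, 8), (25, 5), (15, 5)]
pred_boxes = [(0, 0, 10, 10), (4, 4, 12, 12), (20, 0, 30, 10), (10, 0, 20, 10)]
pred_scores = [[0.3, 0.1], [0.9, 0.1], [0.1, 0.8], [0.5, 0.5]]

q = quality_matrix(gt_boxes, gt_labels, anchors, pred_boxes, pred_scores)
assignment = poto_assign(q)
targets = classification_targets(assignment, gt_labels, len(anchors))
print(assignment, targets)  # [0, 2] [0, -1, 1, -1]
```

For the first object, location 1 has the highest class score (0.9 against 0.3), but its box overlaps the object poorly (IoU 0.28125). The IoU-weighted quality therefore selects location 0. Because exactly one location per object is labeled positive and every other location becomes background, each object should produce a single high-scoring prediction. That is the intended route to dropping NMS.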

updated: Thu Mar 25 2021 04:18:14 GMT+0000 (UTC)

published: Mon Dec 07 2020 09:14:55 GMT+0000 (UTC)

arXiv