RCNet: Reverse Feature Pyramid and Cross-scale Shift Network for Object Detection

Zhuofan Zong; Qianggang Cao; Biao Leng

Feature pyramid networks (FPN) are widely used for multi-scale feature fusion in existing advanced object detection frameworks. Numerous previous works have developed various structures for bidirectional feature fusion, all of which are shown to improve detection performance effectively. We observe that these complicated network structures require feature pyramids to be stacked in a fixed order, which lengthens the pipeline and reduces inference speed. Moreover, semantics from non-adjacent levels are diluted in the feature pyramid, since the local fusion operation merges only features at adjacent pyramid levels, one level after another. To address these issues, we propose a novel architecture named RCNet, which consists of a Reverse Feature Pyramid (RevFP) and a Cross-scale Shift Network (CSN). RevFP utilizes local bidirectional feature fusion to simplify the bidirectional pyramid inference pipeline. CSN directly propagates representations to both adjacent and non-adjacent levels to make multi-scale features more correlated. Extensive experiments on the MS COCO dataset demonstrate that RCNet consistently brings significant improvements to both one-stage and two-stage detectors with only slight extra computational overhead. In particular, RetinaNet is boosted to 40.2 AP, which is 3.7 points higher than the baseline, by replacing FPN with our proposed model. On COCO test-dev, RCNet achieves very competitive performance with a single-model, single-scale 50.5 AP. Code will be made available.
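The dilution of non-adjacent semantics mentioned in the abstract can be illustrated with a toy calculation. The sketch below is not the RCNet or FPN implementation: it assumes each pyramid level is a single scalar and that the local fusion step is a plain average of a level with the already-fused level above it (real FPN adds an upsampled map to a laterally projected one). The function name `top_down_weights` is hypothetical. Under these assumptions, it tracks how much of each input level survives in each output level after sequential top-down fusion.

```python
def top_down_weights(num_levels):
    """Return weights[i][j]: share of input level j in fused output level i.

    Assumption for illustration only: fusing two levels means averaging them.
    Level num_levels - 1 is the coarsest (top) level of the pyramid.
    """
    weights = [[0.0] * num_levels for _ in range(num_levels)]
    top = num_levels - 1
    weights[top][top] = 1.0  # the top level has nothing above it to fuse with

    # Walk downward; each level only ever sees its immediate upper neighbour.
    for level in range(top - 1, -1, -1):
        for src in range(num_levels):
            lateral = 1.0 if src == level else 0.0
            weights[level][src] = 0.5 * lateral + 0.5 * weights[level + 1][src]
    return weights


weights = top_down_weights(5)
for level, row in enumerate(weights):
    print(f"output level {level}: {row}")
```

In this toy setting the share of the top level in the finest output shrinks with every intermediate fusion step, which is the effect that direct propagation to non-adjacent levels is meant to counter.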

updated: Sat Oct 23 2021 04:00:25 GMT+0000 (UTC)

published: Sat Oct 23 2021 04:00:25 GMT+0000 (UTC)

arXiv
