Content-Augmented Feature Pyramid Network with Light Linear Spatial Transformers for Object Detection

Yongxiang Gu; Xiaolin Qin; Yuncong Peng; Lu Li

Feature Pyramid Network (FPN) is a prevalent component of current object detection models, where it is widely used to improve multi-scale detection performance. However, its feature interaction remains local and lossy, which limits its representation power. In this paper, to simulate the global view of human vision in object detection and to address the inherent defects of the interaction mode in FPN, we construct a novel architecture termed Content-Augmented Feature Pyramid Network (CA-FPN). Unlike the vanilla FPN, which fuses features within a local receptive field, CA-FPN can adaptively aggregate similar features from a global view. It is equipped with a global content extraction module and light linear spatial transformers. The former extracts multi-scale context information, and the latter deeply combine the global content extraction module with the vanilla FPN using a linearized attention function, which is designed to reduce model complexity. Furthermore, CA-FPN can be readily plugged into existing FPN-based models. Extensive experiments on the challenging COCO and PASCAL VOC object detection datasets demonstrate that CA-FPN significantly outperforms competitive FPN-based detectors without bells and whistles. When CA-FPN is plugged into the Cascade R-CNN framework built upon a standard ResNet-50 backbone, our method achieves 44.8 AP on COCO mini-val. This surpasses the previous state of the art by 1.5 AP, demonstrating its potential for practical application.
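The abstract does not spell out the linearized attention function, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a kernel feature map of the form elu(x) + 1, which is a common choice in the linear-attention literature and is not confirmed by the text above. All function names are hypothetical. The sketch shows why the cost drops: the keys and values are summarized once into a small matrix, so the full query-by-key similarity matrix is never built.

```python
import math


def feature_map(x):
    # Assumed kernel feature map elu(x) + 1; always positive.
    return x + 1.0 if x > 0 else math.exp(x)


def matmul(a, b):
    # Plain matrix product of two lists of rows.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(r) for r in zip(*m)]


def apply_map(m):
    return [[feature_map(x) for x in row] for row in m]


def linear_attention(q, k, v):
    """q: n x d queries, k: m x d keys, v: m x e values -> n x e output.

    Cost grows linearly in n and m because phi(K)^T V is computed once.
    """
    phi_q, phi_k = apply_map(q), apply_map(k)
    kv = matmul(transpose(phi_k), v)            # d x e summary of keys and values
    k_sum = [sum(col) for col in zip(*phi_k)]   # length-d normalizer term
    out = []
    for q_row, numerator in zip(phi_q, matmul(phi_q, kv)):
        z = sum(a * b for a, b in zip(q_row, k_sum))  # positive by construction
        out.append([x / z for x in numerator])
    return out


def quadratic_attention(q, k, v):
    """Reference version that builds the full n x m similarity matrix."""
    phi_q, phi_k = apply_map(q), apply_map(k)
    sim = matmul(phi_q, transpose(phi_k))       # n x m, the costly step
    weights = [[s / sum(row) for s in row] for row in sim]
    return matmul(weights, v)


if __name__ == "__main__":
    q = [[0.1, -0.2], [0.5, 0.3], [-1.0, 0.7]]
    k = [[0.2, 0.1], [-0.4, 0.9]]
    v = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
    for row in linear_attention(q, k, v):
        print([round(x, 4) for x in row])
```

Both functions compute the same result; they differ only in the order of the matrix products. In a detector, one plausible reading is that the queries come from pixels of an FPN level and the keys and values from the global content features, but the abstract does not state this mapping.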

updated: Sat Jul 17 2021 09:12:24 GMT+0000 (UTC)

published: Thu May 20 2021 02:31:31 GMT+0000 (UTC)

arXiv
