Multi-patch Feature Pyramid Network for Weakly Supervised Object Detection in Optical Remote Sensing Images

Pourya Shamsolmoali; Jocelyn Chanussot; Masoumeh Zareapoor; Huiyu Zhou; Jie Yang

Object detection is a challenging task in remote sensing because objects occupy only a few pixels in the images, and models must learn object localization and detection simultaneously. Although established approaches perform well on objects of regular size, they perform poorly on small objects or get stuck in local minima (e.g., false object parts). Two issues stand in their way. First, existing methods struggle to detect small objects stably because of complicated backgrounds. Second, most standard methods use hand-crafted features and do not work well on objects with missing parts. We address these issues by proposing a new architecture, the multi-patch feature pyramid network (MPFP-Net). Unlike current models, which pursue only the most discriminative patches during training, MPFP-Net divides the patches into class-affiliated subsets in which the patches are related; based on the primary loss function, a sequence of smooth loss functions is determined for the subsets to improve the model's ability to collect small object parts. To enhance the feature representation for patch selection, we introduce an effective method that regularizes the residual values and makes the fusion transition layers strictly norm-preserving. The network contains bottom-up and crosswise connections that fuse features of different scales, achieving better accuracy than several state-of-the-art object detection models. The developed architecture is also more efficient than the baselines.
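
The abstract does not state the form of the smooth loss functions, so the following is only a rough illustration of the general idea of scoring a whole class-affiliated subset of patches rather than only its single most discriminative patch. It assumes a log-sum-exp smooth maximum as the aggregation, which is a swapped-in standard technique and not necessarily what MPFP-Net uses. The function names, the `temperature` parameter, and the toy scores are all hypothetical.

```python
import math


def group_by_class(patches):
    """Split (class_label, score) pairs into class-affiliated subsets."""
    subsets = {}
    for label, score in patches:
        subsets.setdefault(label, []).append(score)
    return subsets


def hard_max_score(scores):
    """Score a subset by its single most discriminative patch."""
    return max(scores)


def smooth_max_score(scores, temperature=1.0):
    """Log-sum-exp smooth maximum over a subset (an assumed stand-in).

    Computes temperature * log(mean(exp(s / temperature))), shifted by the
    maximum for numerical stability. Every patch contributes to the result;
    a lower temperature moves the value closer to the hard maximum.
    """
    m = max(scores)
    total = sum(math.exp((s - m) / temperature) for s in scores)
    return m + temperature * math.log(total / len(scores))


# Toy patch scores (made up for illustration only).
patches = [
    ("plane", 2.0), ("plane", 1.5), ("plane", -0.5),
    ("ship", 0.3), ("ship", 0.9),
]

subsets = group_by_class(patches)
for label, scores in subsets.items():
    # A decreasing temperature gives a sequence of progressively sharper
    # smooth scores (an assumption made for this sketch).
    sequence = [smooth_max_score(scores, t) for t in (2.0, 1.0, 0.5)]
    print(label, hard_max_score(scores), [round(v, 3) for v in sequence])
```

In this sketch the smooth score always lies between the mean and the maximum of the subset, so less discriminative patches still influence the result instead of being ignored.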

updated: Wed Aug 18 2021 09:25:39 GMT+0000 (UTC)

published: Wed Aug 18 2021 09:25:39 GMT+0000 (UTC)

arXiv
