SuperYOLO: Super Resolution Assisted Object Detection in Multimodal Remote Sensing Imagery

Jiaqing Zhang; Jie Lei; Weiying Xie; Zhenman Fang; Yunsong Li; Qian Du

Accurate and timely detection of multiscale small objects that contain only tens of pixels in remote sensing images (RSI) remains challenging. Most existing solutions design complex deep neural networks to learn strong feature representations that separate objects from the background, which often results in a heavy computational burden. In this article, we propose an accurate yet fast object detection method for RSI, named SuperYOLO, which fuses multimodal data and performs high-resolution (HR) object detection on multiscale objects by using assisted super-resolution (SR) learning, taking both detection accuracy and computational cost into account. First, we use a symmetric compact multimodal fusion (MF) to extract supplementary information from various data to improve small object detection in RSI. Furthermore, we design a simple and flexible SR branch to learn HR feature representations that can discriminate small objects from vast backgrounds with low-resolution (LR) input, further improving detection accuracy. Moreover, to avoid introducing additional computation, the SR branch is discarded in the inference stage, and the computation of the network model is reduced because of the LR input. Experimental results show that, on the widely used VEDAI RS dataset, SuperYOLO achieves an accuracy of 75.09% (in terms of mAP50), which is more than 10% higher than state-of-the-art (SOTA) large models such as YOLOv5l, YOLOv5x, and the RS-designed YOLOrs. Meanwhile, the parameter size and GFLOPs of SuperYOLO are about 18 times and 3.8 times smaller, respectively, than those of YOLOv5x. Our proposed model shows a favorable accuracy-speed tradeoff compared with state-of-the-art models. The code will be open-sourced at https://github.com/icey-zhang/SuperYOLO.
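
The claim that LR input reduces computation can be illustrated with a rough operation count. The sketch below is not the SuperYOLO architecture; it counts multiply-accumulates for a single "same"-padded convolution, and the helper name `conv2d_macs` and all layer and image sizes are hypothetical values chosen only for illustration. It shows that halving the input height and width cuts the cost of a convolutional layer by a factor of four.

```python
def conv2d_macs(h, w, c_in, c_out, k, stride=1):
    """Multiply-accumulate count for one 'same'-padded k x k convolution.

    Hypothetical helper for illustration; bias terms are ignored.
    """
    out_h = -(-h // stride)  # ceiling division: output height
    out_w = -(-w // stride)  # ceiling division: output width
    return out_h * out_w * c_in * c_out * k * k


# Hypothetical layer and input sizes (assumptions, not taken from the paper).
hr = conv2d_macs(1024, 1024, c_in=64, c_out=128, k=3)  # HR input
lr = conv2d_macs(512, 512, c_in=64, c_out=128, k=3)    # LR input, half the side length
ratio = hr / lr
print(ratio)  # 4.0
```

Because every convolutional layer scales this way with spatial size, feeding LR images at inference lowers the total cost, while an auxiliary branch that is used only during training adds nothing to the inference cost once it is removed.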

updated: Sat Apr 08 2023 09:50:26 GMT+0000 (UTC)

published: Tue Sep 27 2022 12:58:58 GMT+0000 (UTC)

arXiv
