DPNET: Dual-Path Network for Efficient Object Detection with Lightweight Self-Attention

Huimin Shi; Quan Zhou; Yinghao Ni; Xiaofu Wu; Longin Jan Latecki

Object detection often requires considerable computation to achieve satisfactory performance, which makes it ill-suited to deployment on edge devices. To address the trade-off between computational cost and detection accuracy, this paper presents a dual-path network, named DPNet, for efficient object detection with lightweight self-attention. In the backbone, a single-input/single-output lightweight self-attention module (LSAM) is designed to encode global interactions between different positions. LSAM is also extended to a multiple-input version in the feature pyramid network (FPN), where it is employed to capture cross-resolution dependencies in two paths. Extensive experiments on the COCO dataset demonstrate that our method achieves state-of-the-art detection results. More specifically, DPNet obtains 29.0% AP on COCO test-dev, with only 1.14 GFLOPs and a model size of 2.27M for a 320x320 image.
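
To make "global interactions between different positions" concrete, the following is a minimal sketch of generic scaled dot-product self-attention over the positions of a small feature map. It is not the LSAM described in the paper, whose internal design is not given in the abstract. The function names (`flatten_map`, `self_attention`) are hypothetical, and the sketch assumes identity query/key/value projections and a single head purely for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def flatten_map(feature_map):
    """Turn an H x W x C nested list into a list of H*W position vectors."""
    return [pixel for row in feature_map for pixel in row]


def self_attention(tokens):
    """Generic single-head self-attention (assumption: Q = K = V = tokens).

    Every output position is a softmax-weighted average of all input
    positions, so each position can draw on the whole feature map.
    """
    channels = len(tokens[0])
    scale = 1.0 / math.sqrt(channels)
    outputs = []
    for query in tokens:
        # Similarity of this position to every position (including itself).
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in tokens]
        weights = softmax(scores)
        # Weighted sum of all position vectors, channel by channel.
        outputs.append(
            [sum(w * value[c] for w, value in zip(weights, tokens))
             for c in range(channels)]
        )
    return outputs


# Hypothetical 2 x 2 feature map with 2 channels.
feature_map = [[[1.0, 0.0], [0.0, 1.0]],
               [[1.0, 1.0], [0.0, 0.0]]]
attended = self_attention(flatten_map(feature_map))
```

The pairwise score computation grows quadratically with the number of positions, which is the kind of cost a lightweight attention design would aim to reduce. How DPNet does so is not specified in the abstract.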

updated: Sun Oct 31 2021 13:38:16 GMT+0000 (UTC)

published: Sun Oct 31 2021 13:38:16 GMT+0000 (UTC)

arXiv

