Joint Anchor-Feature Refinement for Real-Time Accurate Object Detection in Images and Videos

Xingyu Chen; Junzhi Yu; Shihan Kong; Zhengxing Wu; Li Wen

Object detection has been vigorously investigated for years, but fast, accurate detection in real-world scenes remains a very challenging problem. To overcome the drawbacks of single-stage detectors, we aim to detect objects precisely in static and temporal scenes in real time. First, as a dual refinement mechanism, a novel anchor-offset detection is designed, which includes an anchor refinement, a feature location refinement, and a deformable detection head. This new detection mode can simultaneously perform two-step regression and capture accurate object features. Based on the anchor-offset detection, a dual refinement network (DRNet) is developed for high-performance static detection, where a multi-deformable head is further designed to leverage contextual information for describing objects. For temporal detection in videos, temporal refinement networks (TRNet) and temporal dual refinement networks (TDRNet) are developed by propagating the refinement information across time. We also propose a soft refinement strategy to temporally match object motion with the previous refinement. Our proposed methods are evaluated on the PASCAL VOC, COCO, and ImageNet VID datasets. Extensive comparisons on static and temporal detection verify the superiority of DRNet, TRNet, and TDRNet. Our approaches run at a fairly fast speed while achieving significantly enhanced detection accuracy, i.e., 84.4% mAP on VOC 2007, 83.6% mAP on VOC 2012, 69.4% mAP on VID 2017, and 42.4% AP on COCO. Finally, given these encouraging results, our methods are applied to online underwater object detection and grasping with an autonomous system. Code is publicly available at https://github.com/SeanChenxy/TDRN.
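
To make the idea of two-step regression concrete, the following is a minimal sketch, not the authors' implementation. It assumes the common center-size box encoding (offsets `dx, dy, dw, dh` applied to a `(cx, cy, w, h)` box) used by many anchor-based detectors; the abstract does not state which parameterization DRNet uses, and the function names `decode` and `two_step_regression` are hypothetical. The first set of offsets refines a pre-defined anchor, and the second set is applied relative to the refined anchor rather than the original one.

```python
import math


def decode(box, delta):
    """Apply (dx, dy, dw, dh) offsets to a (cx, cy, w, h) box.

    Assumption: center-size encoding, where center shifts are scaled by
    the box size and size changes are applied in log space.
    """
    cx, cy, w, h = box
    dx, dy, dw, dh = delta
    return (cx + dx * w, cy + dy * h, w * math.exp(dw), h * math.exp(dh))


def two_step_regression(anchor, first_delta, second_delta):
    """Refine an anchor, then regress the final box from the refined anchor."""
    refined_anchor = decode(anchor, first_delta)   # step 1: anchor refinement
    return decode(refined_anchor, second_delta)    # step 2: final regression


# Illustrative values only: shift the anchor right, then shift down and widen.
anchor = (0.5, 0.5, 0.2, 0.2)
final_box = two_step_regression(
    anchor,
    first_delta=(0.5, 0.0, 0.0, 0.0),
    second_delta=(0.0, 0.25, math.log(2.0), 0.0),
)
```

In this sketch the second step starts from a box that is already closer to the object, which is the general motivation for cascaded regression. The feature location refinement and deformable detection head described in the abstract are not modeled here.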

updated: Fri Mar 13 2020 15:41:01 GMT+0000 (UTC)

published: Mon Jul 23 2018 14:29:27 GMT+0000 (UTC)

arXiv

