DEYO: DETR with YOLO for Step-by-Step Object Detection

Haodong Ouyang


Object detection is an important topic in computer vision. Post-processing, an essential part of the typical detection pipeline, is a significant bottleneck that limits the performance of traditional object detection models. The detection transformer (DETR), the first end-to-end object detection model, removes the need for hand-designed components such as anchors and non-maximum suppression (NMS), which greatly simplifies the detection process. However, compared with most traditional object detection models, DETR converges very slowly, and the meaning of its queries is obscure. Inspired by the step-by-step concept, this paper proposes a new two-stage object detection model, DETR with YOLO (DEYO), which relies on progressive inference to address these problems. DEYO comprises a classic object detection model as its first stage and a DETR-like model as its second stage. Specifically, the first stage feeds high-quality queries and anchors into the second stage, improving the second stage's performance and efficiency compared to the original DETR model. Meanwhile, the second stage compensates for the performance degradation caused by the limitations of the first-stage detector. Extensive experiments on the COCO dataset show that DEYO attains 50.6 AP and 52.1 AP in 12 and 36 epochs, respectively, using ResNet-50 as the backbone with multi-scale features. Compared with DINO, a leading DETR-like model, DEYO improves performance by 1.6 AP and 1.2 AP in the 12- and 36-epoch settings, respectively.
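
The hand-off between the two stages can be illustrated with a minimal sketch. The abstract does not say how first-stage outputs are filtered, how many queries are used, or how unused query slots are filled. The score threshold, the query budget, the padding box, and all names below (`Detection`, `select_first_stage_priors`, `build_second_stage_inputs`) are therefore hypothetical choices for illustration, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed box convention for this sketch: normalized (cx, cy, w, h).
Box = Tuple[float, float, float, float]


@dataclass
class Detection:
    """One first-stage prediction (hypothetical container)."""
    box: Box
    score: float
    label: int


def select_first_stage_priors(
    detections: List[Detection], max_queries: int, min_score: float = 0.0
) -> List[Detection]:
    """Keep the highest-scoring first-stage detections as priors.

    Filtering by score and truncating to a budget is an assumption;
    the abstract only says the first stage supplies high-quality
    queries and anchors.
    """
    kept = [d for d in detections if d.score >= min_score]
    kept.sort(key=lambda d: d.score, reverse=True)
    return kept[:max_queries]


def build_second_stage_inputs(
    priors: List[Detection],
    num_queries: int,
    default_box: Box = (0.5, 0.5, 1.0, 1.0),
) -> Tuple[List[Box], List[bool]]:
    """Turn priors into a fixed-length list of initial anchors.

    Slots not covered by a first-stage prior are padded with a
    whole-image default box (an assumed fallback). The boolean mask
    marks which anchors came from the first stage.
    """
    anchors = [p.box for p in priors[:num_queries]]
    from_first_stage = [True] * len(anchors)
    padding = num_queries - len(anchors)
    anchors += [default_box] * padding
    from_first_stage += [False] * padding
    return anchors, from_first_stage


# Example: three first-stage detections, a budget of four queries.
first_stage_output = [
    Detection((0.2, 0.2, 0.1, 0.1), 0.90, 1),
    Detection((0.5, 0.5, 0.3, 0.3), 0.40, 2),
    Detection((0.7, 0.7, 0.2, 0.2), 0.05, 3),
]
priors = select_first_stage_priors(first_stage_output, max_queries=4, min_score=0.1)
anchors, from_first_stage = build_second_stage_inputs(priors, num_queries=4)
```

In a real DETR-like second stage, these anchors would initialize the decoder's reference boxes, and the decoder would then refine them. That refinement is how the second stage can correct first-stage errors, as the abstract describes.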

updated: Fri Nov 18 2022 22:07:57 GMT+0000 (UTC)

published: Sat Nov 12 2022 06:36:17 GMT+0000 (UTC)

arXiv

