DETRs Beat YOLOs on Real-time Object Detection

Wenyu Lv; Yian Zhao; Shangliang Xu; Jinman Wei; Guanzhong Wang; Cheng Cui; Yuning Du; Qingqing Dang; Yi Liu

Recently, end-to-end transformer-based detectors (DETRs) have achieved remarkable performance. However, the high computational cost of DETRs has not been effectively addressed, which limits their practical application and prevents them from fully exploiting the benefit of requiring no post-processing such as non-maximum suppression (NMS). In this paper, we first analyze the influence of NMS on the inference speed of modern real-time object detectors and establish an end-to-end speed benchmark. To avoid the inference delay caused by NMS, we propose the Real-Time DEtection TRansformer (RT-DETR), which is, to the best of our knowledge, the first real-time end-to-end object detector. Specifically, we design an efficient hybrid encoder that processes multi-scale features by decoupling intra-scale interaction from cross-scale fusion, and we propose IoU-aware query selection to improve the initialization of object queries. In addition, our detector supports flexible adjustment of the inference speed by using different numbers of decoder layers without retraining, which facilitates the practical application of real-time object detectors. Our RT-DETR-L achieves 53.0% AP on COCO val2017 and 114 FPS on a T4 GPU, while RT-DETR-X achieves 54.8% AP and 74 FPS, outperforming all YOLO detectors of the same scale in both speed and accuracy. Furthermore, our RT-DETR-R50 achieves 53.1% AP and 108 FPS, outperforming DINO-Deformable-DETR-R50 by 2.2% AP in accuracy and by about 21 times in FPS. Source code and pre-trained models are available at https://github.com/lyuwenyu/RT-DETR.
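For readers unfamiliar with the post-processing step that the abstract says RT-DETR avoids, the following is a minimal sketch of classic greedy NMS in plain Python. It is an illustration only, not code from the paper or its repository: the function names `iou` and `nms`, the `(x1, y1, x2, y2)` box layout, and the default thresholds are all assumptions chosen for this example.

```python
from typing import List, Sequence, Tuple

# Assumed box layout for this sketch: (x1, y1, x2, y2) with x1 < x2, y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(
    boxes: Sequence[Box],
    scores: Sequence[float],
    iou_threshold: float = 0.5,    # hypothetical default
    score_threshold: float = 0.0,  # hypothetical default
) -> List[int]:
    """Greedy NMS: return indices of kept boxes, highest score first."""
    # Drop low-confidence boxes, then visit the rest in descending score order.
    order = sorted(
        (i for i, s in enumerate(scores) if s >= score_threshold),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(nms(boxes, scores, iou_threshold=0.5))
```

In this sketch, the number of boxes that survive the score threshold and the choice of IoU threshold both change how much work the loop does, so the cost of this step depends on hyperparameters and on the image content. An end-to-end detector that emits a fixed set of predictions has no such step.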

updated: Thu Jul 06 2023 09:42:54 GMT+0000 (UTC)

published: Mon Apr 17 2023 08:30:02 GMT+0000 (UTC)

arXiv