DAMO-YOLO : A Report on Real-Time Object Detection Design

Xianzhe Xu; Yiqi Jiang; Weihua Chen; Yilun Huang; Yuan Zhang; Xiuyu Sun

In this report, we present a fast and accurate object detection method dubbed DAMO-YOLO, which achieves higher performance than the state-of-the-art YOLO series. DAMO-YOLO extends YOLO with several new techniques, including Neural Architecture Search (NAS), an efficient Reparameterized Generalized-FPN (RepGFPN), a lightweight head with AlignedOTA label assignment, and distillation enhancement. In particular, we use MAE-NAS, a method guided by the principle of maximum entropy, to search for our detection backbone under the constraints of low latency and high performance, producing ResNet/CSP-like structures with spatial pyramid pooling and focus modules. In the design of necks and heads, we follow the rule of "large neck, small head". We import Generalized-FPN with accelerated queen-fusion to build the detector neck and upgrade its CSPNet with efficient layer aggregation networks (ELAN) and reparameterization. We then investigate how detector head size affects detection performance and find that a heavy neck with only one task projection layer yields better results. In addition, AlignedOTA is proposed to solve the misalignment problem in label assignment, and a distillation schema is introduced to improve performance further. Based on these techniques, we build a suite of models at various scales to meet the needs of different scenarios. For general industry requirements, we propose DAMO-YOLO-T/S/M/L, which achieve 43.6/47.7/50.2/51.9 mAP on COCO with latencies of 2.78/3.83/5.62/7.95 ms on T4 GPUs, respectively. For edge devices with limited computing power, we also propose the lightweight DAMO-YOLO-Ns/Nm/Nl models, which achieve 32.3/38.2/40.5 mAP on COCO with latencies of 4.08/5.05/6.69 ms on X86-CPU, respectively. Our proposed general and lightweight models outperform other YOLO series models in their respective application scenarios.
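To make the reported latencies easier to compare with frame-rate budgets, the sketch below converts them to an approximate images-per-second figure. The mAP and latency values are copied from the abstract. The helper names (`latency_to_fps`, `throughput_table`) are hypothetical, and the conversion assumes each latency is a per-image time at batch size 1 with images processed one after another, which the abstract does not state. The T/S/M/L figures are for T4 GPUs and the Ns/Nm/Nl figures are for X86-CPU, so the two groups are not directly comparable.

```python
# Figures copied from the abstract: model -> (COCO mAP, latency in ms).
# T/S/M/L were measured on T4 GPUs, Ns/Nm/Nl on X86-CPU.
REPORTED = {
    "DAMO-YOLO-T": (43.6, 2.78),
    "DAMO-YOLO-S": (47.7, 3.83),
    "DAMO-YOLO-M": (50.2, 5.62),
    "DAMO-YOLO-L": (51.9, 7.95),
    "DAMO-YOLO-Ns": (32.3, 4.08),
    "DAMO-YOLO-Nm": (38.2, 5.05),
    "DAMO-YOLO-Nl": (40.5, 6.69),
}


def latency_to_fps(latency_ms: float) -> float:
    """Images per second implied by a per-image latency.

    Assumption (not from the source): batch size 1, sequential inference.
    """
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


def throughput_table(reported: dict) -> dict:
    """Map each model name to its implied images per second, rounded to 0.1."""
    return {
        name: round(latency_to_fps(latency_ms), 1)
        for name, (_map, latency_ms) in reported.items()
    }


if __name__ == "__main__":
    for name, fps in throughput_table(REPORTED).items():
        coco_map, latency_ms = REPORTED[name]
        print(f"{name:13s} mAP={coco_map:4.1f} latency={latency_ms:4.2f} ms  ~{fps} img/s")
```

Under these assumptions, every listed model stays above 100 images per second on its respective hardware, since all reported latencies are below 10 ms.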

updated: Mon Apr 24 2023 03:32:15 GMT+0000 (UTC)

published: Wed Nov 23 2022 17:59:12 GMT+0000 (UTC)

arXiv
