Object Detection with Transformers: A Review

Tahira Shehzadi; Khurram Azeem Hashmi; Didier Stricker; Muhammad Zeshan Afzal

The astounding performance of Transformers in natural language processing (NLP) has motivated researchers to explore their use in computer vision tasks. As with other computer vision tasks, DEtection TRansformer (DETR) introduces transformers to object detection by treating detection as a set prediction problem, without needing proposal generation or post-processing steps. It is a state-of-the-art (SOTA) method for object detection, particularly in scenarios where the number of objects in an image is relatively small. Despite its success, DETR suffers from slow training convergence and reduced performance on small objects. Many improvements have therefore been proposed to address these issues, leading to substantial refinement of DETR. Since 2020, transformer-based object detection has attracted increasing interest and demonstrated impressive performance. Although numerous surveys cover transformers in vision in general, a review of advancements in 2D object detection using transformers is still missing. This paper gives a detailed review of twenty-one papers on recent developments in DETR. We begin with the basic modules of Transformers, such as self-attention, object queries, and input feature encoding. Then, we cover the latest advancements in DETR, including backbone modification, query design, and attention refinement. We also compare all detection transformers in terms of performance and network design. We hope this study will increase researchers' interest in solving the existing challenges of applying transformers to object detection. Researchers can follow newer improvements in detection transformers at: https://github.com/mindgarage-shan/trans_object_detection_survey

updated: Tue Jun 27 2023 11:07:31 GMT+0000 (UTC)

published: Wed Jun 07 2023 16:13:38 GMT+0000 (UTC)

arXiv
