Deformable DETR: Deformable Transformers for End-to-End Object Detection

Xizhou Zhu; Weijie Su; Lewei Lu; Bin Li; Xiaogang Wang; Jifeng Dai

DETR was recently proposed to eliminate the need for many hand-designed components in object detection while still demonstrating good performance. However, it suffers from slow convergence and limited feature spatial resolution, owing to the limitations of Transformer attention modules in processing image feature maps. To mitigate these issues, we propose Deformable DETR, whose attention modules attend only to a small set of key sampling points around a reference point. Deformable DETR achieves better performance than DETR (especially on small objects) with 10× fewer training epochs. Extensive experiments on the COCO benchmark demonstrate the effectiveness of our approach. Code is released at https://github.com/fundamentalvision/Deformable-DETR.
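To make the idea of attending to "a small set of key sampling points around a reference" concrete, here is a minimal single-head, single-scale sketch in plain Python. It is an illustration under stated assumptions, not the released implementation: the function names (`bilinear_sample`, `softmax`, `deformable_attention`) are hypothetical, the feature map holds one scalar per location, and the sampling offsets and attention logits are passed in directly, whereas in the model they would be predicted from the query feature.

```python
import math


def bilinear_sample(fmap, x, y):
    """Bilinearly interpolate fmap[row][col] at fractional (x, y).

    Locations outside the map contribute zero (an assumption of this sketch).
    """
    h, w = len(fmap), len(fmap[0])
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0
    value = 0.0
    for yi, wy in ((y0, 1.0 - fy), (y0 + 1, fy)):
        for xi, wx in ((x0, 1.0 - fx), (x0 + 1, fx)):
            if 0 <= xi < w and 0 <= yi < h:
                value += wx * wy * fmap[yi][xi]
    return value


def softmax(logits):
    """Numerically stable softmax over a short list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def deformable_attention(fmap, ref, offsets, logits):
    """Weighted sum of K sampled values around a reference point.

    ref     : (x, y) reference point in feature-map coordinates
    offsets : K pairs (dx, dy); hypothetical inputs here, learned in the model
    logits  : K unnormalized attention scores, normalized with softmax
    """
    weights = softmax(logits)
    rx, ry = ref
    return sum(
        a * bilinear_sample(fmap, rx + dx, ry + dy)
        for a, (dx, dy) in zip(weights, offsets)
    )


# Toy 3x3 feature map; only K = 2 locations are read, not all 9.
fmap = [[0.0, 1.0, 2.0],
        [3.0, 4.0, 5.0],
        [6.0, 7.0, 8.0]]
out = deformable_attention(
    fmap, ref=(1.0, 1.0),
    offsets=[(-0.5, -0.5), (0.5, 0.5)],
    logits=[0.0, 0.0],
)
```

The cost per query depends on the number of sampling points K rather than on the size of the feature map, which is the property the abstract relies on when it contrasts this with standard Transformer attention over all pixels.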

updated: Thu Mar 18 2021 03:14:26 GMT+0000 (UTC)

published: Thu Oct 08 2020 17:59:21 GMT+0000 (UTC)

arXiv
