FeatAug-DETR: Enriching One-to-Many Matching for DETRs with Feature Augmentation

Rongyao Fang; Peng Gao; Aojun Zhou; Yingjie Cai; Si Liu; Jifeng Dai; Hongsheng Li

One-to-one matching is a crucial design in DETR-like object detection frameworks: it enables DETR to perform end-to-end detection. However, it also suffers from a lack of positive-sample supervision and from slow convergence. Several recent works have proposed one-to-many matching mechanisms to accelerate training and boost detection performance. We revisit these methods and model them in a unified format of augmenting the object queries. In this paper, we propose two methods that realize one-to-many matching from a different perspective: augmenting images or image features. The first method is One-to-many Matching via Data Augmentation (denoted as DataAug-DETR). It spatially transforms the images and includes multiple augmented versions of each image in the same training batch. This simple augmentation strategy already achieves one-to-many matching and, surprisingly, improves DETR's performance. The second method is One-to-many Matching via Feature Augmentation (denoted as FeatAug-DETR). Unlike DataAug-DETR, it augments the image features instead of the original images and includes multiple augmented features in the same batch to realize one-to-many matching. FeatAug-DETR significantly accelerates DETR training and boosts detection performance while keeping the inference speed unchanged. We conduct extensive experiments to evaluate the effectiveness of the proposed approaches on DETR variants, including DAB-DETR, Deformable-DETR, and H-Deformable-DETR. Without extra training data, FeatAug-DETR shortens the training convergence period of Deformable-DETR to 24 epochs and achieves 58.3 AP on the COCO val2017 set with Swin-L as the backbone.
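
The abstract's central idea can be illustrated in plain Python. If one-to-one matching is applied separately to an original view and to a spatially augmented view in the same batch, every ground-truth object ends up with two positive predictions. The sketch below is a toy illustration under our own assumptions, not the authors' implementation. Boxes are normalized (cx, cy, w, h). The matching cost is plain L1, whereas DETRs also use classification and GIoU terms. The assignment is brute-forced rather than solved with the Hungarian algorithm. A horizontal flip stands in for the feature-level augmentation. The decoder predictions in the demo are made-up numbers, and all function names are hypothetical.

```python
from itertools import permutations

# Hypothetical box format: (cx, cy, w, h), normalized to [0, 1].
Box = tuple[float, float, float, float]


def l1_cost(a: Box, b: Box) -> float:
    """L1 box distance; a simplified stand-in for a full DETR matching cost."""
    return sum(abs(x - y) for x, y in zip(a, b))


def one_to_one_match(preds: list[Box], gts: list[Box]) -> list[tuple[int, int]]:
    """Minimum-cost one-to-one assignment, brute-forced over permutations.

    Assumes len(preds) >= len(gts). Only practical for tiny inputs; returns
    (pred_idx, gt_idx) pairs, one per ground-truth box.
    """
    best_cost, best_perm = float("inf"), ()
    for perm in permutations(range(len(preds)), len(gts)):
        cost = sum(l1_cost(preds[p], gts[g]) for g, p in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return [(p, g) for g, p in enumerate(best_perm)]


def hflip_feature(feat: list[list[float]]) -> list[list[float]]:
    """Horizontally flip a single-channel H x W feature map (assumed augmentation)."""
    return [row[::-1] for row in feat]


def hflip_boxes(boxes: list[Box]) -> list[Box]:
    """Mirror normalized boxes so they stay aligned with the flipped features."""
    return [(1.0 - cx, cy, w, h) for cx, cy, w, h in boxes]


def positives_per_gt(matchings: list[list[tuple[int, int]]], num_gts: int) -> list[int]:
    """Count how many predictions are matched to each ground truth across branches."""
    counts = [0] * num_gts
    for matching in matchings:
        for _, g in matching:
            counts[g] += 1
    return counts


if __name__ == "__main__":
    gts = [(0.2, 0.5, 0.2, 0.4), (0.7, 0.3, 0.1, 0.2)]
    # Made-up decoder outputs for the original and the flipped-feature branch.
    preds_orig = [(0.68, 0.31, 0.1, 0.2), (0.5, 0.5, 0.3, 0.3), (0.21, 0.5, 0.2, 0.4)]
    preds_aug = [(0.8, 0.5, 0.2, 0.4), (0.31, 0.3, 0.1, 0.2), (0.5, 0.5, 0.3, 0.3)]

    print(hflip_feature([[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]))  # [[2.0, 1.0, 0.0], [5.0, 4.0, 3.0]]
    m_orig = one_to_one_match(preds_orig, gts)
    m_aug = one_to_one_match(preds_aug, hflip_boxes(gts))
    print(m_orig, m_aug)  # [(2, 0), (0, 1)] [(0, 0), (1, 1)]
    # Each branch is one-to-one, but together every GT receives two positives.
    print(positives_per_gt([m_orig, m_aug], len(gts)))  # [2, 2]
```

In this sketch, the augmented branch only transforms an already computed feature map, so the backbone would run once per image. The augmentation is also purely a training-time step, which is consistent with the abstract's statement that inference speed is unchanged.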

updated: Thu Mar 02 2023 18:59:48 GMT+0000 (UTC)

published: Thu Mar 02 2023 18:59:48 GMT+0000 (UTC)

arXiv