SAP-DETR: Bridging the Gap Between Salient Points and Queries-Based Transformer Detector for Fast Model Convergency

Yang Liu; Yao Zhang; Yixin Wang; Yang Zhang; Jiang Tian; Zhongchao Shi; Jianping Fan; Zhiqiang He

Recently, the dominant DETR-based approaches apply a central-concept spatial prior to accelerate the convergence of Transformer detectors. These methods gradually refine the reference points toward the centers of target objects and imbue object queries with the updated central reference information for spatially conditional attention. However, centralizing reference points may severely degrade the saliency of queries and confuse detectors because the spatial prior becomes indiscriminative. To bridge the gap between the reference points of salient queries and Transformer detectors, we propose SAlient Point-based DETR (SAP-DETR), which treats object detection as a transformation from salient points to instance objects. In SAP-DETR, we explicitly initialize a query-specific reference point for each object query, gradually aggregate these points into an instance object, and then predict the distance from each side of the bounding box to these points. By rapidly attending to the query-specific reference region and other conditional extreme regions in the image features, SAP-DETR can effectively bridge the gap between the salient point and the query-based Transformer detector while converging significantly faster. Our extensive experiments demonstrate that SAP-DETR achieves 1.4 times the convergence speed with competitive performance. Under the standard training scheme, SAP-DETR consistently outperforms SOTA approaches by 1.0 AP. Based on ResNet-DC-101, SAP-DETR achieves 46.9 AP.
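
The abstract describes predicting the distance from each side of the bounding box to a query-specific reference point. A minimal sketch of how such a prediction could be decoded into a box is given below. It is an illustration only, not the authors' implementation: the function name `decode_box`, the (left, top, right, bottom) ordering, and the use of normalized coordinates are all assumptions.

```python
from typing import Tuple

Point = Tuple[float, float]
Sides = Tuple[float, float, float, float]
Box = Tuple[float, float, float, float]


def decode_box(point: Point, distances: Sides) -> Box:
    """Turn a reference point and four side distances into a corner-format box.

    Hypothetical helper: `point` is (x, y) and `distances` is
    (left, top, right, bottom), all assumed to be normalized to [0, 1].
    Returns (x_min, y_min, x_max, y_max).
    """
    x, y = point
    left, top, right, bottom = distances
    if min(left, top, right, bottom) < 0:
        # Non-negative distances keep the reference point inside the box.
        raise ValueError("side distances must be non-negative")
    return (x - left, y - top, x + right, y + bottom)


def box_center(box: Box) -> Point:
    """Center of a corner-format box, for comparison with the reference point."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2, (y_min + y_max) / 2)


if __name__ == "__main__":
    box = decode_box((0.5, 0.5), (0.25, 0.125, 0.25, 0.375))
    print(box, box_center(box))
```

In this sketch the reference point always lies inside the decoded box but does not have to coincide with its center, which is the difference from a center-based parameterization that the abstract emphasizes.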

updated: Thu Nov 03 2022 17:20:55 GMT+0000 (UTC)

published: Thu Nov 03 2022 17:20:55 GMT+0000 (UTC)

arXiv
