QKVA grid: Attention in Image Perspective and Stacked DETR

Wenyuan Sheng

We present a new model named Stacked-DETR (SDETR), which inherits the main ideas of the canonical DETR. We improve DETR in two directions: reducing the cost of training and introducing a stacked architecture to enhance performance. For the former, we focus on the inside of the Attention block and propose the QKVA grid, a new perspective for describing the process of attention. With it, we take a further step toward understanding how Attention works on image problems and what effect multi-head attention has. These two ideas contribute to the design of a single-head encoder layer. For the latter, SDETR achieves better performance than DETR (+0.6 AP, +2.7 APs). In particular, on small objects, a known shortcoming of DETR, SDETR achieves better results than the optimized Faster R-CNN baseline. Our changes are based on the DETR code. Training code and pretrained models are available at https://github.com/shengwenyuan/sdetr.
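For readers unfamiliar with the attention block the abstract refers to, the sketch below shows standard scaled dot-product self-attention applied to an image feature map, with the queries Q, keys K, values V, and the attention map A made explicit. It contrasts a single-head layer with a multi-head one that splits the projection into heads. This is a minimal illustration under stated assumptions, not the paper's QKVA grid formulation or the SDETR code. The function names and the random projection weights are hypothetical, and positional encodings and the multi-head output projection are deliberately omitted.

```python
import numpy as np

def single_head_attention(feature_map, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over an image feature map.

    feature_map: (H, W, C) array, e.g. a CNN backbone output (assumption).
    w_q, w_k, w_v: hypothetical (C, d) projection matrices.
    Returns the output reshaped to (H, W, d) and the attention map A of shape (HW, HW).
    """
    h, w, c = feature_map.shape
    x = feature_map.reshape(h * w, c)            # flatten pixels into a sequence of tokens
    q, k, v = x @ w_q, x @ w_k, x @ w_v          # Q, K, V: (HW, d) each
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                # pixel-to-pixel similarity, (HW, HW)
    scores -= scores.max(axis=-1, keepdims=True) # subtract row max for numerical stability
    a = np.exp(scores)
    a /= a.sum(axis=-1, keepdims=True)           # row-wise softmax gives A
    out = a @ v                                  # each pixel becomes a weighted sum of values
    return out.reshape(h, w, d), a

def multi_head_attention(feature_map, w_q, w_k, w_v, num_heads):
    """Multi-head variant: split the d projection columns into equal heads.

    The per-head outputs are concatenated; the usual output projection W_o is omitted.
    Returns (H, W, d) output and a stack of per-head attention maps (num_heads, HW, HW).
    """
    d = w_q.shape[1]
    assert d % num_heads == 0, "d must be divisible by num_heads"
    dh = d // num_heads
    outs, maps = [], []
    for i in range(num_heads):
        cols = slice(i * dh, (i + 1) * dh)       # columns belonging to head i
        o, a = single_head_attention(feature_map, w_q[:, cols], w_k[:, cols], w_v[:, cols])
        outs.append(o)
        maps.append(a)
    return np.concatenate(outs, axis=-1), np.stack(maps)

# Usage on a random 4x5 feature map with 8 channels (illustrative data only).
rng = np.random.default_rng(0)
fmap = rng.standard_normal((4, 5, 8))
w_q, w_k, w_v = (rng.standard_normal((8, 8)) for _ in range(3))
out, attn = single_head_attention(fmap, w_q, w_k, w_v)
print(out.shape, attn.shape)  # (4, 5, 8) (20, 20)
mh_out, mh_attn = multi_head_attention(fmap, w_q, w_k, w_v, num_heads=2)
print(mh_out.shape, mh_attn.shape)  # (4, 5, 8) (2, 20, 20)
```

Each row of A is a probability distribution over all pixels, so a single head yields one HW-by-HW map per layer, while multiple heads yield one smaller-dimensional map per head over the same pixels.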

updated: Tue Aug 16 2022 14:42:08 GMT+0000 (UTC)

published: Sat Jul 09 2022 18:05:43 GMT+0000 (UTC)

arXiv
