KS-DETR: Knowledge Sharing in Attention Learning for Detection Transformer

Kaikai Zhao; Norimichi Ukita

Scaled dot-product attention applies a softmax function to the scaled dot-product of queries and keys to compute weights, then multiplies the weights by the values. In this work, we study how to improve the learning of scaled dot-product attention in order to improve the accuracy of DETR. Our method is based on two observations: (1) using the ground-truth foreground-background mask (GT Fg-Bg Mask) as an additional cue when learning the weights or the values enables learning much better weights or values; (2) with better weights, better values can be learned, and with better values, better weights can be learned. We propose a triple-attention module in which the first attention is a plain scaled dot-product attention, the second attention generates high-quality weights, and the third attention generates high-quality values, both with the assistance of the GT Fg-Bg Mask. The second attention shares values, and the third attention shares weights, with the first attention, which improves the quality of the first attention's values and weights. The second and third attentions are removed during inference. We call our method knowledge-sharing DETR (KS-DETR). It extends knowledge distillation (KD) in that the improved weights and values of the teachers (the second and third attentions) are directly shared with, rather than mimicked by, the student (the first attention), enabling more efficient knowledge transfer from the teachers to the student. Experiments on various DETR-like methods show consistent improvements over the baseline methods on the MS COCO benchmark. Code is available at https://github.com/edocanonymous/KS-DETR.
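The sharing pattern described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`attention_weights`, `triple_attention`), the single-head layout, the reuse of one set of projection matrices for all three attentions, and in particular the way the mask enters the teacher features (scaling foreground tokens by 2) are assumptions made only for illustration. The abstract does not specify any of these details.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_weights(q, k):
    # Softmax over the scaled dot-product of queries and keys.
    d = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d), axis=-1)

def triple_attention(x, fg_mask, wq, wk, wv, training=True):
    """x: (n, d) tokens; fg_mask: (n,) with 1 = foreground, 0 = background."""
    q, k, v = x @ wq, x @ wk, x @ wv
    w1 = attention_weights(q, k)
    out1 = w1 @ v  # first (student) attention: plain scaled dot-product
    if not training:
        # The second and third attentions are dropped at inference.
        return out1, None, None

    # ASSUMPTION (hypothetical): the GT Fg-Bg mask is injected by
    # emphasising foreground tokens before the teacher projections.
    x_t = x * (1.0 + fg_mask[:, None])
    w2 = attention_weights(x_t @ wq, x_t @ wk)  # mask-assisted weights
    v3 = x_t @ wv                               # mask-assisted values

    out2 = w2 @ v   # second attention: teacher weights, student's values
    out3 = w1 @ v3  # third attention: student's weights, teacher values
    return out1, out2, out3
```

In this sketch, a training loss applied to `out2` would back-propagate into the student's values `v`, and a loss applied to `out3` would back-propagate into the student's weights `w1`, which is one way to read "directly shared, instead of mimicked".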

updated: Wed Feb 22 2023 08:48:08 GMT+0000 (UTC)

published: Wed Feb 22 2023 08:48:08 GMT+0000 (UTC)

arXiv
