End-to-End Object Detection with Adaptive Clustering Transformer

Minghang Zheng; Peng Gao; Xiaogang Wang; Hongsheng Li; Hao Dong

End-to-end Object Detection with Transformer (DETR) proposes to perform object detection with a Transformer and achieves performance comparable to two-stage object detectors such as Faster-RCNN. However, DETR needs huge computational resources for training and inference due to its high-resolution spatial input. In this paper, a novel variant of the transformer named Adaptive Clustering Transformer (ACT) is proposed to reduce the computation cost for high-resolution input. ACT clusters the query features adaptively using Locality Sensitive Hashing (LSH) and approximates the query-key interaction using the prototype-key interaction. ACT can reduce the quadratic O(N^2) complexity inside self-attention to O(NK), where K is the number of prototypes in each layer. ACT can be a drop-in module replacing the original self-attention module without any training. ACT achieves a good balance between accuracy and computation cost (FLOPs). The code is available as supplementary material for ease of experiment replication and verification.
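
The prototype-key approximation described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the hash family, so random-hyperplane (sign) hashing is used here as a simple stand-in, prototypes are assumed to be the mean of the queries in each hash bucket, and the function names (`lsh_codes`, `clustered_attention`) and the `n_bits` parameter are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def full_attention(q, k, v):
    # Standard scaled dot-product attention: an (N_q x N_k) score matrix.
    scores = q @ k.T / np.sqrt(q.shape[1])
    return softmax(scores) @ v


def lsh_codes(q, n_bits, rng):
    # Assumption: sign of random projections as the hash (a stand-in LSH scheme).
    planes = rng.standard_normal((q.shape[1], n_bits))
    bits = (q @ planes > 0).astype(np.int64)
    weights = 1 << np.arange(n_bits, dtype=np.int64)
    return bits @ weights  # one integer code per query


def clustered_attention(q, k, v, n_bits=8, seed=0):
    rng = np.random.default_rng(seed)
    codes = lsh_codes(q, n_bits, rng)
    # Queries that share a hash code form one cluster; K = number of distinct codes.
    _, cluster_ids = np.unique(codes, return_inverse=True)
    cluster_ids = cluster_ids.reshape(-1)
    n_clusters = int(cluster_ids.max()) + 1
    # Assumption: each prototype is the mean of the queries in its cluster.
    sums = np.zeros((n_clusters, q.shape[1]))
    np.add.at(sums, cluster_ids, q)
    counts = np.bincount(cluster_ids, minlength=n_clusters)[:, None]
    prototypes = sums / counts
    # Attention is computed for K prototypes only: a (K x N_k) score matrix.
    proto_out = full_attention(prototypes, k, v)
    # Every query receives the output of its cluster's prototype.
    return proto_out[cluster_ids], n_clusters
```

In this sketch the score matrix has K rows instead of N, which is where the O(NK) cost in the abstract comes from; the number of clusters K is not fixed in advance but depends on how the queries fall into hash buckets. Because no weights are changed, the function can be called in place of `full_attention` on the same `q`, `k`, `v`.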

updated: Wed Nov 18 2020 14:36:37 GMT+0000 (UTC)

published: Wed Nov 18 2020 14:36:37 GMT+0000 (UTC)

arXiv
