Task Specific Attention is one more thing you need for object detection

Sang Yon Lee

Various models have been proposed to perform object detection. However, most require many hand-designed components, such as anchors and non-maximum suppression (NMS), to achieve good performance. To mitigate these issues, the Transformer-based DETR and its variant, Deformable DETR, were proposed. These have resolved much of the complexity in designing a head for object detection models; however, doubts remain about whether Transformer-based models can be regarded as state-of-the-art methods in object detection, because other models that depend on anchors and NMS have shown better results. Furthermore, it has been unclear whether an end-to-end pipeline could be built from attention modules alone, because the DETR-adapted Transformer method used a convolutional neural network (CNN) for the backbone. In this study, we propose that combining several attention modules with our new Task Specific Split Transformer (TSST) is a powerful method for achieving state-of-the-art performance on COCO without traditional hand-designed components. By splitting the general-purpose attention module into two separate goal-specific attention modules, the proposed method allows for the design of simpler object detection models. Extensive experiments on the COCO benchmark demonstrate the effectiveness of our approach. Code is available at https://github.com/navervision/tsst
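For readers unfamiliar with the hand-designed post-processing the abstract refers to, the following is a minimal sketch of greedy non-maximum suppression. It is not taken from the TSST repository; the function names `iou` and `nms`, the `(x1, y1, x2, y2)` box layout, and the default threshold of 0.5 are assumptions chosen for illustration.

```python
from typing import List, Sequence

# Assumed box layout for this sketch: (x1, y1, x2, y2) with x2 >= x1, y2 >= y1.
Box = Sequence[float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: return indices of kept boxes, highest score first."""
    # Visit boxes in order of decreasing confidence.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    example_boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    example_scores = [0.9, 0.8, 0.7]
    print(nms(example_boxes, example_scores))
```

The threshold is a tuning parameter that has to be chosen by hand, which is one reason set-prediction detectors such as DETR aim to remove this step.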

updated: Wed Jun 15 2022 04:02:27 GMT+0000 (UTC)

published: Fri Feb 18 2022 07:09:33 GMT+0000 (UTC)

arXiv
