Decoupled Self Attention for Accurate One Stage Object Detection

Kehe Wu; Zuge Chen; Qi Ma; Xiaoliang Zhang; Wei Li

Because object detection datasets are smaller than the image recognition dataset ImageNet, transfer learning has become a standard training method for deep learning object detection models: the backbone network of the detection model is pretrained on ImageNet and then used to extract features for the classification and localization subtasks. However, the classification task focuses on features of an object's salient regions, while the localization task focuses on features of its edges, so there is a certain deviation between the features extracted by the pretrained backbone network and the features needed for the localization task. To address this problem, this paper proposes a decoupled self-attention (DSA) module for one-stage object detection models. DSA contains two decoupled self-attention branches, so it can extract appropriate features for each task. It is placed between the FPN and the head networks of the subtasks, where it extracts global features from the FPN fused features for each task independently. Although the DSA module's network is simple, it effectively improves object detection performance and can be easily embedded in many detection models. Our experiments are based on the representative one-stage detection model RetinaNet. On the COCO dataset, with ResNet50 and ResNet101 as backbone networks, detection performance increases by 0.4% AP and 0.5% AP, respectively. When the DSA module and an object confidence task are applied to RetinaNet together, detection performance with ResNet50 and ResNet101 increases by 1.0% AP and 1.4% AP, respectively. The experimental results demonstrate the effectiveness of the DSA module.
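
The decoupling idea in the abstract can be illustrated with a minimal sketch. The abstract only states that DSA has two separate self-attention branches placed between the FPN and the subtask heads; everything else here is an assumption made for illustration: the scaled dot-product form of the attention, the residual connection, the random weights, and all names (`SelfAttentionBranch`, `DecoupledSelfAttention`, `random_matrix`) are hypothetical and not taken from the paper. One FPN level is represented as a list of spatial positions, each holding a list of channel values.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng, scale=0.1):
    """Hypothetical stand-in for learned weights or input features."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class SelfAttentionBranch:
    """One self-attention branch with its own projection weights (assumed form)."""

    def __init__(self, channels, rng):
        self.channels = channels
        self.w_q = random_matrix(channels, channels, rng)
        self.w_k = random_matrix(channels, channels, rng)
        self.w_v = random_matrix(channels, channels, rng)

    def __call__(self, feats):
        # feats: N spatial positions, each with `channels` values.
        q = matmul(feats, self.w_q)
        k = matmul(feats, self.w_k)
        v = matmul(feats, self.w_v)
        scale = math.sqrt(self.channels)
        # Every position attends to every other position (global context).
        scores = [[sum(a * b for a, b in zip(qi, kj)) / scale for kj in k] for qi in q]
        attn = [softmax(r) for r in scores]
        context = matmul(attn, v)
        # Residual connection (an assumption): keep the FPN feature, add context.
        return [[x + c for x, c in zip(f_row, c_row)] for f_row, c_row in zip(feats, context)]


class DecoupledSelfAttention:
    """Two independent branches: one feeds the classification head, one the localization head."""

    def __init__(self, channels, seed=0):
        rng = random.Random(seed)
        self.cls_branch = SelfAttentionBranch(channels, rng)
        self.loc_branch = SelfAttentionBranch(channels, rng)

    def __call__(self, fpn_feats):
        return self.cls_branch(fpn_feats), self.loc_branch(fpn_feats)


# Toy FPN level: 6 spatial positions, 4 channels.
fpn_level = random_matrix(6, 4, random.Random(1), scale=1.0)
dsa = DecoupledSelfAttention(channels=4)
cls_feats, loc_feats = dsa(fpn_level)
print(len(cls_feats), len(cls_feats[0]))
```

Both outputs keep the shape of the input FPN level, so in this sketch the module could sit in front of existing classification and localization heads without changing them. Because the two branches hold separate weights, the same FPN features yield different task-specific features.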

updated: Mon Dec 14 2020 15:19:30 GMT+0000 (UTC)

published: Mon Dec 14 2020 15:19:30 GMT+0000 (UTC)

arXiv

