CvT-ASSD: Convolutional vision-Transformer Based Attentive Single Shot MultiBox Detector

Weiqiang Jin; Hang Yu; Hang Yu

Following the success of Bidirectional Encoder Representations from Transformers (BERT) in natural language processing (NLP), the multi-head attention Transformer has become increasingly prevalent in computer vision (CV) research. However, applying it to complex tasks such as visual detection and semantic segmentation remains a challenge for researchers. Although several Transformer-based architectures such as DETR and ViT-FRCNN have been proposed for object detection, they inevitably suffer reduced discrimination accuracy and computational efficiency because of the enormous number of learnable parameters and the heavy computational complexity incurred by the traditional self-attention operation. To alleviate these issues, we present a novel object detection architecture, named Convolutional vision Transformer Based Attentive Single Shot MultiBox Detector (CvT-ASSD), which is built on top of the Convolutional vision Transformer (CvT) with the efficient Attentive Single Shot MultiBox Detector (ASSD). We provide comprehensive empirical evidence showing that our model CvT-ASSD can lead to good system efficiency and performance while being pretrained on large-scale detection datasets such as PASCAL VOC and MS COCO. Code has been released in a public GitHub repository at https://github.com/albert-jin/CvT-ASSD.

updated: Sun Oct 24 2021 06:45:33 GMT+0000 (UTC)

published: Sun Oct 24 2021 06:45:33 GMT+0000 (UTC)

arXiv
