Vision Transformer Adapter for Dense Predictions

Zhe Chen; Yuchen Duan; Wenhai Wang; Junjun He; Tong Lu; Jifeng Dai; Yu Qiao

This work investigates a simple yet powerful dense prediction task adapter for Vision Transformer (ViT). Unlike recently advanced variants that incorporate vision-specific inductive biases into their architectures, the plain ViT suffers from inferior performance on dense predictions due to weak prior assumptions. To address this issue, we propose the ViT-Adapter, which allows plain ViT to achieve performance comparable to vision-specific transformers. Specifically, the backbone in our framework is a plain ViT that can learn powerful representations from large-scale multi-modal data. When transferring to downstream tasks, a pre-training-free adapter is used to introduce the image-related inductive biases into the model, making it suitable for these tasks. We verify ViT-Adapter on multiple dense prediction tasks, including object detection, instance segmentation, and semantic segmentation. Notably, without using extra detection data, our ViT-Adapter-L yields state-of-the-art 60.9 box AP and 53.0 mask AP on COCO test-dev. We hope that the ViT-Adapter could serve as an alternative to vision-specific transformers and facilitate future research. The code and models will be released at https://github.com/czczup/ViT-Adapter.
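The abstract says only that a pre-training-free adapter introduces image-related inductive biases into a plain ViT; it does not say how. Below is a minimal sketch of one way such an injection could work, assuming that (i) a separate branch produces spatial prior features from the image and (ii) those features are added to the ViT tokens through cross-attention, scaled by a per-channel gate initialized to zero. The function names (`softmax`, `cross_attention`, `inject`), the omission of learned projections, and the zero-initialized gate are illustrative assumptions, not the authors' released code; the repository linked above is the authoritative implementation.

```python
# Hypothetical sketch: NOT the ViT-Adapter implementation. It illustrates one
# assumed way an adapter could inject spatial priors into plain ViT tokens.
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: List[Vector], keys_values: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product cross-attention.

    Each query (a ViT token) attends over every key/value vector (a spatial
    prior feature). Learned Q/K/V projections are omitted for brevity, so the
    keys double as values and all vectors must share one dimension.
    """
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys_values]
        weights = softmax(scores)
        # Weighted average of the value vectors, channel by channel.
        out.append([sum(w * v[c] for w, v in zip(weights, keys_values)) for c in range(d)])
    return out


def inject(vit_tokens: List[Vector], spatial_feats: List[Vector], gamma: Vector) -> List[Vector]:
    """Add attended spatial priors to ViT tokens, scaled per channel by gamma.

    Assumed design: with gamma all zeros the tokens pass through unchanged,
    so pre-trained ViT features are preserved until gamma is learned.
    """
    attended = cross_attention(vit_tokens, spatial_feats)
    return [
        [t + g * a for t, g, a in zip(tok, gamma, att)]
        for tok, att in zip(vit_tokens, attended)
    ]


if __name__ == "__main__":
    # Toy inputs: two 2-D "ViT tokens" and three 2-D "spatial prior" features.
    tokens = [[1.0, 0.0], [0.0, 1.0]]
    feats = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
    print(inject(tokens, feats, gamma=[0.0, 0.0]))  # identical to tokens
    print(inject(tokens, feats, gamma=[0.5, 0.5]))  # tokens plus half the attended priors
```

The zero-initialized gate is the design choice worth noting in this sketch: because `inject` returns the tokens unchanged when `gamma` is all zeros, a backbone loaded from pre-training behaves exactly as before at the start of fine-tuning, and the injected priors contribute only as `gamma` is learned. A real adapter would add learned projections, normalization, and multi-scale features, none of which are modeled here.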

updated: Sun Oct 23 2022 07:39:25 GMT+0000 (UTC)

published: Tue May 17 2022 17:59:11 GMT+0000 (UTC)

arXiv
