PyramidTNT: Improved Transformer-in-Transformer Baselines with Pyramid Architecture

Kai Han; Jianyuan Guo; Yehui Tang; Yunhe Wang

Transformer networks have made great progress on computer vision tasks. The Transformer-in-Transformer (TNT) architecture uses an inner transformer and an outer transformer to extract both local and global representations. In this work, we present new TNT baselines by introducing two advanced designs: 1) a pyramid architecture, and 2) a convolutional stem. The new "PyramidTNT" significantly improves on the original TNT by establishing hierarchical representations. PyramidTNT achieves better performance than previous state-of-the-art vision transformers such as Swin Transformer. We hope this new baseline will be helpful for further research on, and applications of, vision transformers. Code will be available at https://github.com/huawei-noah/CV-Backbones/tree/master/tnt_pytorch.
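
To make the idea of "hierarchical representations" concrete, the following is a minimal sketch of how feature-map resolution and channel width might change across the stages of a pyramid backbone. It is not taken from the PyramidTNT code: the function name `pyramid_shapes`, the stem stride of 4, the four stages, the base width of 64, and the rule of halving resolution while doubling channels at each stage are all illustrative assumptions, chosen because that pattern is common in pyramid-style vision backbones.

```python
from typing import List, NamedTuple


class StageShape(NamedTuple):
    stage: int      # 1-based stage index
    height: int     # feature-map height in tokens
    width: int      # feature-map width in tokens
    tokens: int     # height * width
    channels: int   # embedding width at this stage


def pyramid_shapes(image_size: int = 224,
                   stem_stride: int = 4,
                   num_stages: int = 4,
                   base_channels: int = 64) -> List[StageShape]:
    """Return per-stage token grid sizes for a hypothetical pyramid backbone.

    Assumptions (not from the abstract): the stem downsamples the image by
    `stem_stride`, and every later stage halves the spatial resolution and
    doubles the channel width.
    """
    total_stride = stem_stride * 2 ** (num_stages - 1)
    if image_size % total_stride != 0:
        raise ValueError(
            f"image_size={image_size} must be divisible by {total_stride}"
        )

    shapes: List[StageShape] = []
    side = image_size // stem_stride
    channels = base_channels
    for stage in range(1, num_stages + 1):
        shapes.append(StageShape(stage, side, side, side * side, channels))
        side //= 2       # downsample between stages
        channels *= 2    # widen the embedding between stages
    return shapes


if __name__ == "__main__":
    for shape in pyramid_shapes():
        print(shape)
```

Under these assumed settings the token grid shrinks from 56x56 in the first stage to 7x7 in the last, which is the sense in which a pyramid design yields coarse-to-fine, multi-scale features rather than a single fixed-resolution token sequence.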

updated: Tue Jan 04 2022 04:56:57 GMT+0000 (UTC)

published: Tue Jan 04 2022 04:56:57 GMT+0000 (UTC)

arXiv
