UniNeXt: Exploring A Unified Architecture for Vision Recognition

Fangjian Lin; Jianlong Yuan; Sitong Wu; Fan Wang; Zhibin Wang

Vision Transformers have shown great potential in computer vision tasks. Most recent work has focused on designing more elaborate spatial token mixers for performance gains. However, we observe that a well-designed general architecture can significantly improve the performance of the entire backbone, regardless of which spatial token mixer it is equipped with. In this paper, we propose UniNeXt, an improved general architecture for vision backbones. To verify its effectiveness, we instantiate the spatial token mixer with various typical and modern designs, including both convolution and attention modules. Compared with the architectures in which they were first proposed, UniNeXt steadily boosts the performance of all the spatial token mixers and narrows the performance gap among them. Surprisingly, UniNeXt equipped with naive local window attention even outperforms the previous state of the art. Interestingly, the ranking of these spatial token mixers also changes under UniNeXt, suggesting that an excellent spatial token mixer may be stifled by a suboptimal general architecture. This further shows the importance of studying the general architecture of vision backbones. All models and code will be made publicly available.
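The separation between a general architecture and its spatial token mixer can be illustrated with a minimal sketch. This is not the UniNeXt design, whose details the abstract does not give. It only shows a shared residual backbone into which different mixers are plugged. All names here (`window_mean_mixer`, `global_attention_mixer`, `block`, `backbone`) are hypothetical, and both mixers are parameter-free stand-ins rather than real convolution or window-attention layers.

```python
import math
from typing import Callable, List

Tokens = List[List[float]]          # a sequence of token vectors
Mixer = Callable[[Tokens], Tokens]  # any function that mixes information across tokens


def window_mean_mixer(window: int) -> Mixer:
    """Convolution-like stand-in (assumption): average each token with its
    neighbours inside a fixed 1-D window."""
    def mix(x: Tokens) -> Tokens:
        n = len(x)
        out = []
        for i in range(n):
            lo, hi = max(0, i - window), min(n, i + window + 1)
            span = x[lo:hi]
            out.append([sum(col) / len(span) for col in zip(*span)])
        return out
    return mix


def global_attention_mixer(x: Tokens) -> Tokens:
    """Attention-like stand-in (assumption): parameter-free softmax attention
    over all tokens, with queries, keys and values all equal to the input."""
    d = len(x[0])
    out = []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        m = max(scores)                       # subtract max for numerical stability
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        out.append([sum(wi * v[j] for wi, v in zip(w, x)) / z for j in range(d)])
    return out


def block(x: Tokens, mixer: Mixer) -> Tokens:
    """Shared block: residual connection around whichever mixer is supplied."""
    mixed = mixer(x)
    return [[a + b for a, b in zip(xi, mi)] for xi, mi in zip(x, mixed)]


def backbone(x: Tokens, mixer: Mixer, depth: int) -> Tokens:
    """Shared general architecture: a stack of identical blocks; only `mixer` varies."""
    for _ in range(depth):
        x = block(x, mixer)
    return x


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
conv_like = backbone(tokens, window_mean_mixer(1), depth=2)
attn_like = backbone(tokens, global_attention_mixer, depth=2)
```

Because `backbone` never inspects the mixer, comparing mixers under one architecture amounts to changing a single argument, which is the kind of controlled comparison the abstract describes.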

updated: Wed Apr 26 2023 17:28:09 GMT+0000 (UTC)

published: Wed Apr 26 2023 17:28:09 GMT+0000 (UTC)

arXiv
