Sliced Recursive Transformer

Zhiqiang Shen; Zechun Liu; Eric Xing

We present a neat yet effective recursive operation for vision transformers that improves parameter utilization without adding parameters. This is achieved by sharing weights across the depth of the transformer network. The proposed method obtains a substantial gain (~2%) using only a naive recursive operation, requires no special or sophisticated knowledge of network design principles, and introduces minimal computational overhead to the training procedure. To reduce the additional computation caused by the recursive operation while maintaining accuracy, we propose an approximation through multiple sliced group self-attentions across recursive layers, which reduces the computational cost by 10~30% with minimal performance loss. We call our model the Sliced Recursive Transformer (SReT); it is compatible with a broad range of other designs for efficient vision transformers. Our best model significantly improves on state-of-the-art methods on ImageNet while containing fewer parameters. The proposed sliced recursive operation lets us build a transformer with more than 100 or even 1000 layers while keeping the model small (13~15M parameters), avoiding the optimization difficulties that arise when the model is too large. This flexible scalability shows great potential for scaling up and constructing extremely deep, high-dimensional vision transformers. Our code and models are available at https://github.com/szq0214/SReT.
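The two ideas in the abstract, reusing one set of weights across depth and restricting self-attention to token slices, can be illustrated with a minimal pure-Python sketch. It is not the code from the SReT repository. The function names, the single-head attention, the plain residual connection, and the contiguous equal-sized slices are all assumptions made for illustration, and the MLP, normalization, and any other components of the real model are omitted.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product self-attention over all given tokens."""
    q, k, v = matmul(tokens, wq), matmul(tokens, wk), matmul(tokens, wv)
    scale = math.sqrt(len(q[0]))
    scores = [[sum(a * b for a, b in zip(qi, kj)) / scale for kj in k] for qi in q]
    return matmul([softmax(r) for r in scores], v)


def sliced_group_attention(tokens, wq, wk, wv, groups):
    """Attend within each of `groups` contiguous token slices (assumed slicing scheme)."""
    if len(tokens) % groups != 0:
        raise ValueError("token count must be divisible by the number of groups")
    size = len(tokens) // groups
    out = []
    for g in range(groups):
        out.extend(self_attention(tokens[g * size:(g + 1) * size], wq, wk, wv))
    return out


def recursive_block(tokens, weights, loops, groups=1):
    """Apply the SAME weights `loops` times: more depth, no extra parameters."""
    for _ in range(loops):
        attended = sliced_group_attention(tokens, *weights, groups)
        # Plain residual connection (an assumption of this sketch).
        tokens = [[t + a for t, a in zip(tr, ar)] for tr, ar in zip(tokens, attended)]
    return tokens


def attention_score_cost(n_tokens, groups):
    """Number of query-key score entries computed in one attention pass."""
    size = n_tokens // groups
    return groups * size * size


if __name__ == "__main__":
    random.seed(0)
    dim, n_tokens = 4, 16
    # One shared (wq, wk, wv) triple, reused by every recursion.
    shared = tuple(
        [[random.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]
        for _ in range(3)
    )
    x = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(n_tokens)]
    y = recursive_block(x, shared, loops=2, groups=4)
    print(len(y), len(y[0]))
    print(attention_score_cost(n_tokens, 1), attention_score_cost(n_tokens, 4))
```

In this sketch, slicing N tokens into G groups cuts the score-matrix entries from N^2 to N^2/G per pass, which is the part of the recursion's extra cost that slicing removes. That count covers only the attention scores in this toy setup and should not be read as reproducing the 10~30% overall saving reported in the abstract.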

updated: Tue Nov 09 2021 17:59:14 GMT+0000 (UTC)

published: Tue Nov 09 2021 17:59:14 GMT+0000 (UTC)

arXiv
