Activating More Pixels in Image Super-Resolution Transformer

Xiangyu Chen; Xintao Wang; Jiantao Zhou; Yu Qiao; Chao Dong

Transformer-based methods have shown impressive performance in low-level vision tasks such as image super-resolution. However, attribution analysis shows that these networks can only utilize a limited spatial range of input information, which implies that the potential of the Transformer is still not fully exploited in existing networks. To activate more input pixels for better reconstruction, we propose a novel Hybrid Attention Transformer (HAT). It combines channel attention with window-based self-attention, exploiting their complementary advantages: the ability to utilize global statistics and strong local fitting capability, respectively. Moreover, to better aggregate cross-window information, we introduce an overlapping cross-attention module that enhances the interaction between neighboring window features. In the training stage, we additionally adopt a same-task pre-training strategy to further exploit the potential of the model. Extensive experiments show the effectiveness of the proposed modules, and we further scale up the model to demonstrate that performance on this task can be greatly improved. Overall, our method outperforms state-of-the-art methods by more than 1 dB. Code and models are available at https://github.com/XPixelGroup/HAT.
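To make the idea of combining the two attention schemes concrete, here is a minimal NumPy sketch. It is not the code from the HAT repository: the function names, the residual form, the weighting factor `alpha`, and the weight shapes are all assumptions made for illustration. It also omits learned query/key/value projections, multiple heads, normalization, window shifting, and the overlapping cross-attention module. It only illustrates the point that window-based self-attention sees nothing outside its own window, while channel attention depends on statistics of the whole feature map.

```python
import numpy as np

def window_partition(x, win):
    """Split an (H, W, C) feature map into non-overlapping windows of win*win tokens."""
    H, W, C = x.shape
    x = x.reshape(H // win, win, W // win, win, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, win * win, C)

def window_reverse(windows, win, H, W):
    """Inverse of window_partition: reassemble windows into an (H, W, C) map."""
    C = windows.shape[-1]
    x = windows.reshape(H // win, W // win, win, win, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(H, W, C)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def window_self_attention(x, win):
    """Single-head self-attention inside each window (assumption: no learned projections)."""
    H, W, C = x.shape
    tokens = window_partition(x, win)                          # (num_windows, N, C)
    scores = tokens @ tokens.transpose(0, 2, 1) / np.sqrt(C)   # (num_windows, N, N)
    out = softmax(scores) @ tokens                             # mixes tokens of one window only
    return window_reverse(out, win, H, W)

def channel_attention(x, w1, w2):
    """Squeeze-and-excitation style gate computed from global average statistics."""
    squeeze = x.mean(axis=(0, 1))                  # (C,), one statistic per channel
    hidden = np.maximum(squeeze @ w1, 0.0)         # ReLU bottleneck
    gate = 1.0 / (1.0 + np.exp(-(hidden @ w2)))    # sigmoid gate, (C,)
    return x * gate                                # every pixel is rescaled by the same gate

def hybrid_block(x, win, w1, w2, alpha=0.01):
    """Hypothetical hybrid block: residual + window attention + scaled channel attention."""
    return x + window_self_attention(x, win) + alpha * channel_attention(x, w1, w2)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.random((8, 8, 4))      # toy feature map: H=8, W=8, C=4
    w1 = rng.random((4, 2))        # hypothetical bottleneck weights
    w2 = rng.random((2, 4))

    x_perturbed = x.copy()
    x_perturbed[0, 0] += 10.0      # change a single pixel in the top-left window

    # Compare outputs in the bottom-right window, far from the perturbed pixel.
    far = (slice(4, 8), slice(4, 8))
    wsa_changed = not np.allclose(window_self_attention(x, 4)[far],
                                  window_self_attention(x_perturbed, 4)[far])
    ca_changed = not np.allclose(channel_attention(x, w1, w2)[far],
                                 channel_attention(x_perturbed, w1, w2)[far])
    print("window attention output changed far away:", wsa_changed)
    print("channel attention output changed far away:", ca_changed)
```

In this toy setup, perturbing one pixel leaves the window-attention output of a distant window untouched, whereas the channel-attention output of that window shifts because the global average changes. This is the sense in which a global-statistics branch can bring more input pixels into play; the actual model uses learned layers and the additional overlapping cross-attention module described above.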

updated: Sun Mar 19 2023 01:25:49 GMT+0000 (UTC)

published: Mon May 09 2022 17:36:58 GMT+0000 (UTC)

arXiv
