From Coarse to Fine: Hierarchical Pixel Integration for Lightweight Image Super-Resolution

Jie Liu; Chao Chen; Jie Tang; Gangshan Wu

Image super-resolution (SR) serves as a fundamental tool for the processing and transmission of multimedia data. Recently, Transformer-based models have achieved competitive performance in image SR. They divide images into fixed-size patches and apply self-attention on these patches to model long-range dependencies among pixels. However, this architecture design originated in high-level vision tasks and lacks design guidelines drawn from SR knowledge. In this paper, we aim to design a new attention block informed by the interpretation of the Local Attribution Map (LAM) for SR networks. Specifically, LAM presents a hierarchical importance map in which the most important pixels are located in a fine area of a patch, while some less important pixels are spread over a coarse area of the whole image. To access pixels in the coarse area, instead of using a very large patch size, we propose a lightweight Global Pixel Access (GPA) module that applies cross-attention with the most similar patch in an image. In the fine area, we use an Intra-Patch Self-Attention (IPSA) module to model long-range pixel dependencies within a local patch, and then apply a 3×3 convolution to process the finest details. In addition, a Cascaded Patch Division (CPD) strategy is proposed to enhance the perceptual quality of recovered images. Extensive experiments suggest that our method outperforms state-of-the-art lightweight SR methods by a large margin. Code is available at https://github.com/passerer/HPINet.
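
The coarse-area idea in the abstract (each patch attends to the most similar patch elsewhere in the image) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; see the linked repository for that. All function names here are hypothetical, cosine similarity over flattened patches is an assumed similarity measure, the residual addition is an assumption, and the learned query/key/value projections of a real attention layer are omitted.

```python
import numpy as np

def split_into_patches(feat, p):
    """Split an (H, W, C) feature map into non-overlapping p x p patches.
    Returns shape (N, p*p, C) with N = (H // p) * (W // p)."""
    h, w, c = feat.shape
    assert h % p == 0 and w % p == 0, "H and W must be divisible by p"
    x = feat.reshape(h // p, p, w // p, p, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, p * p, c)

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def most_similar_patch(patches):
    """For each patch, return the index of the most similar *other* patch.
    Assumption: similarity is cosine similarity of the flattened patches."""
    flat = patches.reshape(patches.shape[0], -1)
    flat = flat / (np.linalg.norm(flat, axis=1, keepdims=True) + 1e-8)
    sim = flat @ flat.T
    np.fill_diagonal(sim, -np.inf)  # a patch may not select itself
    return sim.argmax(axis=1)

def cross_attention(query, key):
    """Scaled dot-product attention: pixels of `query` attend to pixels of `key`.
    Shapes are (N, P, C); no learned projections (simplification)."""
    d = query.shape[-1]
    scores = query @ key.swapaxes(-1, -2) / np.sqrt(d)  # (N, P, P)
    return softmax(scores, axis=-1) @ key               # (N, P, C)

def global_pixel_access_sketch(feat, p):
    """Hypothetical GPA-like step: pair each patch with its most similar
    patch and add the cross-attended features (residual is an assumption)."""
    patches = split_into_patches(feat, p)
    idx = most_similar_patch(patches)
    return patches + cross_attention(patches, patches[idx])
```

The point of the design described in the abstract is cost: attending to one matched patch keeps the attention matrix at P × P per patch, whereas enlarging the patch to reach the same distant pixels would grow it quadratically in the patch area.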

updated: Wed Nov 30 2022 06:32:34 GMT+0000 (UTC)

published: Wed Nov 30 2022 06:32:34 GMT+0000 (UTC)

arXiv
