Detail-Preserving Transformer for Light Field Image Super-Resolution

Shunzhou Wang; Tianfei Zhou; Yao Lu; Huijun Di

Recently, numerous algorithms have been developed to tackle the problem of light field super-resolution (LFSR), i.e., super-resolving low-resolution light fields to obtain high-resolution views. Despite delivering encouraging results, these approaches are all convolution-based, and are naturally weak at the global relation modeling of sub-aperture images that is necessary to characterize the inherent structure of light fields. In this paper, we put forth a novel formulation built upon Transformers, by treating LFSR as a sequence-to-sequence reconstruction task. In particular, our model regards the sub-aperture images of each vertical or horizontal angular view as a sequence, and establishes long-range geometric dependencies within each sequence via a spatial-angular locally-enhanced self-attention layer, which also maintains the locality of each sub-aperture image. Additionally, to better recover image details, we propose a detail-preserving Transformer (termed DPT), which leverages gradient maps of the light field to guide the sequence learning. DPT consists of two branches, each associated with a Transformer that learns from either the original or the gradient image sequence. The two branches are finally fused to obtain comprehensive feature representations for reconstruction. Evaluations are conducted on a number of light field datasets, including real-world scenes and synthetic data. The proposed method achieves superior performance compared with other state-of-the-art schemes. Our code is publicly available at: https://github.com/BITszwang/DPT.
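
To make the data layout in the abstract concrete, the following is a minimal sketch of how a light field could be arranged into the horizontal and vertical sub-aperture sequences, together with a gradient-map counterpart for the second branch. It is not taken from the DPT repository: the function names (`angular_sequences`, `gradient_map`, `two_branch_inputs`) are hypothetical, the light field is assumed to be a nested list indexed as `light_field[u][v]`, and the forward-difference gradient magnitude is an assumed stand-in, since the abstract does not say which gradient operator is used.

```python
from math import hypot


def angular_sequences(light_field):
    """Split a U x V grid of sub-aperture images into 1-D sequences.

    light_field[u][v] is one sub-aperture image (assumed layout).
    Returns (horizontal, vertical): U sequences of length V,
    and V sequences of length U.
    """
    horizontal = [list(row) for row in light_field]
    vertical = [list(col) for col in zip(*light_field)]
    return horizontal, vertical


def gradient_map(image):
    """Forward-difference gradient magnitude of a 2-D image (list of rows).

    Assumption: indices are clamped at the border, so the difference
    across the last row or column is zero.
    """
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx = image[y][min(x + 1, w - 1)] - image[y][x]
            dy = image[min(y + 1, h - 1)][x] - image[y][x]
            out[y][x] = hypot(dx, dy)
    return out


def two_branch_inputs(light_field):
    """Build sequence inputs for an image branch and a gradient branch."""
    gradients = [[gradient_map(img) for img in row] for row in light_field]
    return angular_sequences(light_field), angular_sequences(gradients)
```

In this sketch each sequence would then be passed to a Transformer in its own branch, and the two branch outputs fused before reconstruction; the attention layers and the fusion step themselves are not modeled here.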

updated: Sun Jan 02 2022 12:33:23 GMT+0000 (UTC)

published: Sun Jan 02 2022 12:33:23 GMT+0000 (UTC)

arXiv
