CTCNet: A CNN-Transformer Cooperation Network for Face Image Super-Resolution

Guangwei Gao; Zixiang Xu; Juncheng Li; Jian Yang; Tieyong Zeng; Guo-Jun Qi

Recently, face super-resolution methods driven by deep convolutional neural networks (CNNs) have made great progress in restoring degraded facial details by jointly training with facial priors. However, these methods have some obvious limitations. On the one hand, multi-task joint learning requires additional annotations on the dataset, and the introduced prior network significantly increases the computational cost of the model. On the other hand, the limited receptive field of CNNs reduces the fidelity and naturalness of the reconstructed facial images, resulting in suboptimal reconstructions. In this work, we propose an efficient CNN-Transformer Cooperation Network (CTCNet) for face super-resolution, which uses a multi-scale connected encoder-decoder architecture as the backbone. Specifically, we first devise a novel Local-Global Feature Cooperation Module (LGCM), composed of a Facial Structure Attention Unit (FSAU) and a Transformer block, to promote the consistent restoration of local facial details and global facial structure simultaneously. Then, we design an efficient Feature Refinement Module (FRM) to enhance the encoded features. Finally, to further improve the restoration of fine facial details, we present a Multi-scale Feature Fusion Unit (MFFU) that adaptively fuses features from different stages of the encoder. Extensive evaluations on various datasets demonstrate that the proposed CTCNet significantly outperforms other state-of-the-art methods. Source code will be available at https://github.com/IVIPLab/CTCNet.
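
The abstract does not say how the MFFU combines encoder features, so the following is only a toy sketch of one generic way to fuse feature maps from different scales: bring every map to the largest resolution, then take a weighted sum whose weights come from a softmax over per-stage scores. The function names, the nearest-neighbour upsampling, and the scalar scores (standing in for learned weights) are all assumptions made for illustration and are not taken from the CTCNet source code.

```python
import math


def upsample_nearest(feature, factor):
    """Nearest-neighbour upsampling of a 2D feature map given as a list of rows."""
    out = []
    for row in feature:
        wide = [v for v in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


def softmax(scores):
    """Numerically stable softmax over a list of scalars."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_multiscale(features, scores):
    """Fuse square 2D maps of different sizes at the largest resolution.

    features: list of square maps whose sizes divide the largest size.
    scores:   one scalar per map (hypothetical stand-in for learned weights).
    """
    target = max(len(f) for f in features)
    weights = softmax(scores)
    fused = [[0.0] * target for _ in range(target)]
    for f, w in zip(features, weights):
        up = upsample_nearest(f, target // len(f))
        for i in range(target):
            for j in range(target):
                fused[i][j] += w * up[i][j]
    return fused


# Toy encoder stages at 4x4, 2x2 and 1x1 resolution.
stage1 = [[1.0] * 4 for _ in range(4)]
stage2 = [[2.0] * 2 for _ in range(2)]
stage3 = [[4.0]]
fused = fuse_multiscale([stage1, stage2, stage3], scores=[0.0, 0.0, 0.0])
```

With equal scores each stage contributes one third, so every entry of `fused` is the mean of the three stage values. In a trained network the scores would depend on the input, which is what makes such a fusion adaptive.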

updated: Thu Mar 23 2023 09:44:22 GMT+0000 (UTC)

published: Tue Apr 19 2022 06:38:29 GMT+0000 (UTC)

arXiv
