TcGAN: Semantic-Aware and Structure-Preserved GANs with Individual Vision Transformer for Fast Arbitrary One-Shot Image Generation

Yunliang Jiang; Lili Yan; Xiongtao Zhang; Yong Liu; Danfeng Sun

One-shot image generation (OSG) with generative adversarial networks that learn from the internal patches of a given image has attracted worldwide attention. Recent studies have primarily focused on extracting image features from probabilistically distributed inputs with pure convolutional neural networks (CNNs). However, CNNs have limited receptive fields, which makes it difficult for them to extract and maintain global structural information. In this paper, we therefore propose TcGAN, a novel structure-preserved method with an individual vision transformer, to overcome the shortcomings of existing one-shot image generation methods. Specifically, TcGAN exploits the transformer's powerful capability for modeling long-range dependencies to preserve the global structure of an image during training, keeping it compatible with local details while maintaining the integrity of semantic-aware information. We also propose a new scaling formula that is scale-invariant during calculation, which effectively improves the quality of images generated by the OSG model on image super-resolution tasks. We present the design of the TcGAN converter framework, together with comprehensive experiments and ablation studies demonstrating that TcGAN achieves arbitrary image generation with the fastest running time. Lastly, TcGAN achieves the best performance when applied to other image processing tasks, e.g., super-resolution and image harmonization, and these results further demonstrate its superiority.
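
The limited-receptive-field argument above can be made concrete with a little arithmetic. The following is a minimal sketch, not taken from the paper: it uses the standard receptive-field recurrence for stacked convolutions, and the layer counts, the 3x3 kernel, and the 250-pixel image side are illustrative assumptions only. The function names are hypothetical.

```python
def receptive_field(layers):
    """Receptive field (in input pixels, along one axis) of stacked conv layers.

    `layers` is a list of (kernel_size, stride) pairs, first layer first.
    """
    rf = 1    # before any layer, one output position sees one input pixel
    jump = 1  # spacing, in input pixels, between adjacent feature positions
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf


def layers_to_cover(image_size, kernel=3):
    """Count the stride-1 conv layers needed for one output to see the whole axis."""
    layers = 0
    while receptive_field([(kernel, 1)] * layers) < image_size:
        layers += 1
    return layers


# Illustrative assumptions: five 3x3 stride-1 convolutions, a 250-pixel image side.
five_convs = receptive_field([(3, 1)] * 5)
needed = layers_to_cover(250)
print(five_convs, needed)
```

Under these assumptions a shallow stack of stride-1 convolutions sees only a small neighbourhood of the image, whereas a self-attention layer relates every patch to every other patch in a single step. That contrast is the general motivation for pairing a transformer with a CNN-based generator; the specific TcGAN architecture is not reproduced here.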

updated: Thu Feb 16 2023 03:05:59 GMT+0000 (UTC)

published: Thu Feb 16 2023 03:05:59 GMT+0000 (UTC)

arXiv
