DynaST: Dynamic Sparse Transformer for Exemplar-Guided Image Generation

Songhua Liu; Jingwen Ye; Sucheng Ren; Xinchao Wang

One key challenge of exemplar-guided image generation lies in establishing fine-grained correspondences between the input and the guiding exemplar image. Prior approaches, despite promising results, have relied either on estimating dense attention to compute per-point matching, which is limited to coarse scales by its quadratic memory cost, or on fixing the number of correspondences to achieve linear complexity, which lacks flexibility. In this paper, we propose a Transformer model based on dynamic sparse attention, termed Dynamic Sparse Transformer (DynaST), to achieve fine-level matching with favorable efficiency. The heart of our approach is a novel dynamic-attention unit, dedicated to covering the variation in the optimal number of tokens that one position should focus on. Specifically, DynaST leverages the multi-layer nature of the Transformer structure and performs the dynamic attention scheme in a cascaded manner to refine matching results and synthesize visually pleasing outputs. In addition, we introduce a unified training objective for DynaST, making it a versatile reference-based image translation framework for both supervised and unsupervised scenarios. Extensive experiments on three applications, namely pose-guided person image generation, edge-based face synthesis, and undistorted image style transfer, demonstrate that DynaST achieves superior performance in local details, outperforming the state of the art while significantly reducing the computational cost. Our code is available at https://github.com/Huage001/DynaST
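To make the idea of a variable number of attended tokens per position concrete, the following is a minimal illustrative sketch and not the DynaST implementation. It assumes a simple pruning rule (keep a key only if its softmax weight is at least `ratio / num_keys`); the function name `dynamic_sparse_attention`, the `ratio` parameter, and the rule itself are hypothetical choices made for illustration. The sketch also computes dense scores first, so it does not reproduce the memory savings claimed in the abstract; it only shows how a confident position can keep one match while an ambiguous position keeps many.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_sparse_attention(queries, keys, values, ratio=1.0):
    """Attend, per query, only to keys whose weight passes a threshold.

    Assumed rule (illustrative, not taken from the paper): a key is kept
    when its softmax weight is at least ratio / len(keys). The number of
    kept keys therefore differs from query to query.
    Returns (outputs, kept_indices).
    """
    num_keys = len(keys)
    dim = len(keys[0])
    threshold = ratio / num_keys
    outputs, kept_indices = [], []
    for q in queries:
        # Scaled dot-product scores against every key (dense, for clarity).
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys
        ]
        weights = softmax(scores)
        kept = [j for j, w in enumerate(weights) if w >= threshold]
        if not kept:
            # Fall back to the single best match if everything was pruned.
            kept = [max(range(num_keys), key=weights.__getitem__)]
        # Renormalise the surviving weights so they sum to one.
        norm = sum(weights[j] for j in kept)
        out = [
            sum(weights[j] / norm * values[j][d] for j in kept)
            for d in range(len(values[0]))
        ]
        outputs.append(out)
        kept_indices.append(kept)
    return outputs, kept_indices
```

Under these assumptions, a query that aligns strongly with one key retains a single correspondence, whereas a query that is equally similar to all keys retains every one of them. A fixed top-k scheme would force the same count on both.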

updated: Mon Mar 27 2023 07:55:32 GMT+0000 (UTC)

published: Wed Jul 13 2022 11:12:03 GMT+0000 (UTC)

arXiv
