Master: Meta Style Transformer for Controllable Zero-Shot and Few-Shot Artistic Style Transfer

Hao Tang; Songhua Liu; Tianwei Lin; Shaoli Huang; Fu Li; Dongliang He; Xinchao Wang

Transformer-based models have recently achieved favorable performance in artistic style transfer, thanks to their global receptive field and powerful multi-head/layer attention operations. Nevertheless, their over-parameterized multi-layer structure significantly increases the number of parameters and thus places a heavy burden on training. Moreover, for the task of style transfer, the vanilla Transformer, which fuses content and style features through residual connections, is prone to content-wise distortion. In this paper, we devise a novel Transformer model, termed Master, specifically for style transfer. On the one hand, in the proposed model, different Transformer layers share a common group of parameters, which (1) reduces the total number of parameters, (2) leads to more robust training convergence, and (3) makes it easy to control the degree of stylization by freely tuning the number of stacked layers during inference. On the other hand, unlike the vanilla version, we adopt a learnable scaling operation on content features before content-style feature interaction, which better preserves the original similarity between a pair of content features while ensuring stylization quality. We also propose a novel meta-learning scheme for the proposed model so that it not only works in the typical setting of arbitrary style transfer but can also adapt to the few-shot setting by fine-tuning only the Transformer encoder layer in the few-shot stage for one specific style. With the proposed framework, text-guided few-shot style transfer is achieved for the first time. Extensive experiments demonstrate the superiority of Master under both zero-shot and few-shot style transfer settings.
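The two architectural ideas above, one parameter group shared by every stacked layer and a learnable scaling of content features before they interact with style features, can be illustrated with a minimal pure-Python sketch. It assumes a single-head cross-attention layer in which content tokens query style tokens, a scalar `content_scale`, and a residual update of the scaled content. The names `SharedStyleLayer`, `content_scale` and `stylize` are hypothetical, and the abstract does not say whether the scaling in Master is scalar or per-channel, or exactly where it enters the layer. This is not the authors' implementation.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def matvec(w: Matrix, x: Vector) -> Vector:
    # Row-major weight matrix times a column vector.
    return [dot(row, x) for row in w]


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class SharedStyleLayer:
    """One hypothetical parameter group reused by every stacked layer."""

    def __init__(self, w_q: Matrix, w_k: Matrix, w_v: Matrix, content_scale: float):
        self.w_q, self.w_k, self.w_v = w_q, w_k, w_v
        # Assumed to be a learnable scalar during training; held fixed in this sketch.
        self.content_scale = content_scale

    def __call__(self, content: Matrix, style: Matrix) -> Matrix:
        d_k = len(self.w_k)  # key dimension = number of rows of w_k
        keys = [matvec(self.w_k, s) for s in style]
        values = [matvec(self.w_v, s) for s in style]
        out = []
        for c in content:
            # Scale content features before they interact with style features.
            c_scaled = [self.content_scale * x for x in c]
            q = matvec(self.w_q, c_scaled)
            weights = softmax([dot(q, k) / math.sqrt(d_k) for k in keys])
            attended = [
                sum(w * v[i] for w, v in zip(weights, values))
                for i in range(len(values[0]))
            ]
            # Residual fusion of the scaled content and the attended style.
            out.append([a + b for a, b in zip(c_scaled, attended)])
        return out


def stylize(content: Matrix, style: Matrix, layer: SharedStyleLayer, num_layers: int) -> Matrix:
    """Apply the same shared layer num_layers times; depth is chosen at inference."""
    x = content
    for _ in range(num_layers):
        x = layer(x, style)
    return x
```

Because `stylize` reuses one `SharedStyleLayer`, the parameter count is the same for any `num_layers`, so depth becomes an inference-time knob. With identity weights, `content_scale=1.0`, content `[[1.0, 0.0]]` and style `[[0.0, 1.0]]`, one pass yields `[[1.0, 1.0]]` and two passes yield `[[1.0, 2.0]]`, so the style component grows with depth in this toy setting. Multiplying every content vector by the same positive scalar also leaves the cosine similarity between any two of them unchanged, which is one plausible reading of how such a scaling could help preserve content structure.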

updated: Mon Apr 24 2023 04:46:39 GMT+0000 (UTC)

published: Mon Apr 24 2023 04:46:39 GMT+0000 (UTC)

arXiv
