Bridging the Gap between Label- and Reference-based Synthesis in Multi-attribute Image-to-Image Translation

Qiusheng Huang; Zhilin Zheng; Xueqi Hu; Li Sun; Qingli Li

An image-to-image translation (I2IT) model takes a target label or a reference image as input and translates a source image into the specified target domain. The two types of synthesis, label-based and reference-based, differ substantially: label-based synthesis reflects the common characteristics of the target domain, whereas reference-based synthesis reproduces the specific style of the reference. This paper aims to bridge the gap between them in the task of multi-attribute I2IT. We design label- and reference-based encoding modules (LEM and REM) to compare the domain differences. They first map the source image and the target label (or reference) into a common embedding space, with an attribute difference vector supplying opposite directions for the two. The two embeddings are then simply fused to form the latent code S_rand (or S_ref), which reflects the domain style differences and is injected into each layer of the generator by SPADE. To link LEM and REM so that the two types of results benefit each other, we encourage the two latent codes to be close and set up cycle consistency between the forward and backward translations on them. Moreover, an interpolation between S_rand and S_ref is used to synthesize an extra image. Experiments show that label- and reference-based synthesis do promote each other, yielding diverse results from LEM and high-quality results that share the style of the reference.
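The two operations on the latent codes mentioned in the abstract, pulling S_rand and S_ref together and interpolating between them, can be illustrated with a minimal sketch. The abstract does not say how either is computed, so everything here is an assumption made for illustration: the codes are treated as flat lists of floats, the interpolation is taken to be linear, the closeness term is taken to be a mean absolute difference, and the function names `interpolate` and `l1_distance` are hypothetical.

```python
from typing import List

Vector = List[float]


def interpolate(s_rand: Vector, s_ref: Vector, alpha: float) -> Vector:
    """Linear blend of two latent codes (assumed form, not from the paper).

    alpha = 0 returns s_rand, alpha = 1 returns s_ref.
    """
    if len(s_rand) != len(s_ref):
        raise ValueError("latent codes must have the same length")
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(s_rand, s_ref)]


def l1_distance(s_rand: Vector, s_ref: Vector) -> float:
    """Mean absolute difference, a stand-in for the closeness penalty."""
    if len(s_rand) != len(s_ref):
        raise ValueError("latent codes must have the same length")
    return sum(abs(a - b) for a, b in zip(s_rand, s_ref)) / len(s_rand)


# Toy two-dimensional codes; real latent codes would come from LEM and REM.
s_rand = [0.0, 2.0]
s_ref = [1.0, 4.0]
s_mix = interpolate(s_rand, s_ref, 0.5)   # code for the extra image
penalty = l1_distance(s_rand, s_ref)      # term to minimise during training
```

In this reading, `s_mix` would be injected into the generator in the same way as S_rand or S_ref to produce the extra image, while `penalty` would be added to the training loss so that the label-based and reference-based codes stay close.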

updated: Mon Oct 11 2021 07:48:09 GMT+0000 (UTC)

published: Mon Oct 11 2021 07:48:09 GMT+0000 (UTC)

arXiv
