Scaling-up Disentanglement for Image Translation

Aviv Gabbay; Yedid Hoshen


Image translation methods typically aim to manipulate a set of labeled attributes (given as supervision at training time, e.g. a domain label) while leaving the unlabeled attributes intact. Current methods achieve either (i) disentanglement, which exhibits low visual fidelity and can only be satisfied when the attributes are perfectly uncorrelated, or (ii) visually plausible translations, which are clearly not disentangled. In this work, we propose OverLORD, a single framework for disentangling labeled and unlabeled attributes as well as synthesizing high-fidelity images. It is composed of two stages: (i) Disentanglement: learning disentangled representations with latent optimization. Unlike previous approaches, we do not rely on adversarial training or any architectural biases. (ii) Synthesis: training feed-forward encoders to infer the learned attributes, and tuning the generator in an adversarial manner to increase the perceptual quality. When the labeled and unlabeled attributes are correlated, we model an additional representation that accounts for the correlated attributes and improves disentanglement. We highlight that our flexible framework covers multiple image translation settings, e.g. attribute manipulation, pose-appearance translation, segmentation-guided synthesis and shape-texture transfer. In an extensive evaluation, we present significantly better disentanglement with higher translation quality and greater output diversity than state-of-the-art methods.
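The latent-optimization idea in stage (i) can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a linear generator, flat feature vectors instead of images, and a plain L2 penalty on the per-sample codes as a stand-in for whatever regularization the paper uses. The function name `latent_optimization` and all parameter names are hypothetical. The point it shows is that each labeled class shares one learnable embedding, each training sample gets its own free learnable code, and both are optimized directly by gradient descent together with the generator, with no encoder and no adversarial loss.

```python
import numpy as np

def latent_optimization(x, y, num_classes, d_class=2, d_free=2,
                        lam=1e-3, lr=0.05, steps=300, seed=0):
    """Toy latent optimization with a linear generator (illustrative only).

    x: (n, d) data, y: (n,) integer class labels.
    Returns class embeddings C, per-sample codes U, generator W, loss history.
    """
    rng = np.random.default_rng(seed)
    n, d = x.shape
    C = 0.1 * rng.standard_normal((num_classes, d_class))  # shared per class
    U = np.zeros((n, d_free))                              # one code per sample
    W = 0.1 * rng.standard_normal((d_class + d_free, d))   # linear generator
    losses = []
    for _ in range(steps):
        Z = np.concatenate([C[y], U], axis=1)  # latent code of every sample
        R = Z @ W - x                          # reconstruction residual
        # Mean squared reconstruction error plus an assumed L2 penalty on U.
        losses.append((R ** 2).sum() / n + lam * (U ** 2).sum() / n)
        gZ = 2.0 * R @ W.T / n                 # gradient w.r.t. latent codes
        gW = 2.0 * Z.T @ R / n                 # gradient w.r.t. generator
        gU = gZ[:, d_class:] + 2.0 * lam * U / n
        gC = np.zeros_like(C)
        np.add.at(gC, y, gZ[:, :d_class])      # sum gradients within a class
        C -= lr * gC
        U -= lr * gU
        W -= lr * gW
    return C, U, W, losses
```

In this sketch, a "translation" would amount to decoding a sample's own code together with a different class embedding, for example `np.concatenate([C[k], U[i]]) @ W` for a target class `k`. Stage (ii) of the abstract, training feed-forward encoders and adversarial tuning of the generator, is not covered here.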

updated: Thu Mar 25 2021 17:52:38 GMT+0000 (UTC)

published: Thu Mar 25 2021 17:52:38 GMT+0000 (UTC)

arXiv
