Scaling-up Disentanglement for Image Translation

Aviv Gabbay; Yedid Hoshen

Image translation methods typically aim to manipulate a set of labeled attributes (given as supervision at training time, e.g., domain labels) while leaving the unlabeled attributes intact. Current methods achieve either (i) disentanglement, which exhibits low visual fidelity and can only be satisfied when the attributes are perfectly uncorrelated, or (ii) visually plausible translations, which are clearly not disentangled. In this work, we propose OverLORD, a single framework for disentangling labeled and unlabeled attributes as well as synthesizing high-fidelity images, which is composed of two stages: (i) Disentanglement: learning disentangled representations with latent optimization. Unlike previous approaches, we do not rely on adversarial training or any architectural biases. (ii) Synthesis: training feed-forward encoders for inferring the learned attributes and tuning the generator in an adversarial manner to increase the perceptual quality. When the labeled and unlabeled attributes are correlated, we model an additional representation that accounts for the correlated attributes and improves disentanglement. We highlight that our flexible framework covers multiple settings, such as disentangling labeled attributes, pose and appearance, localized concepts, and shape and texture. We present significantly better disentanglement with higher translation quality and greater output diversity than state-of-the-art methods.
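To make the phrase "latent optimization" in stage (i) concrete, the following is a toy sketch only, not the authors' implementation. It assumes a linear generator on synthetic vectors instead of images, one shared embedding per labeled class, and one free code per training sample for the unlabeled attributes. All codes are optimized directly by gradient descent together with the generator, with no encoder. The L2 penalty on the per-sample codes, the function names `latent_optimization` and `translate`, and every hyperparameter are assumptions made for illustration.

```python
import numpy as np


def latent_optimization(x, y, n_classes, dim_u=2, dim_c=2,
                        steps=500, lr=0.05, reg=1e-3, seed=0):
    """Jointly optimize class embeddings, per-sample codes and a linear generator.

    x: (n, d) array of samples, y: (n,) integer class labels.
    Returns (c, u, W, losses). Toy illustration with hypothetical names.
    """
    rng = np.random.default_rng(seed)
    n, d = x.shape
    c = 0.1 * rng.standard_normal((n_classes, dim_c))   # labeled attribute: shared per class
    u = 0.1 * rng.standard_normal((n, dim_u))           # unlabeled attributes: free per sample
    W = 0.1 * rng.standard_normal((dim_c + dim_u, d))   # linear "generator" (assumption)
    losses = []
    for _ in range(steps):
        z = np.concatenate([c[y], u], axis=1)            # (n, dim_c + dim_u)
        err = z @ W - x                                  # reconstruction residual
        loss = np.mean(np.sum(err ** 2, axis=1)) + reg * np.mean(np.sum(u ** 2, axis=1))
        losses.append(float(loss))
        # Manual gradients of the loss above.
        grad_out = 2.0 * err / n
        grad_W = z.T @ grad_out
        grad_z = grad_out @ W.T
        grad_c = np.zeros_like(c)
        np.add.at(grad_c, y, grad_z[:, :dim_c])          # accumulate over samples of each class
        grad_u = grad_z[:, dim_c:] + 2.0 * reg * u / n   # assumed L2 penalty on per-sample codes
        # Plain gradient descent on codes and generator together.
        W -= lr * grad_W
        c -= lr * grad_c
        u -= lr * grad_u
    return c, u, W, losses


def translate(c, u, W, index, target_class):
    """Regenerate sample `index` with its own code but another class embedding."""
    z = np.concatenate([c[target_class], u[index]])
    return z @ W


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    labels = np.array([0, 1] * 20)
    data = rng.standard_normal((40, 6))
    c, u, W, losses = latent_optimization(data, labels, n_classes=2)
    print("first loss:", losses[0], "last loss:", losses[-1])
    print("translated sample shape:", translate(c, u, W, 0, 1).shape)
```

In this sketch, swapping the class embedding while keeping the per-sample code plays the role of a translation. Stage (ii) of the abstract, training feed-forward encoders and adversarially tuning the generator, is not covered here.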

updated: Wed Sep 08 2021 07:06:33 GMT+0000 (UTC)

published: Thu Mar 25 2021 17:52:38 GMT+0000 (UTC)

arXiv
