Learning Disentangled Representation by Exploiting Pretrained Generative Models: A Contrastive Learning View

Xuanchi Ren; Tao Yang; Yuwang Wang; Wenjun Zeng

From the intuitive notion of disentanglement, the image variations corresponding to different factors should be distinct from each other, and the disentangled representation should reflect those variations in separate dimensions. To discover the factors and learn disentangled representations, previous methods typically add an extra regularization term when learning to generate realistic images. However, this term usually introduces a trade-off between disentanglement and generation quality. For generative models pretrained without any disentanglement term, the generated images show semantically meaningful variations when traversing along different directions in the latent space. Based on this observation, we argue that it is possible to mitigate the trade-off by (i) leveraging pretrained generative models with high generation quality, and (ii) focusing on discovering the traversal directions as factors for disentangled representation learning. To achieve this, we propose Disentanglement via Contrast (DisCo), a framework that models the variations based on the target disentangled representations and contrasts the variations to jointly discover disentangled directions and learn disentangled representations. DisCo achieves state-of-the-art disentangled representation learning and distinct direction discovery, given pretrained non-disentangled generative models including GAN, VAE, and Flow. Source code is at https://github.com/xrenaa/DisCo.
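
The idea of contrasting variations can be illustrated with a toy sketch. This is not the authors' implementation (see the linked repository for that). It assumes a linear map as a stand-in for the pretrained generator, an identity encoder, and an InfoNCE-style loss chosen here purely for illustration; all function names and constants are hypothetical. A variation is the change in the encoded output when a latent code is moved along a candidate direction, and variations from the same direction are treated as positives.

```python
import math
import random

def generator(z, weights):
    """Toy stand-in for a pretrained generator: a fixed linear map z -> x."""
    return [sum(w * zi for w, zi in zip(row, z)) for row in weights]

def encoder(x):
    """Toy stand-in for the representation encoder (identity, an assumption)."""
    return list(x)

def variation(z, direction, step, weights):
    """Change in the encoded output when z is moved along one direction."""
    shifted = [zi + step * di for zi, di in zip(z, direction)]
    before = encoder(generator(z, weights))
    after = encoder(generator(shifted, weights))
    return [a - b for a, b in zip(after, before)]

def cosine(u, v):
    """Cosine similarity, returning 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def contrastive_loss(variations, labels, temperature=0.5):
    """InfoNCE-style loss (illustrative choice): same-direction pairs are positives."""
    total, count = 0.0, 0
    n = len(variations)
    for i in range(n):
        sims = {j: math.exp(cosine(variations[i], variations[j]) / temperature)
                for j in range(n) if j != i}
        denom = sum(sims.values())
        for j, s in sims.items():
            if labels[j] == labels[i]:
                total += -math.log(s / denom)
                count += 1
    return total / count if count else 0.0

def sample_variations(directions, weights, per_direction=4, step=1.0, seed=0):
    """Draw random latent codes and collect one variation per (code, direction)."""
    rng = random.Random(seed)
    dim = len(directions[0])
    variations, labels = [], []
    for k, d in enumerate(directions):
        for _ in range(per_direction):
            z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
            variations.append(variation(z, d, step, weights))
            labels.append(k)
    return variations, labels

# Hypothetical 3-dimensional latent space with an identity "generator".
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
distinct = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
entangled = [[1.0, 1.0, 0.0], [1.0, 0.9, 0.0], [0.9, 1.0, 0.1]]

loss_distinct = contrastive_loss(*sample_variations(distinct, IDENTITY))
loss_entangled = contrastive_loss(*sample_variations(entangled, IDENTITY))
print(loss_distinct < loss_entangled)
```

In this toy setting, directions that produce clearly separated variations yield a lower loss than nearly parallel directions, which is the signal a direction-discovery procedure could optimize. A real setup would replace the linear map with an actual pretrained GAN, VAE, or Flow and learn both the directions and the encoder.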

updated: Mon Feb 14 2022 11:39:53 GMT+0000 (UTC)

published: Sun Feb 21 2021 08:01:20 GMT+0000 (UTC)

arXiv
