Progressive Energy-Based Cooperative Learning for Multi-Domain Image-to-Image Translation

Weinan Song; Yaxuan Zhu; Lei He; Yingnian Wu; Jianwen Xie

This paper studies a novel energy-based cooperative learning framework for multi-domain image-to-image translation. The framework consists of four components: a descriptor, a translator, a style encoder, and a style generator. The descriptor is a multi-head energy-based model that represents a multi-domain image distribution. The translator, style encoder, and style generator together constitute a diversified image generator. Specifically, given an input image from a source domain, the translator turns it into a stylized output image of the target domain according to a style code, which can be either inferred by the style encoder from a reference image or produced by the style generator from random noise. Since the style generator is represented as a domain-specific distribution of style codes, the translator can provide a one-to-many transformation (i.e., diversified generation) between the source domain and the target domain. To train the framework, we propose a likelihood-based multi-domain cooperative learning algorithm that jointly trains the multi-domain descriptor and the diversified image generator (including the translator, style encoder, and style generator modules) via multi-domain MCMC teaching. In this scheme, the descriptor guides the diversified image generator to shift its probability density toward the data distribution, while the diversified image generator uses its randomly translated images to initialize the descriptor's Langevin dynamics process for efficient sampling.
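The sampling half of this cooperative scheme can be illustrated with a toy sketch. The code below is not the paper's implementation. It assumes a hypothetical two-domain setting in which each head of the descriptor is a simple quadratic energy over 2-D points, and a hypothetical `toy_translator` that produces a crude, randomized guess at the target domain. All names (`DOMAIN_MEANS`, `energy_grad`, `toy_translator`, `langevin_revise`) and all parameter values are assumptions made for illustration. It shows only how translated samples can initialize Langevin dynamics under the target domain's energy head; the teaching step, in which the generator would be updated toward the revised samples, is omitted.

```python
import random

# Assumption: each domain's energy head is a quadratic bowl
# E_c(x) = 0.5 * ||x - mu_c||^2, standing in for a learned multi-head network.
DOMAIN_MEANS = {0: [0.0, 0.0], 1: [4.0, -2.0]}


def energy_grad(x, domain):
    """Gradient of the toy energy head for `domain` at point x."""
    mu = DOMAIN_MEANS[domain]
    return [xi - mi for xi, mi in zip(x, mu)]


def toy_translator(x, target_domain, rng):
    """Hypothetical stand-in for the translator: a crude, noisy guess."""
    mu = DOMAIN_MEANS[target_domain]
    # Move halfway toward the target mean and add a random "style" perturbation.
    return [0.5 * (xi + mi) + rng.gauss(0.0, 0.5) for xi, mi in zip(x, mu)]


def langevin_revise(x_init, domain, rng, steps=200, step_size=0.1):
    """Langevin dynamics: x <- x - (s^2 / 2) * grad E(x) + s * noise."""
    x = list(x_init)
    for _ in range(steps):
        g = energy_grad(x, domain)
        x = [
            xi - 0.5 * step_size ** 2 * gi + step_size * rng.gauss(0.0, 1.0)
            for xi, gi in zip(x, g)
        ]
    return x


def sq_dist(x, mu):
    """Squared Euclidean distance between two points."""
    return sum((xi - mi) ** 2 for xi, mi in zip(x, mu))


rng = random.Random(0)
source = [0.0, 0.0]  # stands in for a source-domain image

# The generator proposes randomized translations into domain 1 ...
initial = [toy_translator(source, 1, rng) for _ in range(200)]
# ... and the descriptor revises them with Langevin dynamics.
revised = [langevin_revise(x, 1, rng) for x in initial]

target_mu = DOMAIN_MEANS[1]
mean_before = sum(sq_dist(x, target_mu) for x in initial) / len(initial)
mean_after = sum(sq_dist(x, target_mu) for x in revised) / len(revised)
print(f"mean squared distance to target mean: {mean_before:.2f} -> {mean_after:.2f}")
```

In this toy setting the revised samples sit closer, on average, to the low-energy region of the target domain than the initial translations do. Starting the chain from the generator's output rather than from pure noise is what the abstract refers to as initialization for efficient sampling.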

updated: Mon Jan 15 2024 07:45:02 GMT+0000 (UTC)

published: Mon Jun 26 2023 06:34:53 GMT+0000 (UTC)

arXiv
