The Transitive Information Theory and its Application to Deep Generative Models

Trung Ngo; Ville Hautamäki; Merja Heinäniemi

Paradoxically, a Variational Autoencoder (VAE) can be pushed in two opposite directions: using a powerful decoder to generate realistic images at the cost of collapsing the learned representation, or increasing the regularization coefficient to disentangle the representation at the cost of generating blurry examples. Existing methods reduce these issues to the rate-distortion trade-off between compression and reconstruction. We argue that a good reconstruction model does learn high-capacity latents that encode more details; however, their use is hindered by two major issues: the prior is random noise that is completely detached from the posterior and allows no controllability in generation, and mean-field variational inference does not enforce a hierarchical structure, which makes recombining those units into plausible novel outputs infeasible. We therefore develop a system that learns a hierarchy of disentangled representations together with a mechanism for recombining the learned representations for generalization. This is achieved by introducing a minimal amount of inductive bias to learn a controllable prior for the VAE. The idea is supported by the transitive information theory developed here: the mutual information between two target variables can alternatively be maximized through the mutual information with a third variable, thus bypassing the rate-distortion bottleneck in VAE design. In particular, we show that our model, named SemafoVAE (inspired by the similar concept in computer science), can generate high-quality examples in a controllable manner, perform smooth traversals of the disentangled factors, and intervene at different levels of the representation hierarchy.
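The rate-distortion trade-off mentioned in the abstract can be illustrated with a minimal sketch of the standard beta-weighted VAE objective. This is not the SemafoVAE objective, which the abstract does not specify; the function names, the squared-error distortion, and the diagonal-Gaussian posterior with a standard normal prior are all assumptions made for illustration.

```python
import math


def gaussian_kl_to_standard_normal(mean, log_var):
    """KL(N(mean, diag(exp(log_var))) || N(0, I)), summed over latent dims.

    This is the "rate" term. It is zero exactly when the posterior equals
    the prior, which is the collapsed-representation case.
    """
    return 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mean, log_var)
    )


def squared_error(x, x_hat):
    """Sum of squared errors between input and reconstruction (the "distortion")."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def beta_vae_loss(x, x_hat, mean, log_var, beta=1.0):
    """Return (loss, rate, distortion) with loss = distortion + beta * rate.

    `beta` plays the role of the regularization coefficient: a larger value
    penalizes the rate more heavily, trading reconstruction quality for a
    posterior that stays closer to the prior.
    """
    rate = gaussian_kl_to_standard_normal(mean, log_var)
    distortion = squared_error(x, x_hat)
    return distortion + beta * rate, rate, distortion
```

Calling `beta_vae_loss` on the same inputs with different `beta` values leaves the rate and distortion unchanged and only shifts their relative weight in the loss, which is the tension the abstract describes between sharp reconstructions and disentangled latents.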

updated: Wed Mar 09 2022 22:35:02 GMT+0000 (UTC)

published: Wed Mar 09 2022 22:35:02 GMT+0000 (UTC)

arXiv
