Network-to-Network Translation with Conditional Invertible Neural Networks

Robin Rombach; Patrick Esser; Björn Ommer

Given the ever-increasing computational costs of modern machine learning models, we need to find new ways to reuse such expert models and thus tap into the resources that have been invested in their creation. Recent work suggests that the power of these massive models is captured by the representations they learn. Therefore, we seek a model that can relate different existing representations to one another, and we propose to solve this task with a conditionally invertible network. This network demonstrates its capability by (i) providing generic transfer between diverse domains, (ii) enabling controlled content synthesis by allowing modification in other domains, and (iii) facilitating diagnosis of existing representations by translating them into interpretable domains such as images. Our domain transfer network can translate between fixed representations without having to learn or finetune them. This allows users to utilize various existing domain-specific expert models from the literature that have been trained with extensive computational resources. Experiments on diverse conditional image synthesis tasks, competitive image modification results, and experiments on image-to-image and text-to-image generation demonstrate the generic applicability of our approach. For example, we translate between BERT and BigGAN, state-of-the-art text and image models, to provide text-to-image generation, which neither expert can perform on its own.
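
To make the idea of a conditionally invertible mapping between two fixed representations more concrete, the following is a minimal sketch of a single conditional affine coupling layer, a common building block of invertible networks. The abstract does not specify the architecture, so the choice of affine coupling, the dimensions, the random stand-in weights, and all names (`scale_shift`, `forward`, `inverse`, `z_target`, `z_cond`) are assumptions made for illustration only; no training is performed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: D_Z for the representation being transformed,
# D_COND for the representation of the other (conditioning) expert.
D_Z, D_COND, HIDDEN = 8, 4, 16
HALF = D_Z // 2

# Random fixed weights of a tiny MLP stand in for trained parameters.
W1 = rng.normal(scale=0.1, size=(HALF + D_COND, HIDDEN))
W2 = rng.normal(scale=0.1, size=(HIDDEN, 2 * HALF))


def scale_shift(x_a, cond):
    """Predict a log-scale and a shift from the untouched half and the condition."""
    h = np.tanh(np.concatenate([x_a, cond], axis=-1) @ W1)
    out = h @ W2
    log_s, t = out[..., :HALF], out[..., HALF:]
    return np.tanh(log_s), t  # bound the log-scale for numerical stability


def forward(x, cond):
    """Map x to y given cond; also return the log-determinant of the Jacobian."""
    x_a, x_b = x[..., :HALF], x[..., HALF:]
    log_s, t = scale_shift(x_a, cond)
    y_b = x_b * np.exp(log_s) + t
    return np.concatenate([x_a, y_b], axis=-1), log_s.sum(axis=-1)


def inverse(y, cond):
    """Exactly undo forward() for the same cond."""
    y_a, y_b = y[..., :HALF], y[..., HALF:]
    log_s, t = scale_shift(y_a, cond)
    x_b = (y_b - t) * np.exp(-log_s)
    return np.concatenate([y_a, x_b], axis=-1)


# Stand-ins for representations produced by two frozen expert models.
z_target = rng.normal(size=(3, D_Z))
z_cond = rng.normal(size=(3, D_COND))

residual, log_det = forward(z_target, z_cond)
recovered = inverse(residual, z_cond)
```

Because the first half of the input passes through unchanged, the inverse can recompute the same scale and shift and recover the input exactly for a given condition. Stacking several such layers with permutations between them would be the usual way to build a more expressive model; that step is omitted here.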

updated: Mon Nov 09 2020 20:34:36 GMT+0000 (UTC)

published: Wed May 27 2020 18:14:22 GMT+0000 (UTC)

arXiv
