LANIT: Language-Driven Image-to-Image Translation for Unlabeled Data

Jihye Park; Soohyun Kim; Sunwoo Kim; Jaejun Yoo; Youngjung Uh; Seungryong Kim

Existing techniques for image-to-image translation have commonly suffered from two critical problems: heavy reliance on per-sample domain annotation and/or an inability to handle multiple attributes per image. Recent methods adopt clustering approaches to provide per-sample annotations in an unsupervised manner. However, they cannot account for real-world settings in which one sample may have multiple attributes. In addition, the semantics of the clusters are not easily coupled to human understanding. To overcome these limitations, we present a LANguage-driven Image-to-image Translation model, dubbed LANIT. We leverage easy-to-obtain candidate domain annotations given as text for a dataset and jointly optimize them during training. The target style is specified by aggregating multi-domain style vectors according to the multi-hot domain assignments. As the initial candidate domain texts might be inaccurate, we set the candidate domain texts to be learnable and jointly fine-tune them during training. Furthermore, we introduce a slack domain to cover samples that are not covered by the candidate domains. Experiments on several standard benchmarks demonstrate that LANIT achieves performance comparable or superior to existing models.
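
The aggregation step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the multi-hot assignment is computed or how the style vectors are combined, so everything below is an assumption made for illustration: the function names, the top-k selection over image-text similarity scores, the similarity threshold that routes a sample to the slack domain, and the use of a mean as the aggregation. It is not the authors' implementation.

```python
from typing import List, Sequence


def multi_hot_assignment(
    similarities: Sequence[float], top_k: int, slack_threshold: float
) -> List[int]:
    """Hypothetical assignment rule (an assumption, not from the paper).

    `similarities` holds one score per candidate domain text. The result has
    one extra trailing entry for the slack domain, which is switched on only
    when no candidate domain passes the threshold.
    """
    n = len(similarities)
    # Rank candidate domains from most to least similar.
    ranked = sorted(range(n), key=lambda i: similarities[i], reverse=True)
    # Keep at most top_k domains, and only those above the threshold.
    chosen = [i for i in ranked[:top_k] if similarities[i] >= slack_threshold]

    assignment = [0] * (n + 1)
    if chosen:
        for i in chosen:
            assignment[i] = 1
    else:
        assignment[n] = 1  # sample not covered by any candidate domain
    return assignment


def aggregate_style(
    style_vectors: Sequence[Sequence[float]], assignment: Sequence[int]
) -> List[float]:
    """Average the style vectors of all active domains (mean is an assumption)."""
    active = [vec for vec, flag in zip(style_vectors, assignment) if flag]
    if not active:
        raise ValueError("assignment must activate at least one domain")
    dim = len(active[0])
    return [sum(vec[d] for vec in active) / len(active) for d in range(dim)]


# Three candidate domains plus one slack domain, each with a 2-D style vector.
styles = [[1.0, 0.0], [0.0, 1.0], [3.0, 2.0], [5.0, 5.0]]
covered = multi_hot_assignment([0.9, 0.1, 0.7], top_k=2, slack_threshold=0.5)
uncovered = multi_hot_assignment([0.1, 0.2, 0.3], top_k=2, slack_threshold=0.5)
target_style = aggregate_style(styles, covered)
slack_style = aggregate_style(styles, uncovered)
```

In this toy setup a sample matching two candidate domains receives the average of their two style vectors, while a sample matching none falls back to the slack domain's vector alone. In the model described by the abstract the candidate domain texts are themselves learnable, which this sketch does not attempt to show.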

updated: Wed Aug 31 2022 14:30:00 GMT+0000 (UTC)

published: Wed Aug 31 2022 14:30:00 GMT+0000 (UTC)

arXiv
