Multi-crop Contrastive Learning and Domain Consistency for Unsupervised Image-to-Image Translation

Chen Zhao; Wei-Ling Cai; Zheng Yuan; Cheng-Wei Hu

Recently, unsupervised image-to-image translation methods based on contrastive learning have achieved state-of-the-art results in many tasks. However, in previous work the negatives are sampled from the input image itself, which motivates us to design a data augmentation method that improves the quality of the selected negatives. Moreover, while previous methods retain content similarity via patch-wise contrastive learning in the embedding space, they ignore the domain consistency between the generated image and the real images of the target domain. In this paper, we propose a novel unsupervised image-to-image translation framework based on multi-crop contrastive learning and domain consistency, called MCDUT. Specifically, we obtain multi-crop views via center-crop and random-crop and generate the negatives from them, which increases the quality of the negatives. To constrain the embeddings in the deep feature space, we formulate a new domain consistency loss, which encourages the generated images to be close to the real images of the same domain in the embedding space. Furthermore, we present a dual coordinate attention network, called DCA, which embeds positional information into channel attention. We employ the DCA network in the design of the generator, which enables the generator to capture global dependency information along the horizontal and vertical directions. In many image-to-image translation tasks, our method achieves state-of-the-art results, and its advantages are demonstrated through extensive comparison experiments and ablation studies.
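The multi-crop step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the crop size, the number of random crops, or how negatives are drawn from the views, so all of these are assumptions here, and the function names (`center_crop`, `random_crop`, `multi_crop_views`) are hypothetical rather than taken from the paper. The image is a plain 2-D list so the example runs without any external library.

```python
import random


def center_crop(image, size):
    """Return the size x size window centred in a 2-D image (list of rows)."""
    h, w = len(image), len(image[0])
    if size > h or size > w:
        raise ValueError("crop size exceeds image size")
    top, left = (h - size) // 2, (w - size) // 2
    return [row[left:left + size] for row in image[top:top + size]]


def random_crop(image, size, rng):
    """Return a size x size window at a uniformly sampled position."""
    h, w = len(image), len(image[0])
    if size > h or size > w:
        raise ValueError("crop size exceeds image size")
    top = rng.randint(0, h - size)   # randint is inclusive on both ends
    left = rng.randint(0, w - size)
    return [row[left:left + size] for row in image[top:top + size]]


def multi_crop_views(image, size, num_random, seed=0):
    """Assumed layout: one center crop followed by num_random random crops."""
    rng = random.Random(seed)
    views = [center_crop(image, size)]
    views += [random_crop(image, size, rng) for _ in range(num_random)]
    return views


# Toy 8x8 "image" whose pixel value encodes its position (row * 8 + col).
image = [[r * 8 + c for c in range(8)] for r in range(8)]
views = multi_crop_views(image, size=4, num_random=2)
```

In a contrastive setup, patches taken from such views would presumably serve as the pool of negatives; how the paper actually pairs views with queries is not described in the abstract.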

updated: Sun Jul 02 2023 09:44:10 GMT+0000 (UTC)

published: Mon Apr 24 2023 16:20:28 GMT+0000 (UTC)

arXiv
