Multi-cropping Contrastive Learning and Domain Consistency for Unsupervised Image-to-Image Translation

Chen Zhao; Wei-Ling Cai; Zheng Yuan; Cheng-Wei Hu

Recently, unsupervised image-to-image translation methods based on contrastive learning have achieved state-of-the-art results in many tasks. However, in previous works the negatives are sampled from the input image itself, which inspired us to design a data augmentation method that improves the quality of the selected negatives. Moreover, previous methods only preserve content consistency via patch-wise contrastive learning in the embedding space, ignoring the domain consistency between the generated images and the real images of the target domain. In this paper, we propose a novel unsupervised image-to-image translation framework based on multi-cropping contrastive learning and domain consistency, called MCDUT. Specifically, we obtain multi-cropping views via center-cropping and random-cropping in order to generate higher-quality negative examples. To constrain the embeddings in the deep feature space, we formulate a new domain consistency loss, which encourages the generated images to be close to the real images in the embedding space of the same domain. Furthermore, we present a dual coordinate attention network, called DCA, which embeds positional information into the channels. We employ the DCA network in the design of the generator, enabling the generator to capture global dependency information along both the horizontal and vertical directions. Our method achieves state-of-the-art results on many image-to-image translation tasks, and its advantages are demonstrated through extensive comparison experiments and ablation studies.
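The abstract gives no formulas, so the following is only a minimal NumPy sketch of the ideas it names, under loudly stated assumptions. It shows (1) multi-cropping views built from one center crop plus random crops, (2) a standard InfoNCE-style patch contrastive loss whose negatives are drawn from patches of all those views, and (3) a domain consistency term. The crop sizes, the patch-flattening "encoder" `to_patches`, the temperature `tau=0.07` (a common contrastive-learning default), and `domain_consistency` (a hypothetical cosine distance between mean embeddings) are all assumptions for illustration, not MCDUT's actual definitions. The DCA attention module is not sketched.

```python
import numpy as np


def center_crop(img: np.ndarray, size: int) -> np.ndarray:
    """Return the central size x size window of an H x W (x C) image."""
    h, w = img.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]


def random_crop(img: np.ndarray, size: int, rng: np.random.Generator) -> np.ndarray:
    """Return a size x size window at a uniformly random position."""
    h, w = img.shape[:2]
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return img[top:top + size, left:left + size]


def l2_normalize(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Scale vectors along the last axis to unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def info_nce(query: np.ndarray, positive: np.ndarray,
             negatives: np.ndarray, tau: float = 0.07) -> float:
    """InfoNCE loss for one query: -log softmax probability of the positive.

    query, positive: shape (D,); negatives: shape (N, D).
    """
    q = l2_normalize(query)
    k_pos = l2_normalize(positive)
    k_neg = l2_normalize(negatives)
    logits = np.concatenate(([q @ k_pos], k_neg @ q)) / tau
    logits -= logits.max()  # shift for numerical stability; softmax is unchanged
    return float(-(logits[0] - np.log(np.exp(logits).sum())))


def domain_consistency(fake_emb: np.ndarray, real_emb: np.ndarray) -> float:
    """HYPOTHETICAL stand-in: 1 - cosine similarity of the mean embeddings."""
    f = l2_normalize(fake_emb.mean(axis=0))
    r = l2_normalize(real_emb.mean(axis=0))
    return float(1.0 - f @ r)


def to_patches(x: np.ndarray, p: int = 8) -> np.ndarray:
    """HYPOTHETICAL toy encoder: flatten non-overlapping p x p patches."""
    h, w, c = x.shape
    x = x[: h // p * p, : w // p * p]
    return (x.reshape(h // p, p, w // p, p, c)
             .transpose(0, 2, 1, 3, 4)
             .reshape(-1, p * p * c))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    src = rng.random((64, 64, 3))
    # Multi-cropping views: one center crop plus two random crops (sizes assumed).
    views = [center_crop(src, 48)] + [random_crop(src, 48, rng) for _ in range(2)]
    feats = [to_patches(v) for v in views]  # each has shape (36, 192)
    # Stand-in for generator output features: source features plus noise.
    fake = feats[0] + 0.05 * rng.standard_normal(feats[0].shape)
    idx = 7  # query patch location
    # Negatives: other locations in the center view plus all random-crop patches.
    negatives = np.concatenate([np.delete(feats[0], idx, axis=0)] + feats[1:])
    print("contrastive loss:", info_nce(fake[idx], feats[0][idx], negatives))
    real_target = rng.random((36, 192))  # stand-in for target-domain embeddings
    print("domain consistency loss:", domain_consistency(fake, real_target))
```

The design choice to illustrate is that the random crops enlarge and diversify the negative pool beyond the patches of a single view; in a real model the patch embeddings would come from the generator's encoder layers rather than raw pixels.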

updated: Wed Jul 05 2023 07:30:58 GMT+0000 (UTC)

published: Mon Apr 24 2023 16:20:28 GMT+0000 (UTC)

arXiv