CLIPstyler: Image Style Transfer with a Single Text Condition

Gihyun Kwon; Jong Chul Ye

Existing neural style transfer methods require a reference style image in order to transfer its texture information to a content image. In many practical situations, however, users have no reference style image but still want to transfer a style that they can only imagine. To address such applications, we propose a new framework that enables style transfer 'without' a style image, using only a text description of the desired style. Using the pre-trained text-image embedding model of CLIP, we demonstrate modulation of the style of content images with only a single text condition. Specifically, we propose a patch-wise text-image matching loss with multiview augmentations for realistic texture transfer. Extensive experimental results confirm successful image style transfer with realistic textures that reflect the semantics of the query texts.
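
The patch-wise text-image matching idea mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the exact form of the loss, so everything below is an assumption made for illustration: the loss is taken to be a plain cosine distance averaged over randomly cropped patches, a random horizontal flip stands in for the multiview augmentation, and `toy_embed` is a hypothetical placeholder for a real image encoder such as CLIP's. It is not the authors' implementation.

```python
import math
import random


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # treat a zero vector as orthogonal to everything
    return 1.0 - dot / (norm_u * norm_v)


def random_patch(image, size, rng):
    """Crop a random size x size patch from a 2D list of pixel values."""
    height, width = len(image), len(image[0])
    top = rng.randint(0, height - size)
    left = rng.randint(0, width - size)
    return [row[left:left + size] for row in image[top:top + size]]


def augment(patch, rng):
    """Stand-in view augmentation (assumption): random horizontal flip."""
    return [row[::-1] for row in patch] if rng.random() < 0.5 else patch


def toy_embed(patch):
    """Hypothetical placeholder for an image encoder such as CLIP's."""
    pixels = [p for row in patch for p in row]
    mean = sum(pixels) / len(pixels)
    var = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    return [mean, math.sqrt(var), 1.0]


def patchwise_text_loss(image, text_emb, embed=toy_embed,
                        n_patches=8, patch_size=4, seed=0):
    """Average cosine distance between augmented patch embeddings and text."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_patches):
        view = augment(random_patch(image, patch_size, rng), rng)
        total += cosine_distance(embed(view), text_emb)
    return total / n_patches


# Usage: a 16x16 gradient "image" and an arbitrary 3-dim "text embedding".
image = [[(x + y) / 30.0 for x in range(16)] for y in range(16)]
loss = patchwise_text_loss(image, text_emb=[0.5, 0.1, 1.0])
```

In a real setting the pixel grid would be the stylized output image, `embed` would be a pre-trained image encoder, and `text_emb` would come from the matching text encoder; averaging over many small augmented patches is what encourages the described style to appear as local texture rather than as a single global change.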

updated: Wed Dec 01 2021 09:48:53 GMT+0000 (UTC)

published: Wed Dec 01 2021 09:48:53 GMT+0000 (UTC)

arXiv
