CLIPVG: Text-Guided Image Manipulation Using Differentiable Vector Graphics

Yiren Song; Xuning Shao; Kang Chen; Weidong Zhang; Minzhe Li; Zhongliang Jing

Considerable progress has recently been made in leveraging CLIP (Contrastive Language-Image Pre-Training) models for text-guided image manipulation. However, all existing works rely on additional generative models to ensure the quality of results, because CLIP alone cannot provide enough guidance for fine-scale, pixel-level changes. In this paper, we introduce CLIPVG, a text-guided image manipulation framework using differentiable vector graphics, which is also the first CLIP-based general image manipulation framework that does not require any additional generative models. We demonstrate that CLIPVG not only achieves state-of-the-art performance in both semantic correctness and synthesis quality, but is also flexible enough to support various applications far beyond the capability of all existing methods.

updated: Mon Dec 05 2022 09:29:31 GMT+0000 (UTC)

published: Mon Dec 05 2022 09:29:31 GMT+0000 (UTC)

arXiv
