CLIPDraw: Exploring Text-to-Drawing Synthesis through Language-Image Encoders

Kevin Frans; L. B. Soros; Olaf Witkowski

This work presents CLIPDraw, an algorithm that synthesizes novel drawings based on natural language input. CLIPDraw does not require any training; rather, a pre-trained CLIP language-image encoder is used as a metric for maximizing similarity between the given description and a generated drawing. Crucially, CLIPDraw operates over vector strokes rather than pixel images, a constraint that biases drawings towards simpler, human-recognizable shapes. Results compare CLIPDraw with other synthesis-through-optimization methods and highlight various interesting behaviors of CLIPDraw, such as satisfying ambiguous text in multiple ways, reliably producing drawings in diverse artistic styles, and scaling from simple to complex visual representations as stroke count is increased. Code for experimenting with the method is available at: https://colab.research.google.com/github/kvfrans/clipdraw/blob/main/clipdraw.ipynb
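
The synthesis-through-optimization idea described above can be illustrated with a toy sketch. This is not the authors' implementation: the fixed random projection stands in for the pre-trained CLIP image encoder, a random vector stands in for the text embedding, a soft line-segment rasterizer stands in for a differentiable vector renderer, and finite differences stand in for backpropagation. All names (`render`, `similarity`, `optimize`) are hypothetical. The only thing it shares with the abstract is the structure: stroke parameters are the sole variables, and they are adjusted to increase similarity under a frozen metric.

```python
import math
import random

SIZE = 8          # toy canvas is SIZE x SIZE pixels
EMBED_DIM = 8     # size of the stand-in embedding
NUM_STROKES = 3   # each stroke is a segment (x0, y0, x1, y1) in canvas units

rng = random.Random(0)
# ASSUMPTION: a fixed random linear map plays the role of a frozen encoder.
# This is NOT CLIP; it is only a fixed metric that is never trained here.
PROJECTION = [[rng.gauss(0, 1) for _ in range(SIZE * SIZE)]
              for _ in range(EMBED_DIM)]
# ASSUMPTION: a random vector plays the role of the prompt's text embedding.
TARGET = [rng.gauss(0, 1) for _ in range(EMBED_DIM)]


def render(params):
    """Softly rasterize line segments into a flat list of pixel intensities."""
    pixels = []
    for row in range(SIZE):
        for col in range(SIZE):
            px, py = (col + 0.5) / SIZE, (row + 0.5) / SIZE
            value = 0.0
            for s in range(NUM_STROKES):
                x0, y0, x1, y1 = params[4 * s:4 * s + 4]
                dx, dy = x1 - x0, y1 - y0
                length_sq = dx * dx + dy * dy + 1e-9
                # Closest point on the segment, clamped to its endpoints.
                t = ((px - x0) * dx + (py - y0) * dy) / length_sq
                t = max(0.0, min(1.0, t))
                dist_sq = (px - x0 - t * dx) ** 2 + (py - y0 - t * dy) ** 2
                value += math.exp(-dist_sq / 0.01)  # Gaussian falloff
            pixels.append(value)
    return pixels


def similarity(params):
    """Cosine similarity between the drawing's embedding and the target."""
    pixels = render(params)
    emb = [sum(w * p for w, p in zip(row, pixels)) for row in PROJECTION]
    dot = sum(e * t for e, t in zip(emb, TARGET))
    norm = (math.sqrt(sum(e * e for e in emb))
            * math.sqrt(sum(t * t for t in TARGET)))
    return dot / (norm + 1e-9)


def optimize(steps=30, lr=0.05, eps=1e-4):
    """Gradient ascent on stroke parameters; the metric itself stays fixed."""
    params = [rng.random() for _ in range(4 * NUM_STROKES)]
    history = [similarity(params)]
    for _ in range(steps):
        # Central finite differences approximate the gradient.
        grad = []
        for i in range(len(params)):
            up = params[:i] + [params[i] + eps] + params[i + 1:]
            down = params[:i] + [params[i] - eps] + params[i + 1:]
            grad.append((similarity(up) - similarity(down)) / (2 * eps))
        # Backtracking: halve the step until the similarity improves.
        step, accepted = lr, False
        for _ in range(8):
            candidate = [p + step * g for p, g in zip(params, grad)]
            score = similarity(candidate)
            if score > history[-1]:
                params, accepted = candidate, True
                history.append(score)
                break
            step /= 2
        if not accepted:
            history.append(history[-1])  # keep the current drawing
    return params, history


if __name__ == "__main__":
    final_params, scores = optimize()
    print("similarity before:", scores[0], "after:", scores[-1])
```

Because only the stroke endpoints are optimized, the drawing can never leave the space of a few line segments, which loosely mirrors the abstract's point that a vector-stroke parameterization constrains the result towards simple shapes.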

updated: Mon Jun 28 2021 16:43:26 GMT+0000 (UTC)

published: Mon Jun 28 2021 16:43:26 GMT+0000 (UTC)

arXiv
