Open-Edit: Open-Domain Image Manipulation with Open-Vocabulary Instructions

Xihui Liu; Zhe Lin; Jianming Zhang; Handong Zhao; Quan Tran; Xiaogang Wang; Hongsheng Li

We propose a novel algorithm, named Open-Edit, which is the first attempt at open-domain image manipulation with open-vocabulary instructions. The task is challenging given the large variation across image domains and the lack of training supervision. Our approach takes advantage of a unified visual-semantic embedding space pretrained on a general image-caption dataset, and manipulates the embedded visual features by applying text-guided vector arithmetic to the image feature maps. A structure-preserving image decoder then generates the manipulated images from the manipulated feature maps. We further propose an on-the-fly, sample-specific optimization approach with cycle-consistency constraints that regularizes the manipulated images and forces them to preserve details of the source images. Our approach shows promising results in manipulating open-vocabulary color, texture, and high-level attributes across various scenarios of open-domain images.
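To make "text-guided vector arithmetic on the image feature maps" concrete, here is a minimal sketch in plain Python. The abstract does not give the exact formula, so the update rule below is an assumption chosen for illustration: each spatial feature loses its projection onto a source-text direction and gains the same amount along a target-text direction. The function names, the `strength` parameter, and the toy three-dimensional embeddings are all hypothetical and are not taken from the paper.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def edit_feature_map(feature_map, source_text, target_text, strength=1.0):
    """Shift every spatial feature from the source-text direction toward the target.

    Assumed rule (illustrative only): v' = v - w * t_src + w * t_tgt,
    where w = strength * <v, t_src> and both text vectors are unit-normalized.
    Locations with no response to the source text are left untouched.
    """
    src = normalize(source_text)
    tgt = normalize(target_text)
    edited = []
    for row in feature_map:
        new_row = []
        for feat in row:
            weight = strength * dot(feat, src)
            new_row.append(
                [f - weight * s + weight * t for f, s, t in zip(feat, src, tgt)]
            )
        edited.append(new_row)
    return edited


# Toy 1x2 feature map with 3-dimensional features (hypothetical values).
feature_map = [[[1.0, 0.0, 0.5], [0.0, 0.0, 0.5]]]
red = [1.0, 0.0, 0.0]   # stand-in embedding for the source word
blue = [0.0, 1.0, 0.0]  # stand-in embedding for the target word

edited = edit_feature_map(feature_map, red, blue)
print(edited)
```

In this toy case the first location, which responds to the source direction, is moved onto the target direction, while the second location is unchanged. That per-location weighting is what lets an edit of this kind act only on the relevant image regions. In the setting the abstract describes, the features and text vectors would come from the pretrained visual-semantic embedding model, and a decoder would turn the edited map back into an image.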

updated: Wed Apr 21 2021 13:31:55 GMT+0000 (UTC)

published: Tue Aug 04 2020 14:15:40 GMT+0000 (UTC)

arXiv
