Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold

Xingang Pan; Ayush Tewari; Thomas Leimkühler; Lingjie Liu; Abhimitra Meka; Christian Theobalt

Synthesizing visual content that meets users' needs often requires flexible and precise controllability of the pose, shape, expression, and layout of the generated objects. Existing approaches gain controllability of generative adversarial networks (GANs) via manually annotated training data or a prior 3D model, and they often lack flexibility, precision, and generality. In this work, we study a powerful yet much less explored way of controlling GANs, that is, to "drag" any points of the image to precisely reach target points in a user-interactive manner, as shown in Fig. 1. To achieve this, we propose DragGAN, which consists of two main components: 1) a feature-based motion supervision that drives the handle point to move towards the target position, and 2) a new point tracking approach that leverages the discriminative generator features to keep localizing the position of the handle points. Through DragGAN, anyone can deform an image with precise control over where pixels go, thus manipulating the pose, shape, expression, and layout of diverse categories such as animals, cars, humans, and landscapes. As these manipulations are performed on the learned generative image manifold of a GAN, they tend to produce realistic outputs even for challenging scenarios such as hallucinating occluded content and deforming shapes that consistently follow the object's rigidity. Both qualitative and quantitative comparisons demonstrate the advantage of DragGAN over prior approaches in the tasks of image manipulation and point tracking. We also showcase the manipulation of real images through GAN inversion.
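
The abstract does not spell out how the handle points are re-localized, so the following is only a minimal sketch of one plausible reading of component 2: after each edit step, search a small window around the previous handle position for the location whose generator feature is closest to the feature recorded at the original handle point. The function name `track_handle_point`, the square window, the `radius` parameter, the L1 distance, and the nested-list feature map are all assumptions made for illustration, not details taken from this page.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: features[y][x] is the feature vector at pixel (x, y).
FeatureMap = List[List[Sequence[float]]]


def track_handle_point(
    features: FeatureMap,
    reference: Sequence[float],
    previous: Tuple[int, int],
    radius: int = 3,
) -> Tuple[int, int]:
    """Return the (x, y) inside a square window around `previous` whose
    feature vector has the smallest L1 distance to `reference`."""
    height, width = len(features), len(features[0])
    px, py = previous
    best, best_dist = previous, float("inf")
    # Clamp the search window to the bounds of the feature map.
    for y in range(max(0, py - radius), min(height, py + radius + 1)):
        for x in range(max(0, px - radius), min(width, px + radius + 1)):
            dist = sum(abs(a - b) for a, b in zip(features[y][x], reference))
            if dist < best_dist:
                best, best_dist = (x, y), dist
    return best


# Toy 5x5 map: every location has feature [0, 0] except (x=3, y=2).
features = [[[0.0, 0.0] for _ in range(5)] for _ in range(5)]
features[2][3] = [1.0, 1.0]
new_position = track_handle_point(features, [1.0, 1.0], previous=(2, 2), radius=1)
```

In the toy example the distinctive feature sits one pixel to the right of the previous handle position, so the search moves the handle there. A real system would run this on feature maps from the generator rather than on nested lists.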

updated: Wed Jul 17 2024 10:27:55 GMT+0000 (UTC)

published: Thu May 18 2023 13:41:25 GMT+0000 (UTC)

arXiv
