VisorGPT: Learning Visual Prior via Generative Pre-Training

Jinheng Xie; Kai Ye; Yudong Li; Yuexiang Li; Kevin Qinghong Lin; Yefeng Zheng; Linlin Shen; Mike Zheng Shou

Various stuff and things in visual data possess specific traits, which can be learned by deep neural networks and are implicitly represented in the model as a visual prior, e.g., object location and shape. Such a prior can affect many vision tasks. For example, in conditional image synthesis, spatial conditions that fail to adhere to the prior can yield visually inaccurate synthetic results. This work aims to learn the visual prior explicitly and to enable customized sampling from it. Inspired by advances in language modeling, we propose to learn the Visual prior via Generative Pre-Training, dubbed VisorGPT. By discretizing the visual locations of objects, e.g., bounding boxes, human poses, and instance masks, into sequences, VisorGPT can model the visual prior through likelihood maximization. In addition, prompt engineering is investigated to unify the various types of visual locations and to enable customized sampling of sequential outputs from the learned prior. Experimental results demonstrate that VisorGPT can effectively model the visual prior, which can be employed in many vision tasks, such as customizing accurate human poses for conditional image synthesis models like ControlNet. Code will be released at https://github.com/Sierkinhane/VisorGPT.
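To make the idea of discretizing visual locations into sequences concrete, here is a minimal sketch for the bounding-box case. It is an illustration under stated assumptions and not the authors' implementation: the helper names (`quantize`, `boxes_to_sequence`), the default number of bins, and the token layout (category name followed by four quantized coordinates) are all hypothetical choices made for this example.

```python
from typing import List, Sequence, Tuple

# Hypothetical annotation layout: (category, xmin, ymin, xmax, ymax) in pixels.
Box = Tuple[str, float, float, float, float]


def quantize(value: float, extent: float, num_bins: int) -> int:
    """Map a continuous coordinate in [0, extent] to a bin index in [0, num_bins - 1]."""
    ratio = min(max(value / extent, 0.0), 1.0)  # clamp to the image
    return min(int(ratio * num_bins), num_bins - 1)


def boxes_to_sequence(
    boxes: Sequence[Box], width: float, height: float, num_bins: int = 512
) -> List[str]:
    """Flatten boxes into one token sequence: category, then 4 quantized coordinates.

    The bin count and token order are assumptions for illustration only.
    """
    tokens: List[str] = []
    for category, xmin, ymin, xmax, ymax in boxes:
        tokens.append(category)
        for value, extent in ((xmin, width), (ymin, height), (xmax, width), (ymax, height)):
            tokens.append(str(quantize(value, extent, num_bins)))
    return tokens


if __name__ == "__main__":
    example = [("dog", 0.0, 0.0, 320.0, 240.0)]
    print(boxes_to_sequence(example, width=640, height=480, num_bins=100))
```

A sequence of this kind can then be treated like text: an autoregressive language model trained with the usual next-token likelihood objective learns which categories and coordinates tend to co-occur, which is the sense in which the abstract speaks of modeling the prior through likelihood maximization.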

updated: Sun May 28 2023 11:34:43 GMT+0000 (UTC)

published: Tue May 23 2023 07:45:23 GMT+0000 (UTC)

arXiv
