DALL-E for Detection: Language-driven Compositional Image Synthesis for Object Detection

Yunhao Ge; Jiashu Xu; Brian Nlong Zhao; Neel Joshi; Laurent Itti; Vibhav Vineet

We propose a new paradigm to automatically generate training data with accurate labels at scale using text-to-image synthesis frameworks (e.g., DALL-E, Stable Diffusion). The proposed approach decouples training data generation into foreground object mask generation and background (context) image generation. For foreground object mask generation, we use a simple textual template containing the object class name as input to DALL-E to generate a diverse set of foreground images. A foreground-background segmentation algorithm is then used to generate foreground object masks. Next, to generate context images, a language description of the context is first produced by applying an image captioning method to a small set of images representing the context. These language descriptions are then used to generate diverse sets of context images with the DALL-E framework. The context images are then composited with the object masks generated in the first step to provide an augmented training set for a classifier. We demonstrate the advantages of our approach on four object detection datasets, including the Pascal VOC and COCO object detection tasks. Furthermore, we highlight the compositional nature of our data generation approach in out-of-distribution and zero-shot data generation scenarios.
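The final compositing step described in the abstract can be illustrated with a minimal cut-and-paste sketch. The sketch assumes the following, none of which comes from the paper:

- The prompt wording in `foreground_prompt` is invented. The abstract says only "a simple textual template with object class name."
- The function names are hypothetical.
- The paste is hard, with no blending.
- The box convention is exclusive at the max edge.

The sketch does not cover the DALL-E generation, segmentation, or captioning steps. It shows only why compositing yields accurate labels for free: the bounding box follows directly from where the mask is pasted.

```python
import numpy as np

# Hypothetical template: the exact wording used by the authors is not given.
def foreground_prompt(class_name: str) -> str:
    return f"A photo of a {class_name}"

def composite(background: np.ndarray, foreground: np.ndarray,
              mask: np.ndarray, top: int, left: int):
    """Hard-paste the masked foreground onto a copy of the background.

    background: H x W x 3 uint8 context image
    foreground: h x w x 3 uint8 object image
    mask:       h x w bool array, True on object pixels
    Returns (image, box), where box = (x_min, y_min, x_max, y_max) in
    background pixel coordinates, with x_max/y_max exclusive.
    """
    h, w = mask.shape
    H, W = background.shape[:2]
    if top < 0 or left < 0 or top + h > H or left + w > W:
        raise ValueError("foreground does not fit at this position")
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        raise ValueError("empty mask")
    out = background.copy()                      # leave the input untouched
    region = out[top:top + h, left:left + w]     # a view into `out`
    region[mask] = foreground[mask]              # copy only object pixels
    # The label is derived from the mask, so it is exact by construction.
    box = (left + int(xs.min()), top + int(ys.min()),
           left + int(xs.max()) + 1, top + int(ys.max()) + 1)
    return out, box
```

For example, pasting a 10x20 foreground whose mask covers rows 2 to 7 and columns 5 to 14 at `top=30, left=40` gives the box `(45, 32, 55, 38)`. A real pipeline would likely randomize the placement and scale, and might blend edges to reduce pasting artifacts.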

updated: Thu Dec 22 2022 00:55:29 GMT+0000 (UTC)

published: Mon Jun 20 2022 06:43:17 GMT+0000 (UTC)

arXiv