DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion Models

Weijia Wu; Yuzhong Zhao; Hao Chen; Yuchao Gu; Rui Zhao; Yefei He; Hong Zhou; Mike Zheng Shou; Chunhua Shen

Current deep networks are very data-hungry and benefit from training on large-scale datasets, which are often time-consuming to collect and annotate. By contrast, synthetic data can be generated without limit using generative models such as DALL-E and diffusion models, with minimal effort and cost. In this paper, we present DatasetDM, a generic dataset generation model that can produce diverse synthetic images and the corresponding high-quality perception annotations (e.g., segmentation masks and depth). Our method builds upon a pre-trained diffusion model and extends text-guided image synthesis to perception data generation. We show that the rich latent code of the diffusion model can be effectively decoded into accurate perception annotations using a decoder module. Training the decoder requires less than 1% of the manually labeled images (around 100 images), enabling the generation of an infinitely large annotated dataset. These synthetic data can then be used to train various perception models for downstream tasks. To showcase the power of the proposed approach, we generate datasets with rich, dense pixel-wise labels for a wide range of downstream tasks, including semantic segmentation, instance segmentation, and depth estimation. Notably, the approach 1) achieves state-of-the-art results on semantic segmentation and instance segmentation; 2) is significantly more robust in domain generalization than training on real data alone, and achieves state-of-the-art results in the zero-shot segmentation setting; and 3) offers flexibility for efficient application and novel task composition (e.g., image editing). The project website and code can be found at https://weijiawu.github.io/DatasetDM_page/ and https://github.com/showlab/DatasetDM, respectively.

updated: Tue Oct 10 2023 03:59:41 GMT+0000 (UTC)

published: Fri Aug 11 2023 14:38:11 GMT+0000 (UTC)

arXiv
