VAE-Info-cGAN: Generating Synthetic Images by Combining Pixel-level and Feature-level Geospatial Conditional Inputs

Xuerong Xiao; Swetava Ganguli; Vipul Pandey

Training robust supervised deep learning models for many geospatial applications of computer vision is difficult due to a dearth of class-balanced and diverse training data. At the same time, obtaining enough training data for many applications is financially prohibitive or may be infeasible, especially when the application involves modeling rare or extreme events. Synthetically generating data (and labels) using a generative model that can sample from a target distribution and exploit the multi-scale nature of images can be an inexpensive way to address the scarcity of labeled data. Towards this goal, we present a deep conditional generative model, called VAE-Info-cGAN, that combines a Variational Autoencoder (VAE) with a conditional Information Maximizing Generative Adversarial Network (InfoGAN) to synthesize semantically rich images simultaneously conditioned on a pixel-level condition (PLC) and a macroscopic feature-level condition (FLC). Dimensionally, the PLC can differ from the synthesized image only in the channel dimension, and it is meant to be a task-specific input. The FLC is modeled as an attribute vector in the latent space of the generated image, which controls the contributions of various characteristic attributes germane to the target distribution. An interpretation of the attribute vector is explored in which synthetic images are generated systematically by varying a chosen binary macroscopic feature. Experiments on a GPS trajectories dataset show that the proposed model can accurately generate various forms of spatio-temporal aggregates across different geographic locations while conditioned only on a raster representation of the road network. The primary intended application of the VAE-Info-cGAN is synthetic data (and label) generation for targeted data augmentation in computer vision-based modeling of problems relevant to geospatial analysis and remote sensing.
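
The two kinds of conditioning described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the (channels, height, width) shape convention, and the choice to append the attribute vector to Gaussian noise are all assumptions made for illustration. The abstract states only that the PLC may differ from the synthesized image in the channel dimension and that the FLC is an attribute vector in the latent space.

```python
import random
from typing import List, Tuple

# Assumed convention for this sketch: (channels, height, width).
Shape = Tuple[int, int, int]


def plc_is_compatible(plc_shape: Shape, image_shape: Shape) -> bool:
    """Return True if the PLC differs from the image only in channel count."""
    return plc_shape[1:] == image_shape[1:]


def build_latent_code(noise_dim: int, attributes: List[int],
                      rng: random.Random) -> List[float]:
    """Append a binary attribute vector (the FLC) to Gaussian noise.

    Concatenation is an assumption of this sketch, not a detail
    taken from the paper.
    """
    if any(a not in (0, 1) for a in attributes):
        raise ValueError("attributes must be binary (0 or 1)")
    noise = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    return noise + [float(a) for a in attributes]


def flip_attribute(code: List[float], noise_dim: int, index: int) -> List[float]:
    """Return a copy of the latent code with one binary attribute toggled."""
    flipped = list(code)
    flipped[noise_dim + index] = 1.0 - flipped[noise_dim + index]
    return flipped


# Hypothetical example: a 1-channel road raster conditioning a 3-channel image.
road_raster_shape = (1, 64, 64)
synthetic_image_shape = (3, 64, 64)
compatible = plc_is_compatible(road_raster_shape, synthetic_image_shape)

# Hold the noise fixed and toggle one hypothetical macroscopic feature.
code = build_latent_code(noise_dim=4, attributes=[1, 0], rng=random.Random(0))
toggled = flip_attribute(code, noise_dim=4, index=1)
```

Holding the noise part fixed while toggling a single attribute mirrors the idea of systematically varying a chosen binary macroscopic feature; a trained generator would then map each code, together with the PLC, to an image.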

updated: Tue Dec 08 2020 03:46:19 GMT+0000 (UTC)

published: Tue Dec 08 2020 03:46:19 GMT+0000 (UTC)

arXiv
