Train Sparsely, Generate Densely: Memory-efficient Unsupervised Training of High-resolution Temporal GAN

Masaki Saito; Shunta Saito; Masanori Koyama; Sosuke Kobayashi

Training a Generative Adversarial Network (GAN) on a video dataset is challenging because of the sheer size of the dataset and the complexity of each observation. In general, the computational cost of training a GAN scales exponentially with the resolution. In this study, we present a novel memory-efficient method for unsupervised learning on high-resolution video datasets whose computational cost scales only linearly with the resolution. We achieve this by designing the generator model as a stack of small sub-generators and training the model in a specific way: each sub-generator is trained with its own discriminator. During training, we introduce between each pair of consecutive sub-generators an auxiliary subsampling layer that reduces the frame rate by a certain ratio. This procedure allows each sub-generator to learn the distribution of the video at a different level of resolution. We also need only a few GPUs to train a highly complex generator that far outperforms its predecessor in terms of Inception Score.
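The idea of placing a frame-rate subsampling layer between consecutive sub-generators during training, and dropping it at generation time, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`subsample_frames`, `toy_sub_generator`, `generate`), the stride of 2, the random starting offset, and the nearest-neighbour upsampling that stands in for a learned sub-generator are all assumptions made for illustration. The sketch only tracks tensor shapes, to show how frame counts shrink as spatial resolution grows.

```python
import numpy as np


def subsample_frames(video, stride, rng):
    """Keep every `stride`-th frame, starting at a random offset.

    video has shape (T, H, W, C). The random offset is an assumption
    of this sketch, not something stated in the abstract.
    """
    offset = rng.integers(0, stride)
    return video[offset::stride]


def toy_sub_generator(video):
    """Hypothetical stand-in for a learned sub-generator.

    It only doubles height and width by nearest-neighbour repetition.
    """
    return video.repeat(2, axis=1).repeat(2, axis=2)


def generate(base, n_levels, stride=2, training=True, seed=0):
    """Run a stack of toy sub-generators and return every level's output.

    When training=True, a subsampling layer sits in front of each
    sub-generator, so higher-resolution levels see fewer frames.
    When training=False, the subsampling layers are skipped and the
    final level keeps every frame ("generate densely").
    """
    rng = np.random.default_rng(seed)
    outputs = [base]
    x = base
    for _ in range(n_levels):
        if training:
            x = subsample_frames(x, stride, rng)
        x = toy_sub_generator(x)
        outputs.append(x)
    return outputs


if __name__ == "__main__":
    base = np.zeros((16, 8, 8, 3))  # 16 frames of 8x8 RGB
    for out in generate(base, n_levels=3, training=True):
        print("train:", out.shape)
    print("dense:", generate(base, n_levels=3, training=False)[-1].shape)
```

In this toy setting, each level's output during training would be passed to its own discriminator, so no single discriminator has to process a video that is both full frame rate and full resolution. At generation time the same stack is run without subsampling to obtain all frames at the highest resolution.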

updated: Mon Jun 01 2020 11:47:45 GMT+0000 (UTC)

published: Thu Nov 22 2018 17:31:26 GMT+0000 (UTC)

arXiv
