GAN Compression: Efficient Architectures for Interactive Conditional GANs

Muyang Li; Ji Lin; Yaoyao Ding; Zhijian Liu; Jun-Yan Zhu; Song Han

Conditional Generative Adversarial Networks (cGANs) have enabled controllable image synthesis for many vision and graphics applications. However, recent cGANs are 1-2 orders of magnitude more compute-intensive than modern recognition CNNs. For example, GauGAN consumes 281G MACs per image, compared to 0.44G MACs for MobileNet-v3, which makes interactive deployment difficult. In this work, we propose a general-purpose compression framework for reducing the inference time and model size of the generator in cGANs. Directly applying existing compression methods yields poor performance because GAN training is difficult and generator architectures differ from those of recognition CNNs. We address these challenges in two ways. First, to stabilize GAN training, we transfer knowledge from multiple intermediate representations of the original model to its compressed model, and we unify unpaired and paired learning. Second, instead of reusing existing CNN designs, our method finds efficient architectures via neural architecture search. To accelerate the search process, we decouple model training from architecture search via weight sharing. Experiments demonstrate the effectiveness of our method across different supervision settings, network architectures, and learning methods. Without losing image quality, we reduce the computation of CycleGAN by 21x, Pix2pix by 12x, MUNIT by 29x, and GauGAN by 9x, paving the way for interactive image synthesis.
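To put the quoted figures in perspective, the following is a minimal arithmetic sketch using only the numbers stated in the abstract (281G MACs, 0.44G MACs, and the 9x reduction). The constant and function names are illustrative, not taken from the paper, and because the 9x factor is a rounded figure, the compressed MAC count computed here is only an estimate.

```python
# MAC counts quoted in the abstract, in units of 1e9 MACs per image.
GAUGAN_GMACS = 281.0
MOBILENET_V3_GMACS = 0.44
GAUGAN_REDUCTION = 9.0  # "GauGAN by 9x" (rounded factor from the abstract)


def compressed_gmacs(original_gmacs: float, reduction: float) -> float:
    """Return the MAC count that remains after an N-fold reduction."""
    return original_gmacs / reduction


# How much more compute GauGAN needs than MobileNet-v3 per image.
ratio_to_mobilenet = GAUGAN_GMACS / MOBILENET_V3_GMACS

# Estimated cost of the compressed GauGAN generator.
gaugan_after = compressed_gmacs(GAUGAN_GMACS, GAUGAN_REDUCTION)

print(f"GauGAN / MobileNet-v3: about {ratio_to_mobilenet:.0f}x")
print(f"GauGAN after 9x reduction: about {gaugan_after:.1f}G MACs")
```

Under these assumptions, the uncompressed GauGAN generator costs several hundred times as much as MobileNet-v3 per image, and a 9x reduction brings it to roughly 31G MACs.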

updated: Thu Nov 11 2021 03:45:16 GMT+0000 (UTC)

published: Thu Mar 19 2020 17:59:05 GMT+0000 (UTC)

arXiv
