TransGAN: Two Transformers Can Make One Strong GAN

Yifan Jiang; Shiyu Chang; Zhangyang Wang

The recent surge of interest in transformers suggests their potential to become powerful "universal" models for computer vision tasks such as classification, detection, and segmentation. But how much further can transformers go: are they ready to take on notoriously difficult vision tasks such as generative adversarial networks (GANs)? Driven by that curiosity, we conduct the first pilot study in building a GAN completely free of convolutions, using only pure transformer-based architectures. Our vanilla GAN architecture, dubbed TransGAN, consists of a memory-friendly transformer-based generator that progressively increases feature resolution while decreasing embedding dimension, and a patch-level discriminator that is also transformer-based. We then demonstrate that TransGAN notably benefits from data augmentation (more so than standard GANs), a multi-task co-training strategy for the generator, and a locally initialized self-attention that emphasizes the neighborhood smoothness of natural images. Equipped with those findings, TransGAN can effectively scale up to bigger models and high-resolution image datasets. Our best architecture achieves highly competitive performance compared to current state-of-the-art GANs based on convolutional backbones. Specifically, TransGAN sets a new state of the art on STL-10 with an Inception Score (IS) of 10.10 and an FID of 25.32. It also reaches a competitive IS of 8.64 and FID of 11.89 on CIFAR-10, and an FID of 12.23 on CelebA 64×64. We conclude with a discussion of the current limitations and future potential of TransGAN. The code is available at https://github.com/VITA-Group/TransGAN.
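The generator's trade of embedding dimension for resolution can be made concrete with a small shape calculation. The sketch below is illustrative only: the starting resolution (8×8), starting embedding dimension (1024), number of upsampling steps, and the rule that each 2× upsampling divides the embedding dimension by 4 (as a pixel-shuffle-style rearrangement would) are assumptions chosen for the example, not values stated in the abstract. `Stage` and `stage_schedule` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    resolution: int  # side length of the square feature map
    tokens: int      # sequence length seen by the transformer (resolution ** 2)
    embed_dim: int   # embedding dimension per token


def stage_schedule(base_resolution, base_dim, num_upsamples, factor=2):
    """Return the per-stage shapes of a hypothetical progressive generator.

    Assumption: each upsampling step multiplies the resolution by `factor`
    and divides the embedding dimension by `factor ** 2`, so the total
    number of activations (tokens * embed_dim) stays constant.
    """
    stages = []
    res, dim = base_resolution, base_dim
    for step in range(num_upsamples + 1):
        stages.append(Stage(res, res * res, dim))
        if step < num_upsamples:
            if dim % (factor * factor) != 0:
                raise ValueError("embed_dim must be divisible by factor ** 2")
            res *= factor
            dim //= factor * factor
    return stages


# Illustrative numbers only (not taken from the abstract).
schedule = stage_schedule(base_resolution=8, base_dim=1024, num_upsamples=2)
for s in schedule:
    print(f"{s.resolution}x{s.resolution}: {s.tokens} tokens, dim {s.embed_dim}, "
          f"activations {s.tokens * s.embed_dim}")
```

Under these assumptions the activation count per stage stays fixed while the token count grows, which is one way to read "memory-friendly": without shrinking the embedding dimension, feature memory would grow fourfold with each doubling of resolution.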

updated: Tue Feb 16 2021 05:51:12 GMT+0000 (UTC)

published: Sun Feb 14 2021 05:24:48 GMT+0000 (UTC)

arXiv
