MixerGAN: An MLP-Based Architecture for Unpaired Image-to-Image Translation

George Cazenavette; Manuel Ladron De Guevara

While attention-based transformer networks achieve unparalleled success in nearly all language tasks, the large number of tokens in an image, coupled with activation memory that grows quadratically in the number of tokens, makes them prohibitively expensive for visual tasks. As such, while language-to-language translation has been revolutionized by the transformer model, convolutional networks remain the de facto solution for image-to-image translation. The recently proposed MLP-Mixer architecture alleviates some of the speed and memory issues associated with attention-based networks while retaining the long-range connections that make transformer models desirable. Leveraging this efficient alternative to self-attention, we propose a new unpaired image-to-image translation model called MixerGAN: a simpler MLP-based architecture that accounts for long-distance relationships between pixels without the need for expensive attention mechanisms. Quantitative and qualitative analyses show that MixerGAN achieves competitive results when compared to prior convolution-based methods.
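To make the idea of long-range connections without attention concrete, the following is a minimal sketch of a single generic MLP-Mixer block in pure Python. It is an illustration under stated assumptions, not the MixerGAN generator itself: the helper names (`mixer_block`, `mlp`, `layer_norm`, `random_matrix`), the weight shapes, and the omission of biases and learned normalization parameters are all choices made here for brevity. The input is a matrix of tokens by channels, where each token would correspond to an image patch.

```python
import math
import random


def gelu(v):
    # Exact GELU activation using the error function.
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


def matmul(a, b):
    # Plain matrix product of two list-of-lists matrices.
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(r) for r in zip(*m)]


def add(a, b):
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def layer_norm(x, eps=1e-6):
    # Normalize each token (row) over its channels.
    # Assumption: no learned scale or shift, to keep the sketch short.
    out = []
    for row in x:
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)
        out.append([(v - mean) / math.sqrt(var + eps) for v in row])
    return out


def mlp(x, w1, w2):
    # Two-layer perceptron applied to each row of x (biases omitted).
    hidden = [[gelu(v) for v in row] for row in matmul(x, w1)]
    return matmul(hidden, w2)


def mixer_block(x, tok_w1, tok_w2, ch_w1, ch_w2):
    # x has shape (tokens, channels).
    # Token mixing: transpose so the MLP acts along the token axis,
    # letting every token influence every other token.
    y = transpose(layer_norm(x))           # (channels, tokens)
    y = transpose(mlp(y, tok_w1, tok_w2))  # back to (tokens, channels)
    x = add(x, y)                          # skip connection
    # Channel mixing: the MLP acts on each token independently.
    z = mlp(layer_norm(x), ch_w1, ch_w2)
    return add(x, z)                       # skip connection


def random_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    tokens, channels, tok_hidden, ch_hidden = 4, 3, 8, 6  # illustrative sizes
    x = random_matrix(tokens, channels, rng, scale=1.0)
    out = mixer_block(
        x,
        random_matrix(tokens, tok_hidden, rng),
        random_matrix(tok_hidden, tokens, rng),
        random_matrix(channels, ch_hidden, rng),
        random_matrix(ch_hidden, channels, rng),
    )
    print(len(out), len(out[0]))
```

In this sketch no token-by-token attention map is ever formed: the token-mixing step is a fixed-size matrix product along the token axis, so the intermediate activations have shape (channels, hidden) rather than (tokens, tokens). The token-mixing weights do depend on the number of tokens, which is one reason such blocks assume a fixed input resolution.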

updated: Fri May 28 2021 21:12:52 GMT+0000 (UTC)

published: Fri May 28 2021 21:12:52 GMT+0000 (UTC)

arXiv
