Translation-equivariant Image Quantizer for Bi-directional Image-Text Generation

Woncheol Shin; Gyubok Lee; Jiyoung Lee; Joonseok Lee; Edward Choi

Recently, vector-quantized image modeling has demonstrated impressive performance on generation tasks such as text-to-image generation. However, we discover that current image quantizers do not satisfy translation equivariance in the quantized space because of aliasing, which degrades performance in downstream text-to-image and image-to-text generation, even in simple experimental setups. Instead of focusing on anti-aliasing, we take a direct approach to encourage translation equivariance in the quantized space. In particular, we explore a desirable property of image quantizers, called 'Translation Equivariance in the Quantized Space', and propose a simple but effective way to achieve translation equivariance by regularizing orthogonality in the codebook embedding vectors. Using this method, we improve accuracy by 22% in text-to-image generation and by 26% in image-to-text generation, outperforming VQGAN.
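The abstract does not spell out the exact form of the orthogonality regularizer, so the following is only a minimal sketch of one common way to penalize non-orthogonal codebook vectors: L2-normalize each embedding, form the Gram matrix of pairwise dot products, and measure its squared distance from the identity. The function names (`l2_normalize`, `orthogonality_penalty`), the 1/K^2 scaling, and the toy codebooks are assumptions made for illustration, not the authors' implementation.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def orthogonality_penalty(codebook):
    """Mean squared deviation of the normalized Gram matrix from identity.

    `codebook` is a list of K embedding vectors (hypothetical layout).
    The penalty is 0 when all embeddings are mutually orthogonal and
    grows as embeddings become more aligned with one another.
    """
    unit = [l2_normalize(row) for row in codebook]
    k = len(unit)
    total = 0.0
    for i in range(k):
        for j in range(k):
            dot = sum(a * b for a, b in zip(unit[i], unit[j]))
            target = 1.0 if i == j else 0.0  # identity matrix entry
            total += (dot - target) ** 2
    return total / (k * k)  # assumed scaling by the number of entries


# Toy codebooks: one with orthogonal rows, one with collinear rows.
orthogonal = [[1.0, 0.0], [0.0, 2.0]]
collinear = [[1.0, 0.0], [3.0, 0.0]]

print(orthogonality_penalty(orthogonal))
print(orthogonality_penalty(collinear))
```

In a training setup, a term of this kind would typically be added to the quantizer's loss with a weighting coefficient, so that codebook entries are pushed apart in direction while the rest of the objective is unchanged.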

updated: Wed Dec 01 2021 10:08:24 GMT+0000 (UTC)

published: Wed Dec 01 2021 10:08:24 GMT+0000 (UTC)

arXiv
