The Image Local Autoregressive Transformer

Chenjie Cao; Yuxin Hong; Xiang Li; Chengrong Wang; Chengming Xu; XiangYang Xue; Yanwei Fu

Recently, autoregressive (AR) models for whole-image generation, empowered by transformers, have achieved performance comparable to or even better than that of Generative Adversarial Networks (GANs). Unfortunately, directly applying such AR models to edit or change local image regions may suffer from missing global information, slow inference speed, and information leakage of local guidance. To address these limitations, we propose a novel model, the image Local Autoregressive Transformer (iLAT), to better facilitate locally guided image synthesis. iLAT learns novel local discrete representations through the newly proposed local autoregressive (LA) transformer, which is built on an attention mask and a convolution mechanism. iLAT can thus efficiently synthesize local image regions from key guidance information. We evaluate iLAT on various locally guided image synthesis tasks, such as pose-guided person image synthesis and face editing. Both quantitative and qualitative results show the efficacy of our model.
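
The abstract does not define the LA attention mask, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes a flattened token sequence in which tokens outside the edited region act as context visible to every query, while tokens inside the region are visible only to region queries at the same or a later position. The function name `local_ar_mask` and the `is_local` flag list are hypothetical.

```python
def local_ar_mask(is_local):
    """Build a boolean attention mask for a flattened token sequence.

    is_local[i] is True when token i lies in the region to be synthesized
    (an assumed input format). mask[q][k] is True when query token q may
    attend to key token k.
    """
    n = len(is_local)
    mask = [[False] * n for _ in range(n)]
    for q in range(n):
        for k in range(n):
            if not is_local[k]:
                # Assumption: context keys are visible to every query.
                mask[q][k] = True
            elif is_local[q] and k <= q:
                # Assumption: a region query sees region keys only at or
                # before its own position (causal within the region).
                mask[q][k] = True
    return mask


if __name__ == "__main__":
    # Six tokens; positions 2 and 3 form the edited region.
    region = [False, False, True, True, False, False]
    for row in local_ar_mask(region):
        print("".join("1" if allowed else "." for allowed in row))
```

Under these assumptions, context queries never read region tokens, which is one way to keep content from the edited region out of the context representation, while region tokens still see the whole surrounding image.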

updated: Mon Oct 18 2021 10:34:26 GMT+0000 (UTC)

published: Fri Jun 04 2021 14:33:25 GMT+0000 (UTC)

arXiv
