Interactive Image Synthesis with Panoptic Layout Generation

Bo Wang; Tao Wu; Minfeng Zhu; Peng Du

Interactive image synthesis from user-guided input is a challenging task when users wish to control the scene structure of a generated image with ease. Although remarkable progress has been made on layout-based image synthesis, existing methods require high-precision inputs to produce realistic images in interactive settings; such inputs often need several rounds of adjustment and are unfriendly to novice users. When the placement of bounding boxes is subject to perturbation, layout-based models suffer from "missing regions" in the constructed semantic layouts and hence produce undesirable artifacts in the generated images. In this work, we propose Panoptic Layout Generative Adversarial Networks (PLGAN) to address this challenge. PLGAN employs panoptic theory, which distinguishes object categories between "stuff" with amorphous boundaries and "things" with well-defined shapes, so that stuff and instance layouts are constructed through separate branches and later fused into panoptic layouts. In particular, the stuff layouts can take amorphous shapes and fill the missing regions left out by the instance layouts. We experimentally compare PLGAN with state-of-the-art layout-based models on the COCO-Stuff, Visual Genome, and Landscape datasets. The advantages of PLGAN are not only demonstrated visually but also verified quantitatively in terms of inception score, Fréchet inception distance, classification accuracy score, and coverage.
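
The "missing regions" problem and the fusion idea described above can be illustrated with a toy sketch. This is not the PLGAN architecture: the grid size, the integer labels, the hand-written stuff map, and the helper names `rasterize_boxes`, `missing_fraction`, and `fuse` are all assumptions made for illustration. It only shows how a shifted bounding box can leave uncovered pixels in a box-based layout, and how a dense stuff map can fill them when instance labels take priority.

```python
# Toy illustration only; all names and labels here are hypothetical.
EMPTY = 0  # label for pixels that no layout element covers


def rasterize_boxes(boxes, height, width):
    """Paint (label, x0, y0, x1, y1) boxes onto a grid; later boxes overwrite earlier ones."""
    grid = [[EMPTY] * width for _ in range(height)]
    for label, x0, y0, x1, y1 in boxes:
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                grid[y][x] = label
    return grid


def missing_fraction(grid):
    """Fraction of pixels left without any label."""
    total = sum(len(row) for row in grid)
    empty = sum(row.count(EMPTY) for row in grid)
    return empty / total


def fuse(instance_grid, stuff_grid):
    """Keep instance labels where present; fall back to the stuff label elsewhere."""
    return [
        [inst if inst != EMPTY else stuff for inst, stuff in zip(inst_row, stuff_row)]
        for inst_row, stuff_row in zip(instance_grid, stuff_grid)
    ]


SKY, GRASS, PERSON = 1, 2, 3

# Box-only layout: the grass box has been shifted down by one row,
# so row 2 of the 4x4 grid is covered by nothing.
perturbed = rasterize_boxes([(SKY, 0, 0, 4, 2), (GRASS, 0, 3, 4, 5)], 4, 4)
print(missing_fraction(perturbed))  # 0.25

# Panoptic-style layout: a dense stuff map (assumed given here) plus one instance box.
stuff_map = [[SKY] * 4, [SKY] * 4, [GRASS] * 4, [GRASS] * 4]
instances = rasterize_boxes([(PERSON, 1, 1, 3, 3)], 4, 4)
panoptic = fuse(instances, stuff_map)
print(missing_fraction(panoptic))  # 0.0
```

In this sketch the stuff map is written by hand; in the paper it is produced by a learned branch, and the fusion step is likewise learned rather than a fixed priority rule.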

updated: Thu Mar 10 2022 02:23:30 GMT+0000 (UTC)

published: Fri Mar 04 2022 02:45:27 GMT+0000 (UTC)

arXiv
