Fully Context-Aware Image Inpainting with a Learned Semantic Pyramid

Wendong Zhang; Yunbo Wang; Junwei Zhu; Ying Tai; Bingbing Ni; Xiaokang Yang

Restoring reasonable and realistic content for arbitrary missing regions in images is an important yet challenging task. Although recent image inpainting models have made significant progress in generating vivid visual details, they can still produce blurred textures or distorted structures in more complex scenes, where the context is ambiguous. To address this issue, we propose the Semantic Pyramid Network (SPN), motivated by the idea that learning multi-scale semantic priors from specific pretext tasks can greatly benefit the recovery of locally missing content in images. SPN consists of two components. The first is a prior learner, which distills semantic priors from a pretext model into a multi-scale feature pyramid, achieving a consistent understanding of the global context and local structures. Within the prior learner, we present an optional variational inference module that enables probabilistic image inpainting driven by various learned priors. The second component is a fully context-aware image generator, which adaptively and progressively refines low-level visual representations at multiple scales using the (stochastic) prior pyramid. We train the prior learner and the image generator as a unified model without any post-processing. Our approach achieves state-of-the-art results on multiple datasets, including Places2, Paris StreetView, CelebA, and CelebA-HQ, under both deterministic and probabilistic inpainting setups.
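
To make the idea of distilling priors into a multi-scale pyramid more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not specify the loss or how the pyramid is built, so the 2x2 average pooling, the per-scale L1 distance, and every function name below are assumptions chosen for illustration. The "teacher" stands in for features from the pretext model and the "student" for features from the prior learner.

```python
# Illustrative sketch only: the pooling scheme, the L1 loss, and all names
# are assumptions, not details taken from the paper.

def avg_pool2x2(fmap):
    """Halve a 2-D feature map (list of rows) by averaging 2x2 blocks."""
    h, w = len(fmap), len(fmap[0])
    return [
        [
            (fmap[i][j] + fmap[i][j + 1] + fmap[i + 1][j] + fmap[i + 1][j + 1]) / 4.0
            for j in range(0, w - 1, 2)
        ]
        for i in range(0, h - 1, 2)
    ]

def build_pyramid(fmap, num_scales):
    """Return [full-res, half-res, ...] with num_scales levels."""
    pyramid = [fmap]
    for _ in range(num_scales - 1):
        pyramid.append(avg_pool2x2(pyramid[-1]))
    return pyramid

def l1_distance(a, b):
    """Mean absolute difference between two same-sized 2-D maps."""
    total = sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return total / (len(a) * len(a[0]))

def pyramid_distillation_loss(student_pyramid, teacher_pyramid):
    """Sum of per-scale L1 distances between student and teacher features."""
    return sum(l1_distance(s, t) for s, t in zip(student_pyramid, teacher_pyramid))

# Toy 4x4 "teacher" features (standing in for the pretext model's output).
teacher = [[float(4 * i + j) for j in range(4)] for i in range(4)]
# Toy "student" features: the teacher offset by 1 everywhere.
student = [[v + 1.0 for v in row] for row in teacher]

teacher_pyr = build_pyramid(teacher, num_scales=3)  # 4x4, 2x2, 1x1
student_pyr = build_pyramid(student, num_scales=3)
loss = pyramid_distillation_loss(student_pyr, teacher_pyr)
print(loss)
```

In this toy setup, a constant offset survives average pooling, so each of the three scales contributes the same distance. A real prior learner would emit its own feature maps at each scale rather than pooling a single map, and training would minimize a loss of this kind jointly with the generator's objectives.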

updated: Wed Dec 08 2021 04:33:33 GMT+0000 (UTC)

published: Wed Dec 08 2021 04:33:33 GMT+0000 (UTC)

arXiv
