Aggregated Contextual Transformations for High-Resolution Image Inpainting

Yanhong Zeng; Jianlong Fu; Hongyang Chao; Baining Guo

State-of-the-art image inpainting approaches can generate distorted structures and blurry textures in high-resolution images (e.g., 512x512). The challenges mainly derive from (1) image content reasoning from distant contexts, and (2) fine-grained texture synthesis for a large missing region. To overcome these two challenges, we propose an enhanced GAN-based model, named Aggregated COntextual-Transformation GAN (AOT-GAN), for high-resolution image inpainting. Specifically, to enhance context reasoning, we construct the generator of AOT-GAN by stacking multiple layers of a proposed AOT block. The AOT blocks aggregate contextual transformations from various receptive fields, allowing the generator to capture both informative distant image contexts and rich patterns of interest for context reasoning. To improve texture synthesis, we enhance the discriminator of AOT-GAN by training it with a tailored mask-prediction task. Such a training objective forces the discriminator to distinguish the detailed appearances of real and synthesized patches, which in turn helps the generator synthesize clear textures. Extensive comparisons on Places2, the most challenging benchmark with 1.8 million high-resolution images of 365 complex scenes, show that our model outperforms the state of the art by a significant margin in terms of FID, with a 38.60% relative improvement. A user study with more than 30 subjects further validates the superiority of AOT-GAN. We further evaluate the proposed AOT-GAN in practical applications, e.g., logo removal, face editing, and object removal. Results show that our model achieves promising completions in the real world. We release code and models at https://github.com/researchmm/AOT-GAN-for-Inpainting.
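
The idea of aggregating transformations from various receptive fields can be illustrated with a minimal 1D sketch in plain Python. This is not the released AOT-GAN code: the dilation rates (1, 2, 4, 8), the fixed all-ones kernel, the averaging step used to merge branches, and every function name below are assumptions chosen for illustration. A real block would presumably use learned 2D convolutions and a learned fusion step.

```python
# Illustrative 1D sketch only; NOT the authors' implementation.
# Assumptions: dilation rates, fixed kernel weights, and averaging as the
# merge step are all hypothetical stand-ins for learned components.

def dilated_conv1d(signal, kernel, rate):
    """Same-size 1D dilated convolution (cross-correlation), zero padded."""
    half = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + (k - half) * rate  # taps are spaced `rate` samples apart
            if 0 <= j < n:
                acc += w * signal[j]
        out.append(acc)
    return out


def receptive_field(kernel_size, rate):
    """Span (in samples) covered by one dilated kernel."""
    return (kernel_size - 1) * rate + 1


def aggregate_branches(signal, kernel, rates):
    """Run one branch per dilation rate, then average the branch outputs."""
    branches = [dilated_conv1d(signal, kernel, r) for r in rates]
    return [sum(vals) / len(rates) for vals in zip(*branches)]


RATES = (1, 2, 4, 8)        # assumed rates, small to large receptive fields
KERNEL = (1.0, 1.0, 1.0)    # hypothetical fixed weights (learned in practice)

# A single impulse in the middle of a 17-sample signal.
signal = [0.0] * 17
signal[8] = 1.0

mixed = aggregate_branches(signal, KERNEL, RATES)
touched = [i for i, v in enumerate(mixed) if v != 0]

print([receptive_field(len(KERNEL), r) for r in RATES])
print(touched)
```

In this toy setting, the small-rate branches respond to the immediate neighbors of the impulse while the large-rate branches reach the far ends of the signal, so the merged output mixes nearby and distant context in a single layer.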

updated: Sat Apr 03 2021 15:50:17 GMT+0000 (UTC)

published: Sat Apr 03 2021 15:50:17 GMT+0000 (UTC)

arXiv
