CoordFill: Efficient High-Resolution Image Inpainting via Parameterized Coordinate Querying

Weihuang Liu; Xiaodong Cun; Chi-Man Pun; Menghan Xia; Yong Zhang; Jue Wang

Image inpainting aims to fill in the missing regions of an input image. Doing so efficiently for high-resolution images is hard for two reasons: (1) high-resolution inpainting requires a large receptive field, and (2) a general encoder-decoder network synthesizes many background pixels along with the missing ones, because it operates on the full image matrix. In this paper, we attempt to overcome both limitations for the first time by building on recent developments in continuous implicit representation. Specifically, we down-sample and encode the degraded image to produce spatially adaptive parameters for each spatial patch via an attentional Fast Fourier Convolution (FFC)-based parameter generation network. We then use these parameters as the weights and biases of a series of multi-layer perceptrons (MLPs), whose input is the encoded continuous coordinates and whose output is the synthesized color value. With this structure, the high-resolution image only needs to be encoded at a relatively low resolution, which captures a larger receptive field. The continuous position encoding then helps synthesize photo-realistic high-frequency textures by re-sampling the coordinates at a higher resolution. In addition, our framework queries only the coordinates of the missing pixels, in parallel, yielding a more efficient solution than previous methods. Experiments show that the proposed method achieves real-time performance on 2048×2048 images using a single GTX 2080 Ti GPU and can handle 4096×4096 images, with much better visual and numerical performance than existing state-of-the-art methods. The code is available at: https://github.com/NiFangBaAGe/CoordFill.
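The coordinate-querying idea described in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the sin/cos encoding, the ReLU and tanh activations, and the layer sizes are all assumptions made for illustration, and the per-patch MLP parameters are drawn at random as a stand-in for the FFC-based parameter generation network. The sketch also loops over pixels sequentially, whereas the paper queries them in parallel. What it does show is the key property: only masked pixels are ever evaluated.

```python
import math
import random


def encode_coord(x, y, num_freqs=4):
    """Sin/cos positional encoding of a 2-D coordinate in [-1, 1] (assumed form)."""
    feats = []
    for k in range(num_freqs):
        freq = (2 ** k) * math.pi
        for v in (x, y):
            feats.append(math.sin(freq * v))
            feats.append(math.cos(freq * v))
    return feats  # length 4 * num_freqs


def random_patch_params(in_dim, hidden_dim, rng):
    """Hypothetical stand-in for the parameter generation network.

    Returns the (weights, biases) of a two-layer MLP for one patch.
    """
    def layer(n_out, n_in):
        w = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        b = [0.0] * n_out
        return w, b
    return [layer(hidden_dim, in_dim), layer(3, hidden_dim)]


def run_mlp(layers, feats):
    """Apply a per-patch MLP: ReLU on hidden layers, tanh on the RGB output."""
    h = feats
    for i, (w, b) in enumerate(layers):
        h = [sum(wi * hi for wi, hi in zip(row, h)) + bi for row, bi in zip(w, b)]
        is_last = i == len(layers) - 1
        h = [math.tanh(v) if is_last else max(0.0, v) for v in h]
    return h


def fill_missing(mask, patch_size, num_freqs=4, hidden_dim=8, seed=0):
    """Query colours only at masked pixels (mask[r][c] == 1)."""
    height, width = len(mask), len(mask[0])
    rng = random.Random(seed)
    params = {}  # patch index -> MLP layers for that patch
    out = {}     # (row, col) -> [r, g, b] in [-1, 1]
    for r in range(height):
        for c in range(width):
            if mask[r][c] != 1:
                continue  # known pixels are never synthesised
            patch = (r // patch_size, c // patch_size)
            if patch not in params:
                params[patch] = random_patch_params(4 * num_freqs, hidden_dim, rng)
            # Map the pixel centre to a continuous coordinate in [-1, 1].
            x = (c + 0.5) / width * 2 - 1
            y = (r + 0.5) / height * 2 - 1
            out[(r, c)] = run_mlp(params[patch], encode_coord(x, y, num_freqs))
    return out


if __name__ == "__main__":
    mask = [[0, 0, 0, 0],
            [0, 1, 1, 0],
            [0, 1, 0, 0],
            [0, 0, 0, 0]]
    for pixel, rgb in sorted(fill_missing(mask, patch_size=2).items()):
        print(pixel, [round(ch, 3) for ch in rgb])
```

Because the coordinates are continuous, the same per-patch parameters could be queried on a finer grid simply by passing a larger mask, which is the sense in which a low-resolution encoding can drive a high-resolution output.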

updated: Wed Mar 15 2023 11:13:51 GMT+0000 (UTC)

published: Wed Mar 15 2023 11:13:51 GMT+0000 (UTC)

arXiv
