Text-Guided Neural Image Inpainting

Lisai Zhang; Qingcai Chen; Baotian Hu; Shuoran Jiang

The image inpainting task requires filling a corrupted image with content that is coherent with its context. Neural image inpainting methods have brought promising progress to this field. Nevertheless, guessing the missing content from the context pixels alone remains a critical challenge. The goal of this paper is to fill in the semantic information of corrupted images according to a provided descriptive text. Unlike existing text-guided image generation work, the inpainting model must compare the semantic content of the given text with the remaining part of the image, and then determine the semantic content that should fill the missing part. To fulfill this task, we propose a novel inpainting model named Text-Guided Dual Attention Inpainting Network (TDANet). First, a dual multimodal attention mechanism is designed to extract explicit semantic information about the corrupted regions by comparing the descriptive text with the complementary image areas through reciprocal attention. Second, an image-text matching loss is applied to maximize the semantic similarity between the generated image and the text. Experiments are conducted on two open datasets. Results show that the proposed TDANet model achieves new state-of-the-art results on both quantitative and qualitative measures. Result analysis suggests that the generated images are consistent with the guidance text, so that providing different descriptions yields different results. Code is available at https://github.com/idealwhite/TDANet
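The abstract names two components, reciprocal (two-way) attention between the text and the visible image areas, and an image-text matching loss. The sketch below illustrates both ideas generically in pure Python, assuming words and image regions are already encoded as equal-length feature vectors. It is not TDANet's implementation (see the linked repository for that): `cross_attend`, `dual_attention`, and `matching_loss` are hypothetical names, keys double as values in the attention, and the cosine-based loss is a simplified stand-in for the paper's matching loss.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries: List[Vector], keys: List[Vector]) -> List[Vector]:
    """Hypothetical helper (not from TDANet): for each query, return the
    softmax(dot-product)-weighted average of the keys. Keys double as
    values, a simplification for illustration."""
    dim = len(keys[0])
    out = []
    for q in queries:
        weights = softmax([dot(q, k) for k in keys])
        out.append([sum(w * k[d] for w, k in zip(weights, keys)) for d in range(dim)])
    return out


def dual_attention(
    words: List[Vector], regions: List[Vector]
) -> Tuple[List[Vector], List[Vector]]:
    """Attend in both directions (assumed reading of "reciprocal attention")."""
    word_ctx = cross_attend(words, regions)    # image evidence for each word
    region_ctx = cross_attend(regions, words)  # textual evidence for each region
    return word_ctx, region_ctx


def mean_pool(vectors: List[Vector]) -> Vector:
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]


def cosine(a: Vector, b: Vector) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def matching_loss(image_regions: List[Vector], words: List[Vector]) -> float:
    """Toy stand-in for an image-text matching loss: 1 - cosine similarity
    of the pooled image and pooled text features. Minimizing it pushes the
    generated image's features toward the text's features."""
    return 1.0 - cosine(mean_pool(image_regions), mean_pool(words))


# Usage: two toy word vectors and two visible-region vectors.
words = [[1.0, 0.0], [0.0, 1.0]]
regions = [[1.0, 0.0], [1.0, 0.0]]
word_ctx, region_ctx = dual_attention(words, regions)
loss = matching_loss(regions, words)
```

Here each region attends more strongly to the word it aligns with, and the loss is zero only when the pooled image and text features point in the same direction. A real model would use learned projections for queries, keys, and values and batched tensor operations instead of Python lists.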

updated: Mon Mar 22 2021 08:12:02 GMT+0000 (UTC)

published: Tue Apr 07 2020 09:04:43 GMT+0000 (UTC)

arXiv