Bridging Global Context Interactions for High-Fidelity Image Completion

Chuanxia Zheng; Tat-Jen Cham; Jianfei Cai; Dinh Phung

Bridging global context interactions correctly is important for high-fidelity image completion with large masks. Previous methods attempting this via deep or large receptive field (RF) convolutions cannot escape from the dominance of nearby interactions, which may be inferior. In this paper, we propose to treat image completion as a directionless sequence-to-sequence prediction task, and deploy a transformer to directly capture long-range dependence in the encoder. Crucially, we employ a restrictive CNN with small and non-overlapping RF for weighted token representation, which allows the transformer to explicitly model the long-range visible context relations with equal importance in all layers, without implicitly confounding neighboring tokens when larger RFs are used. To improve appearance consistency between visible and generated regions, a novel attention-aware layer (AAL) is introduced to better exploit distantly related high-frequency features. Overall, extensive experiments demonstrate superior performance compared to state-of-the-art methods on several datasets.
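The token representation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the paper uses a learned restrictive CNN, whereas this stand-in uses a hand-written mean over visible pixels, and the function name `patch_tokens`, the grayscale list-of-rows image format, and the choice of "fraction of visible pixels" as the token weight are all assumptions made for illustration. What it does show is the non-overlapping receptive field: every pixel contributes to exactly one token, so no information leaks between neighboring tokens before a transformer would see them.

```python
def patch_tokens(image, mask, patch):
    """Split a 2D grayscale image into non-overlapping patch x patch tokens.

    image: list of rows of floats; mask: same shape, 1 = visible, 0 = missing.
    Returns (tokens, weights) in row-major patch order.
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be a multiple of the patch size")
    tokens, weights = [], []
    # Step by `patch` so that receptive fields never overlap.
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            total, visible = 0.0, 0
            for y in range(top, top + patch):
                for x in range(left, left + patch):
                    if mask[y][x]:
                        total += image[y][x]
                        visible += 1
            # Token value: mean over visible pixels only (0.0 if fully masked).
            tokens.append(total / visible if visible else 0.0)
            # Token weight (assumed definition): visible fraction of the patch.
            weights.append(visible / (patch * patch))
    return tokens, weights


# Hypothetical 4x4 image with the top-right 2x2 patch fully masked
# and one masked pixel in the bottom-left patch.
image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
mask = [[1, 1, 0, 0], [1, 1, 0, 0], [1, 0, 1, 1], [1, 1, 1, 1]]
tokens, weights = patch_tokens(image, mask, patch=2)
```

In a full model, the resulting token sequence and its weights would be passed to a transformer encoder, where self-attention can relate any two visible tokens directly regardless of their distance in the image.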

updated: Mon Nov 22 2021 07:46:56 GMT+0000 (UTC)

published: Fri Apr 02 2021 01:42:01 GMT+0000 (UTC)

arXiv
