Joint Learning of Deep Texture and High-Frequency Features for Computer-Generated Image Detection

Qiang Xu; Shan Jia; Xinghao Jiang; Tanfeng Sun; Zhe Wang; Hong Yan

Distinguishing between computer-generated (CG) and natural photographic (PG) images is of great importance for verifying the authenticity and originality of digital images. However, recent cutting-edge generation methods enable high-quality synthesis of CG images, which makes this challenging task even harder. To address this issue, we propose a joint learning strategy with deep texture and high-frequency features for CG image detection. We first formulate and analyze in depth the different acquisition processes of CG and PG images. Based on the finding that the different modules involved in image acquisition lead to different sensitivity inconsistencies to convolutional neural network (CNN)-based rendering in images, we propose a deep texture rendering module for texture difference enhancement and discriminative texture representation. Specifically, a semantic segmentation map is generated to guide the affine transformation operation, which is used to recover the texture in different regions of the input image. Then, the combination of the original image and the high-frequency components of the original and rendered images is fed into a multi-branch neural network equipped with attention mechanisms, which refines intermediate features and facilitates trace exploration in the spatial and channel dimensions, respectively. Extensive experiments on two public datasets and a newly constructed dataset with more realistic and diverse images show that the proposed approach outperforms existing methods in the field by a clear margin. In addition, the results demonstrate the robustness and generalization ability of the proposed approach with respect to postprocessing operations and generative adversarial network (GAN)-generated images.
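The abstract names two operations without giving details: extracting high-frequency components and applying an affine transformation guided by a semantic segmentation map. The following is a minimal sketch of both ideas, not the authors' implementation. It assumes a 3x3 box blur as the low-pass filter (the abstract does not specify the filter) and fixed per-region scale and shift values in place of parameters that a network would presumably learn. The names `box_blur`, `high_frequency`, `segmentation_guided_affine`, and `params` are hypothetical.

```python
from typing import Dict, List, Tuple

Image = List[List[float]]  # grayscale image stored as rows of pixel values


def box_blur(img: Image) -> Image:
    """3x3 mean filter; border pixels average over the neighbors that exist."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [
                img[j][i]
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2))
            ]
            out[y][x] = sum(vals) / len(vals)
    return out


def high_frequency(img: Image) -> Image:
    """High-frequency residual: the image minus its low-pass (blurred) version.

    Assumption: a box blur stands in for whatever filter the paper uses.
    """
    low = box_blur(img)
    return [[p - q for p, q in zip(row, low_row)] for row, low_row in zip(img, low)]


def segmentation_guided_affine(
    feat: Image,
    seg: List[List[int]],
    params: Dict[int, Tuple[float, float]],
) -> Image:
    """Per-pixel affine transform: out = scale * feat + shift.

    The (scale, shift) pair is selected by the region label in `seg`.
    Assumption: `params` holds fixed illustrative values, not learned ones.
    """
    return [
        [params[label][0] * value + params[label][1] for value, label in zip(f_row, s_row)]
        for f_row, s_row in zip(feat, seg)
    ]


if __name__ == "__main__":
    image = [[0.0, 0.0, 0.0], [0.0, 9.0, 0.0], [0.0, 0.0, 0.0]]
    print(high_frequency(image))
    print(segmentation_guided_affine([[1.0, 2.0]], [[0, 1]], {0: (2.0, 0.0), 1: (1.0, -1.0)}))
```

In this sketch a flat image has a zero residual everywhere, while an isolated bright pixel produces a strong positive response at its own location and negative responses around it. This is the kind of local detail the high-frequency branches are described as receiving as input.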

updated: Wed Sep 07 2022 17:30:40 GMT+0000 (UTC)

published: Wed Sep 07 2022 17:30:40 GMT+0000 (UTC)

arXiv
