Spatial-Temporal Frequency Forgery Clue for Video Forgery Detection in VIS and NIR Scenario

Yukai Wang; Chunlei Peng; Decheng Liu; Nannan Wang; Xinbo Gao

In recent years, with the rapid development of face editing and generation, more and more fake videos are circulating on social media, raising serious public concern. Existing frequency-domain face forgery detection methods have found that GAN-forged images exhibit obvious grid-like visual artifacts in the frequency spectrum compared with real images. For synthesized videos, however, these methods are confined to single frames and pay little attention to the most discriminative parts and to the temporal frequency clues across frames. To take full advantage of the rich information in video sequences, this paper performs video forgery detection in both the spatial and temporal frequency domains and proposes a Discrete Cosine Transform-based Forgery Clue Augmentation Network (FCAN-DCT) to achieve a more comprehensive spatial-temporal feature representation. FCAN-DCT consists of a backbone network and two branches: a Compact Feature Extraction (CFE) module and a Frequency Temporal Attention (FTA) module. We conduct thorough experimental evaluations on two visible light (VIS) datasets, WildDeepfake and Celeb-DF (v2), and on our self-built video forgery dataset DeepfakeNIR, which is the first video forgery dataset in the near-infrared (NIR) modality. The experimental results demonstrate the effectiveness of our method in detecting forged videos in both VIS and NIR scenarios.
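The abstract does not describe the internals of FCAN-DCT, so the following is only a minimal NumPy sketch of what "spatial" and "temporal" frequency representations of a video can look like. It assumes grayscale frames stacked as an array of shape (T, H, W) and uses an orthonormal type-II DCT. The helper names (`dct_matrix`, `spatial_dct`, `temporal_dct`, `log_magnitude`) are hypothetical, and this is not the authors' CFE or FTA module.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis matrix of size n x n (row k = frequency k)."""
    k = np.arange(n)[:, None]  # frequency index
    i = np.arange(n)[None, :]  # sample index
    mat = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    mat[0, :] *= np.sqrt(1.0 / n)   # scale the DC row
    mat[1:, :] *= np.sqrt(2.0 / n)  # scale the AC rows
    return mat


def spatial_dct(frame: np.ndarray) -> np.ndarray:
    """2D DCT of a single grayscale frame of shape (H, W): per-frame spectrum."""
    h, w = frame.shape
    return dct_matrix(h) @ frame @ dct_matrix(w).T


def temporal_dct(clip: np.ndarray) -> np.ndarray:
    """1D DCT along the time axis of a clip of shape (T, H, W).

    out[k, y, x] = sum_t C[k, t] * clip[t, y, x], so each pixel location
    gets its own temporal spectrum across frames.
    """
    t = clip.shape[0]
    return np.tensordot(dct_matrix(t), clip, axes=([1], [0]))


def log_magnitude(coeffs: np.ndarray) -> np.ndarray:
    """Log-scaled magnitude, a common way to visualise DCT spectra."""
    return np.log(np.abs(coeffs) + 1e-12)
```

For a perfectly static clip (the same frame repeated T times), all temporal energy lands in the k = 0 coefficient. Frame-to-frame inconsistencies, such as those a per-frame face swap might introduce, would instead show up in the higher temporal frequencies. That intuition is the general motivation for looking beyond single-frame spectra; how FCAN-DCT actually weights these clues is not specified here.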

updated: Tue Jul 05 2022 09:27:53 GMT+0000 (UTC)

published: Tue Jul 05 2022 09:27:53 GMT+0000 (UTC)

arXiv