Recap: Detecting Deepfake Video with Unpredictable Tampered Traces via Recovering Faces and Mapping Recovered Faces

Juan Hu; Xin Liao; Difei Gao; Satoshi Tsutsui; Qian Wang; Zheng Qin; Mike Zheng Shou

Abstract:

The malicious exploitation of Deepfake techniques has driven significant research interest in Deepfake detection. Deepfake manipulations frequently introduce random tampered traces, leading to unpredictable outcomes in different facial regions. However, existing detection methods rely heavily on specific forgery indicators, and as forgery techniques improve, these traces become increasingly randomized, degrading the detection performance of methods that depend on specific forgery traces. To address this limitation, we propose Recap, a novel Deepfake detection model that exposes unspecific facial part inconsistencies by recovering faces and enlarges the differences between real and fake by mapping recovered faces. In the recovering stage, the model randomly masks regions of interest (ROIs) and focuses on reconstructing real faces, which carry no unpredictable tampered traces; as a result, recovery is relatively good for real faces and poor for fake faces. In the mapping stage, the output of the recovering stage serves as supervision to guide the face mapping process. This process strategically emphasizes the mapping of poorly recovered fake faces, further degrading their representation, while enhancing and refining the mapping of real faces, whose representation is good. As a result, the approach significantly amplifies the discrepancies between real and fake videos. Our extensive experiments on standard benchmarks demonstrate that Recap is effective in multiple scenarios.
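
The recovering stage described above (randomly mask ROIs, reconstruct, compare) can be illustrated with a minimal sketch. This is not the authors' implementation: the ROI boxes, the 64x64 crop size, the function names, and the mean-fill `placeholder_recover` (a stand-in for a learned recovery network) are all assumptions made purely for illustration.

```python
import numpy as np

# Hypothetical ROI boxes (top, left, height, width) on a 64x64 grayscale face crop.
# These coordinates are illustrative assumptions, not values from the paper.
ROIS = {
    "left_eye": (16, 10, 12, 18),
    "right_eye": (16, 36, 12, 18),
    "nose": (26, 24, 16, 16),
    "mouth": (44, 18, 12, 28),
}


def random_roi_mask(shape, rois, num_masked, rng):
    """Return a boolean mask that is True inside `num_masked` randomly chosen ROIs."""
    mask = np.zeros(shape, dtype=bool)
    keys = sorted(rois)
    chosen = rng.choice(len(keys), size=num_masked, replace=False)
    for i in chosen:
        top, left, height, width = rois[keys[i]]
        mask[top:top + height, left:left + width] = True
    return mask


def placeholder_recover(face, mask):
    """Stand-in for a learned recovery model: fill masked pixels with the visible mean."""
    recovered = face.astype(np.float64).copy()
    recovered[mask] = face[~mask].mean()
    return recovered


def masked_recovery_error(face, recovered, mask):
    """Mean squared error between a face and its recovery, over masked pixels only."""
    diff = (face.astype(np.float64) - recovered.astype(np.float64)) ** 2
    return float(diff[mask].mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    face = rng.random((64, 64))  # synthetic stand-in for a face crop
    mask = random_roi_mask(face.shape, ROIS, num_masked=2, rng=rng)
    recovered = placeholder_recover(face, mask)
    print("masked pixels:", int(mask.sum()))
    print("recovery error:", masked_recovery_error(face, recovered, mask))
```

In a setup like the one the abstract describes, the recovery model would be trained on real faces, so a per-face error of this kind would be expected to be lower for real inputs than for fake ones; the sketch only shows the masking and error bookkeeping, not that behavior.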

updated: Sat Aug 19 2023 06:18:11 GMT+0000 (UTC)

published: Sat Aug 19 2023 06:18:11 GMT+0000 (UTC)

arXiv
