Voice-Face Homogeneity Tells Deepfake

Harry Cheng; Yangyang Guo; Tianyi Wang; Qi Li; Xiaojun Chang; Liqiang Nie


Detecting forged videos is highly desirable given the widespread abuse of deepfakes. Existing detection approaches focus on exploring specific artifacts in deepfake videos and fit well on certain data. However, techniques targeting these artifacts keep advancing, which continually challenges the robustness of traditional deepfake detectors. As a result, progress on the generalizability of these approaches has stalled. To address this issue, we build on two empirical observations: the identities behind voices and faces are often mismatched in deepfake videos, and voices and faces are homogeneous to some extent. In this paper, we therefore propose to perform deepfake detection from an unexplored voice-face matching view. To this end, we devise a voice-face matching method that measures the degree to which the two match. Nevertheless, training on specific deepfake datasets makes the model overfit to certain traits of deepfake algorithms. Instead, we advocate a method that quickly adapts to previously unseen forgeries through a pre-training then fine-tuning paradigm. Specifically, we first pre-train the model on a generic audio-visual dataset and then fine-tune it on downstream deepfake data. We conduct extensive experiments on three widely used deepfake datasets: DFDC, FakeAVCeleb, and DeepfakeTIMIT. Our method obtains significant performance gains compared with other state-of-the-art competitors. It is also worth noting that our method already achieves competitive results when fine-tuned on limited deepfake data.
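The voice-face matching idea can be illustrated with a minimal sketch. It is not the authors' model: it assumes that a voice clip and a face track have already been mapped to fixed-length embedding vectors by some pre-trained encoders (not shown), and it swaps in plain cosine similarity as a stand-in for the learned matching degree. The names `cosine_similarity` and `is_likely_fake` and the `threshold` value are hypothetical, chosen only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (lists of floats)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def is_likely_fake(voice_emb, face_emb, threshold=0.5):
    """Flag a video as suspicious when voice and face embeddings match poorly.

    Assumption: both embeddings live in a shared space where a higher
    similarity means the voice and face more plausibly belong to the same
    identity. The threshold is a hypothetical value, not taken from the paper.
    """
    return cosine_similarity(voice_emb, face_emb) < threshold
```

In a pre-train then fine-tune setup, the encoders producing these embeddings would be trained on generic audio-visual data first, and only the later adaptation step would see deepfake data; the decision rule above stands in for the final matching step.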

updated: Mon Jun 13 2022 06:49:17 GMT+0000 (UTC)

published: Fri Mar 04 2022 09:08:50 GMT+0000 (UTC)

arXiv

