Zhixi Cai; Shreya Ghosh; Abhinav Dhall; Tom Gedeon; Kalin Stefanov; Munawar Hayat

"Glitch in the Matrix!": A Large Scale Benchmark for Content Driven Audio-Visual Forgery Detection and Localization

Most deepfake detection methods focus on detecting spatial and/or spatio-temporal changes in facial attributes, because available benchmark datasets contain mostly visual-only modifications. However, a sophisticated deepfake may include small segments of audio or audio-visual manipulation that can completely change the meaning of the content. To address this gap, we propose and benchmark a new dataset, Localized Audio Visual DeepFake (LAV-DF), consisting of strategic content-driven audio, visual and audio-visual manipulations. The proposed baseline method, Boundary Aware Temporal Forgery Detection (BA-TFD), is a 3D Convolutional Neural Network-based architecture that efficiently captures multimodal manipulations. We further improve the baseline method (i.e., BA-TFD+) by replacing the backbone with a Multiscale Vision Transformer and guiding the training process with contrastive, frame classification, boundary matching and multimodal boundary matching loss functions. Quantitative analysis on several benchmark datasets, including our newly proposed dataset, demonstrates the superiority of BA-TFD+ on temporal forgery localization and deepfake detection tasks. The dataset, models and code are available at https://github.com/ControlNet/LAV-DF.
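The abstract lists a contrastive loss among the BA-TFD+ training objectives but does not state its form. The sketch below illustrates one plausible reading, assuming the classic margin-based contrastive loss: features from a real (unmanipulated) clip's visual and audio streams are pulled together, and features from a manipulated clip are pushed at least `margin` apart. The function names `l2_distance` and `contrastive_loss`, the Euclidean distance, and the default `margin=1.0` are all hypothetical choices for illustration, not the authors' implementation.

```python
import math
from typing import Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(
    visual: Sequence[Sequence[float]],
    audio: Sequence[Sequence[float]],
    is_real: Sequence[bool],
    margin: float = 1.0,
) -> float:
    """Hypothetical margin-based contrastive loss over (visual, audio) pairs.

    ASSUMPTION: classic margin-based form, not necessarily BA-TFD+'s exact loss.
    Real pairs contribute d^2, pulling the two modalities together; manipulated
    pairs contribute max(margin - d, 0)^2, pushing them at least `margin` apart.
    """
    if not visual or not (len(visual) == len(audio) == len(is_real)):
        raise ValueError("batch must be non-empty and aligned across inputs")
    total = 0.0
    for v, a, real in zip(visual, audio, is_real):
        d = l2_distance(v, a)
        if real:
            total += d ** 2
        else:
            total += max(margin - d, 0.0) ** 2
    # Average over the batch.
    return total / len(visual)


# Usage: one manipulated pair whose features are only 0.5 apart is penalized
# by (1.0 - 0.5)^2 = 0.25, because it has not yet reached the margin.
print(contrastive_loss([[0.0, 0.0]], [[0.3, 0.4]], [False]))
```

In a real model the feature vectors would be batched tensors produced by the visual and audio encoders. Plain Python lists are used here only to keep the sketch dependency-free.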

updated: Fri May 05 2023 05:33:57 GMT+0000 (UTC)

published: Wed May 03 2023 08:48:45 GMT+0000 (UTC)

arXiv