Deepfake Detection via Joint Unsupervised Reconstruction and Supervised Classification

Bosheng Yan; Chang-Tsun Li; Xuequan Lu

Deep learning has enabled realistic face manipulation (i.e., deepfakes), which raises significant concerns about the integrity of the media in circulation. Most existing deep learning techniques for deepfake detection achieve promising performance in the intra-dataset evaluation setting (i.e., training and testing on the same dataset) but perform unsatisfactorily in the inter-dataset evaluation setting (i.e., training on one dataset and testing on another). Most previous methods use a backbone network to extract global features for making predictions and train the network with binary supervision only (i.e., labels indicating whether each training instance is fake or authentic). Classification based solely on learned global features often generalizes poorly to unseen manipulation methods. In addition, a reconstruction task can improve the learned representations. In this paper, we introduce a novel approach to deepfake detection that considers the reconstruction and classification tasks simultaneously to address these problems. The method shares the information learned by each task with the other, an aspect that existing works rarely consider, and thereby boosts overall performance. In particular, we design a two-branch Convolutional AutoEncoder (CAE) in which the Convolutional Encoder, which compresses the feature map into a latent representation, is shared by both branches. The latent representation of the input data is then fed simultaneously to a simple classifier and to the unsupervised reconstruction component. Our network is trained end-to-end. Experiments demonstrate that our method achieves state-of-the-art performance on three commonly used datasets, particularly in the cross-dataset evaluation setting.
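As a rough illustration of the two-branch design described above, the following NumPy sketch wires one shared encoder into both a classifier head and a reconstruction head and combines their losses. It is a hypothetical simplification, not the authors' implementation: dense layers stand in for the convolutional encoder and decoder, the layer sizes are arbitrary, the reconstruction target is assumed to be the encoder's own input, and the joint objective is assumed to be binary cross-entropy plus a weighted mean-squared reconstruction error with a hypothetical weight `lam`, since the abstract specifies neither the loss functions nor their weighting.

```python
import numpy as np

# Hypothetical sketch: dense layers stand in for the paper's convolutional
# encoder/decoder; sizes, initialisation and the loss weight `lam` are assumptions.
rng = np.random.default_rng(0)


def relu(x):
    return np.maximum(x, 0.0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class TwoBranchAE:
    """Shared encoder feeding a classifier head and a reconstruction decoder."""

    def __init__(self, in_dim=64, latent_dim=8, scale=0.1):
        self.W_enc = rng.normal(0.0, scale, (in_dim, latent_dim))  # shared encoder
        self.W_dec = rng.normal(0.0, scale, (latent_dim, in_dim))  # reconstruction branch
        self.w_cls = rng.normal(0.0, scale, (latent_dim, 1))       # classification branch

    def forward(self, x):
        z = relu(x @ self.W_enc)                 # latent code shared by both branches
        x_hat = z @ self.W_dec                   # unsupervised reconstruction of x
        p_fake = sigmoid(z @ self.w_cls)[:, 0]   # per-sample probability of "fake"
        return z, x_hat, p_fake


def joint_loss(x, y, x_hat, p_fake, lam=1.0, eps=1e-7):
    """Supervised binary cross-entropy plus lam * unsupervised MSE (assumed form)."""
    p = np.clip(p_fake, eps, 1.0 - eps)
    bce = -np.mean(y * np.log(p) + (1 - y) * np.log(1.0 - p))
    mse = np.mean((x - x_hat) ** 2)
    return bce + lam * mse, bce, mse


# Usage: a batch of 4 flattened "feature maps" with labels 1 = fake, 0 = real.
model = TwoBranchAE()
x = rng.normal(size=(4, 64))
y = np.array([0, 1, 1, 0])
z, x_hat, p = model.forward(x)
total, bce, mse = joint_loss(x, y, x_hat, p)
```

Only the forward pass and the combined loss are shown. In end-to-end training, gradients of the total loss would flow back through both heads into the shared encoder, which is how each task can shape the representation used by the other.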

updated: Fri Dec 23 2022 07:02:41 GMT+0000 (UTC)

published: Thu Nov 24 2022 05:44:26 GMT+0000 (UTC)

arXiv
