Video Transformer for Deepfake Detection with Incremental Learning

Sohail A. Khan; Hang Dai

Deepfake face forgeries are widespread on the internet, raising severe societal concerns. In this paper, we propose a novel video transformer with incremental learning for detecting deepfake videos. To better align the input face images, we use a 3D face reconstruction method to generate a UV texture map from a single input face image. The aligned face image also provides pose, eye-blink, and mouth-movement information that cannot be perceived in the UV texture image, so we use both the face images and their UV texture maps to extract image features. We present an incremental learning strategy that fine-tunes the proposed model on a smaller amount of data and achieves better deepfake detection performance. Comprehensive experiments on various public deepfake datasets demonstrate that the proposed video transformer with incremental learning achieves state-of-the-art performance on the deepfake video detection task, with enhanced feature learning from the sequenced data.

updated: Wed Aug 11 2021 16:22:56 GMT+0000 (UTC)

published: Wed Aug 11 2021 16:22:56 GMT+0000 (UTC)

arXiv
