Video Manipulations Beyond Faces: A Dataset with Human-Machine Analysis

Trisha Mittal; Ritwik Sinha; Viswanathan Swaminathan; John Collomosse; Dinesh Manocha

As tools for content editing mature and artificial intelligence (AI)-based algorithms for synthesizing media proliferate, manipulated content is becoming more prevalent across online media. This fuels the spread of misinformation and creates a greater need to distinguish "real" from "manipulated" content. To this end, we present VideoSham, a dataset consisting of 826 videos (413 real and 413 manipulated). Many existing deepfake datasets focus exclusively on two types of facial manipulation: swapping in a different subject's face or altering the existing face. VideoSham, on the other hand, contains more diverse, context-rich, human-centric, high-resolution videos manipulated using a combination of 6 different spatial and temporal attacks. Our analysis shows that state-of-the-art (SOTA) manipulation detection algorithms work only for a few specific attacks and do not scale well on VideoSham. We performed a user study on Amazon Mechanical Turk with 1200 participants to understand whether they can differentiate between the real and manipulated videos in VideoSham. Finally, we dig deeper into the strengths and weaknesses of human and SOTA algorithm performance to identify gaps that need to be filled with better AI algorithms. The dataset is available at https://github.com/adobe-research/VideoSham-dataset.

updated: Thu Dec 08 2022 01:47:06 GMT+0000 (UTC)

published: Tue Jul 26 2022 17:39:04 GMT+0000 (UTC)

arXiv

