Frame-rate Up-conversion Detection Based on Convolutional Neural Network for Learning Spatiotemporal Features

Minseok Yoon; Seung-Hun Nam; In-Jae Yu; Wonhyuk Ahn; Myung-Joon Kwon; Heung-Kyu Lee

With advances in user-friendly and powerful video editing tools, anyone can easily manipulate videos without leaving prominent visual traces. Frame-rate up-conversion (FRUC), a representative temporal-domain operation, increases the motion continuity of videos with a lower frame rate. Malicious counterfeiters also use it for video tampering, such as generating videos with a fake frame rate without improving their quality, or mixing temporally spliced videos. FRUC is based on frame interpolation schemes, and the subtle artifacts that remain in interpolated frames are often difficult to distinguish. Hence, detecting such forgery traces is a critical issue in video forensics. This paper proposes a frame-rate conversion detection network (FCDNet) that learns forensic features caused by FRUC in an end-to-end fashion. The proposed network takes a stack of consecutive frames as input and effectively learns interpolation artifacts using network blocks that learn spatiotemporal features. This study is the first attempt to apply a neural network to the detection of FRUC. Moreover, the network covers three types of frame interpolation schemes: nearest neighbor interpolation, bilinear interpolation, and motion-compensated interpolation. In contrast to existing methods that exploit all frames to verify integrity, the proposed approach achieves a high detection speed because it observes only six frames to test a video's authenticity. To validate our research, extensive experiments were conducted with conventional forensic methods and neural networks for video forensic tasks. The proposed network achieved state-of-the-art performance in detecting the interpolation artifacts of FRUC. The experimental results also demonstrate that the trained model is robust to an unseen dataset, unlearned frame rates, and unlearned quality factors.
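To make the simpler interpolation schemes named in the abstract concrete, the following is a minimal sketch of temporal up-conversion. It is not the authors' code. It assumes that "nearest neighbor" means copying the temporally closest source frame and that "bilinear" means a linear blend of the two neighboring source frames; motion-compensated interpolation is left out because it requires motion estimation. The names `upconvert` and `six_frame_stack` are hypothetical, and each frame is simplified to a flat list of pixel values.

```python
from fractions import Fraction


def upconvert(frames, src_fps, dst_fps, mode="bilinear"):
    """Up-convert a list of frames from src_fps to dst_fps (illustrative only).

    Each frame is a flat list of pixel values. Assumed semantics:
    "nearest" copies the closest source frame, "bilinear" blends the
    two neighboring source frames linearly in time.
    """
    if dst_fps <= src_fps:
        raise ValueError("dst_fps must be greater than src_fps")
    if mode not in ("nearest", "bilinear"):
        raise ValueError("unsupported mode: " + mode)

    step = Fraction(src_fps, dst_fps)  # source-frame distance per output frame
    n_out = ((len(frames) - 1) * dst_fps) // src_fps + 1
    out = []
    for k in range(n_out):
        t = k * step                       # exact position on the source timeline
        i = int(t)                         # index of the preceding source frame
        frac = t - i                       # fractional offset in [0, 1)
        j = min(i + 1, len(frames) - 1)    # index of the following source frame
        if mode == "nearest":
            src = frames[i] if frac < Fraction(1, 2) else frames[j]
            out.append(list(src))
        else:
            w = float(frac)
            out.append([(1 - w) * a + w * b
                        for a, b in zip(frames[i], frames[j])])
    return out


def six_frame_stack(frames, start=0):
    """Return six consecutive frames, the input size mentioned in the abstract."""
    stack = frames[start:start + 6]
    if len(stack) != 6:
        raise ValueError("not enough frames for a six-frame stack")
    return stack


# Toy video: three 2-pixel frames at 15 fps, up-converted to 30 fps.
video = [[0.0, 0.0], [10.0, 20.0], [20.0, 40.0]]
blended = upconvert(video, 15, 30, mode="bilinear")
repeated = upconvert(video, 15, 30, mode="nearest")
```

In this toy case the blended output contains new frames that are averages of their neighbors, while the nearest-neighbor output contains exact duplicates. Regularities of this kind across consecutive frames are plausibly what a detector fed with `six_frame_stack(...)` would have to pick up, though the abstract does not describe the features FCDNet actually learns.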

updated: Thu Mar 25 2021 08:47:46 GMT+0000 (UTC)

published: Thu Mar 25 2021 08:47:46 GMT+0000 (UTC)

arXiv
