FERV39k: A Large-Scale Multi-Scene Dataset for Facial Expression Recognition in Videos

Yan Wang; Yixuan Sun; Yiwen Huang; Zhongying Liu; Shuyong Gao; Wei Zhang; Weifeng Ge; Wenqiang Zhang

Current benchmarks for facial expression recognition (FER) focus mainly on static images, and datasets for FER in videos remain limited. It is still unclear whether existing methods perform satisfactorily in real-world, application-oriented scenes. For example, a high-intensity "Happy" expression in Talk-Show is more discriminative than the same expression at low intensity in Official-Event. To fill this gap, we build a large-scale multi-scene dataset, which we name FERV39k. We analyze the key ingredients of constructing such a dataset in three aspects: (1) the multi-scene hierarchy and expression classes, (2) the generation of candidate video clips, and (3) a trusted manual labelling process. Following these guidelines, we select 4 scenarios subdivided into 22 scenes, annotate 86k samples obtained automatically from 4k videos using a well-designed workflow, and finally build 38,935 video clips labeled with 7 classic expressions. We also provide experimental benchmarks on four kinds of baseline frameworks, analyze their performance across different scenes, and outline some challenges for future research. In addition, we systematically investigate the key components of dynamic FER (DFER) through ablation studies. The baseline framework and our project are available on url.
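
The cross-scene analysis mentioned in the abstract can be illustrated with a minimal sketch, assuming each clip comes with a scene name, a ground-truth expression, and a model prediction. The record layout, the function name `per_scene_accuracy`, and the toy values are hypothetical and are not taken from the FERV39k release; only the scene names "Talk-Show" and "Official-Event" and the "Happy" label appear in the abstract, and "Sad" is used here purely as a placeholder for a wrong prediction.

```python
from collections import defaultdict


def per_scene_accuracy(records):
    """Return {scene: accuracy} for (scene, true_label, predicted_label) tuples.

    Hypothetical helper for illustration; not part of the FERV39k code base.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for scene, true_label, predicted_label in records:
        total[scene] += 1
        if true_label == predicted_label:
            correct[scene] += 1
    # Every scene in `total` has at least one record, so no division by zero.
    return {scene: correct[scene] / total[scene] for scene in total}


# Made-up toy records, not real dataset entries or reported results.
toy_records = [
    ("Talk-Show", "Happy", "Happy"),
    ("Talk-Show", "Happy", "Happy"),
    ("Official-Event", "Happy", "Happy"),
    ("Official-Event", "Happy", "Sad"),
]

scene_accuracy = per_scene_accuracy(toy_records)
for scene, accuracy in scene_accuracy.items():
    print(f"{scene}: {accuracy:.2f}")
```

Reporting accuracy per scene rather than as a single pooled number is what makes differences such as the Talk-Show versus Official-Event example visible.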

updated: Thu Mar 17 2022 17:25:33 GMT+0000 (UTC)

published: Thu Mar 17 2022 17:25:33 GMT+0000 (UTC)

arXiv
