3D-CNN for Facial Micro- and Macro-expression Spotting on Long Video Sequences using Temporal Oriented Reference Frame

Chuin Hong Yap; Moi Hoon Yap; Adrian K. Davison; Ryan Cunningham

Facial expression spotting is the preliminary step in micro- and macro-expression analysis. The task of reliably spotting such expressions in video sequences is currently unsolved. The current best systems depend on optical flow methods to extract regional motion features before categorising that motion into a specific class of facial movement. Optical flow is susceptible to drift error, which is a serious problem for motions with long-term dependencies, such as macro-expressions recorded at high frame rates. We propose a purely deep learning solution which, rather than tracking frame-differential motion, uses a convolutional model to compare each frame with two temporally local reference frames. Reference frames are sampled according to calculated micro- and macro-expression durations. We show that our solution achieves state-of-the-art performance (F1-score of 0.126) on a dataset of high frame-rate (200 fps) long video sequences (SAMM-LV) and is competitive on a low frame-rate (30 fps) dataset (CAS(ME)2). In this paper, we document our deep learning model and parameters, including how we use local contrast normalisation, which we show is critical for optimal results. We surpass a limitation in existing methods, and advance the state of deep learning in the domain of facial expression spotting.
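The reference-frame idea in the abstract can be illustrated with a minimal sketch. The abstract states only that each frame is compared with two temporally local reference frames sampled according to expression duration. The sketch below therefore makes several assumptions: the two references are taken to be one frame before and one frame after the current frame, the offset is taken to be duration multiplied by frame rate, and the 0.5 s duration is a placeholder rather than a value from the paper. The function names `reference_offset` and `reference_triplet` are hypothetical.

```python
def reference_offset(duration_s, fps):
    """Convert an expression duration in seconds to an offset in frames.

    Assumption: offset = duration * frame rate, with a floor of one frame.
    """
    return max(1, round(duration_s * fps))


def reference_triplet(t, k, n_frames):
    """Return (earlier reference, current, later reference) frame indices.

    Assumption: the two reference frames sit k frames either side of
    frame t. Indices are clamped to the valid range of the video.
    """
    if not 0 <= t < n_frames:
        raise IndexError("frame index out of range")
    return max(0, t - k), t, min(n_frames - 1, t + k)


# Placeholder duration of 0.5 s, for illustration only.
k_high = reference_offset(0.5, 200)  # 200 fps, as for SAMM-LV
k_low = reference_offset(0.5, 30)    # 30 fps, as for CAS(ME)2

print(k_high, k_low)
print(reference_triplet(50, k_high, 1000))
```

Under these assumptions the same duration spans far more frames at 200 fps than at 30 fps, which is why a duration-based offset, rather than a fixed frame count, keeps the reference frames comparable across the two datasets. The three selected frames would then be passed together to the convolutional model.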

updated: Thu Jun 10 2021 12:39:31 GMT+0000 (UTC)

published: Thu May 13 2021 14:55:06 GMT+0000 (UTC)

arXiv
