Omni-supervised Facial Expression Recognition via Distilled Data

Ping Liu; Yunchao Wei; Zibo Meng; Weihong Deng; Joey Tianyi Zhou; Yi Yang

In this paper, we aim to advance the performance of facial expression recognition (FER) by exploiting omni-supervised learning. Current state-of-the-art FER approaches usually aim to recognize facial expressions in a controlled environment by training models on a limited number of samples. To enhance the robustness of the learned models across various scenarios, we propose to perform omni-supervised learning by exploiting the labeled samples together with a large amount of unlabeled data. In particular, we first employ MS-Celeb-1M as the facial pool, which contains around 5,822K unlabeled facial images. Then, a primitive model learned on a small number of labeled samples is used to select high-confidence samples from the facial pool by conducting feature-based similarity comparison. We find that the new dataset constructed in this omni-supervised manner can significantly improve the generalization ability of the learned FER model and consequently boost its performance. However, as more training samples are used, more computational resources and training time are required, which is often unaffordable. To relieve the requirement for computational resources, we further adopt a dataset distillation strategy to distill the target task-related knowledge from the newly mined samples and compress it into a very small set of images. This distilled dataset is capable of boosting the performance of FER with little additional computational cost. We perform extensive experiments on five popular benchmarks and a newly constructed dataset, where consistent gains are achieved under various settings using the proposed framework. We hope this work will serve as a solid baseline and help ease future research in FER.
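
The sample-selection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract only says that a primitive model selects high-confidence samples through feature-based similarity comparison, so the use of cosine similarity, per-class mean prototypes, the 0.9 threshold, and the helper names `class_prototypes` and `select_confident` are all assumptions made for illustration. The toy 2-D vectors stand in for embeddings that the primitive model would produce.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def class_prototypes(features, labels):
    """Mean feature vector per expression label (hypothetical helper)."""
    sums, counts = {}, {}
    for feat, label in zip(features, labels):
        acc = sums.setdefault(label, [0.0] * len(feat))
        for i, x in enumerate(feat):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [x / counts[label] for x in acc] for label, acc in sums.items()}


def select_confident(pool, prototypes, threshold=0.9):
    """Keep pool items whose best prototype similarity reaches the threshold.

    Returns a list of (pool_index, pseudo_label, similarity) tuples.
    The threshold value is an assumption, not taken from the paper.
    """
    selected = []
    for idx, feat in enumerate(pool):
        label, score = max(
            ((y, cosine(feat, proto)) for y, proto in prototypes.items()),
            key=lambda pair: pair[1],
        )
        if score >= threshold:
            selected.append((idx, label, score))
    return selected


# Toy features standing in for embeddings from the primitive model.
labeled_feats = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.9]]
labeled_y = ["happy", "happy", "sad", "sad"]
pool_feats = [[0.95, 0.05], [0.5, 0.5], [0.05, 0.8]]

protos = class_prototypes(labeled_feats, labeled_y)
mined = select_confident(pool_feats, protos, threshold=0.9)
for idx, label, score in mined:
    print(idx, label, round(score, 3))
```

In this toy run the ambiguous pool item `[0.5, 0.5]` is equally close to both prototypes and falls below the threshold, so it is left out, while the other two items receive pseudo-labels. A stricter threshold trades the size of the mined set for label reliability.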

updated: Tue Nov 30 2021 06:28:08 GMT+0000 (UTC)

published: Mon May 18 2020 09:36:51 GMT+0000 (UTC)

arXiv
