Omni-supervised Facial Expression Recognition via Distilled Data

Ping Liu; Yunchao Wei; Zibo Meng; Weihong Deng; Joey Tianyi Zhou; Yi Yang

Facial expressions play an important role in understanding human emotions. Recently, deep learning based methods have shown promise for facial expression recognition (FER). However, the performance of current state-of-the-art FER approaches depends directly on the amount of labeled training data. To address this issue, prior works employ a pretrain-and-finetune strategy, i.e., they use a large amount of unlabeled data to pretrain the network and then finetune it on the labeled data. Because the labeled data is small in quantity, the final network performance remains limited. From a different perspective, we propose to perform omni-supervised learning to directly exploit reliable samples in a large amount of unlabeled data for network training. Specifically, a new dataset is first constructed by using a primitive model, trained on a small number of labeled samples, to select samples with high confidence scores from a face dataset, i.e., MS-Celeb-1M, based on feature-wise similarity. We experimentally verify that the new dataset created in this omni-supervised manner can significantly improve the generalization ability of the learned FER model. However, as the number of training samples grows, computational cost and training time increase dramatically. To tackle this, we propose to apply a dataset distillation strategy to compress the created dataset into several informative class-wise images, significantly improving training efficiency. We have conducted extensive experiments on widely used benchmarks, where the proposed framework achieves consistent performance gains under various settings. More importantly, the distilled dataset can boost FER performance at negligible additional computational cost.
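The sample-selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the exact confidence criterion, so the code below assumes one plausible reading: each class is represented by the mean feature of its labeled samples, and an unlabeled sample is kept only if its cosine similarity to the closest class prototype reaches a threshold. The function names, the threshold value, and the prototype-based scoring are all hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def class_prototypes(features, labels):
    """Mean feature vector per class, computed from the labeled set."""
    grouped = defaultdict(list)
    for feat, label in zip(features, labels):
        grouped[label].append(feat)
    return {
        label: [sum(col) / len(feats) for col in zip(*feats)]
        for label, feats in grouped.items()
    }


def select_confident(prototypes, unlabeled, threshold=0.9):
    """Return (index, pseudo_label, score) for each unlabeled feature whose
    best prototype similarity reaches the threshold (an assumed criterion);
    all other samples are discarded."""
    selected = []
    for i, feat in enumerate(unlabeled):
        label, score = max(
            ((lbl, cosine(feat, proto)) for lbl, proto in prototypes.items()),
            key=lambda pair: pair[1],
        )
        if score >= threshold:
            selected.append((i, label, score))
    return selected
```

In practice the feature vectors would come from the primitive model's embedding layer, and the selected samples with their pseudo-labels would be added to the training set. The threshold trades dataset size against label noise: a higher value keeps fewer but more reliable samples.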

updated: Thu Dec 09 2021 04:07:04 GMT+0000 (UTC)

published: Mon May 18 2020 09:36:51 GMT+0000 (UTC)

arXiv
