An Efficient Multimodal Framework for Large Scale Emotion Recognition by Fusing Music and Electrodermal Activity Signals

Guanghao Yin; Shouqian Sun; Dian Yu; Dejian Li; Kejun Zhang

Physiological signal-based emotion recognition has received considerable attention in the field of affective computing. Because it can be acquired reliably and in a user-friendly way, Electrodermal Activity (EDA) has a great advantage in practical applications. However, EDA-based emotion recognition with hundreds of subjects still lacks an effective solution. In this paper, we attempt to fuse subject-specific EDA features with features of the external music that evokes the emotion, and we propose an end-to-end multimodal framework, the 1-dimensional residual temporal and channel attention network (RTCAN-1D). For EDA features, the novel convex optimization-based EDA (CvxEDA) method is applied to decompose EDA signals into phasic and tonic signals for mining the dynamic and steady features. A channel-temporal attention mechanism is introduced to EDA-based emotion recognition for the first time to improve the temporal- and channel-wise representation. For music features, we process the music signal with the open-source toolkit openSMILE to obtain external feature vectors. The individual emotion features from EDA signals and the external emotion benchmarks from music are fused in the classifying layers. We have conducted systematic comparisons on three multimodal datasets (PMEmo, DEAP, AMIGOS) for two-class valence/arousal emotion recognition. Our proposed RTCAN-1D outperforms the existing state-of-the-art models, which also validates that our work provides a reliable and efficient solution for large-scale emotion recognition. Our code has been released at https://github.com/guanghaoyin/RTCAN-1D.
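
The attention-then-fusion idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' RTCAN-1D implementation (see their repository for that): it omits the residual convolutional blocks, uses random weights instead of trained ones, and assumes a squeeze-and-excitation style channel gate and a softmax pooling over time as stand-ins for the channel-temporal attention. The function names, the three-channel EDA input, and the 16-dimensional music vector are all hypothetical choices made for illustration.

```python
import numpy as np

def channel_attention(x, w1, w2):
    """Squeeze-and-excitation style gate for a (channels, time) array (assumed design)."""
    squeezed = x.mean(axis=1)                        # average over time -> (C,)
    hidden = np.maximum(0.0, w1 @ squeezed)          # bottleneck + ReLU
    gates = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))     # sigmoid gates in (0, 1), shape (C,)
    return x * gates[:, None]                        # rescale each channel

def temporal_attention_pool(x, v):
    """Softmax attention over time steps, then a weighted sum -> (C,)."""
    scores = v @ x                                   # one score per time step, shape (T,)
    scores = scores - scores.max()                   # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()    # attention weights sum to 1
    return x @ alpha

def fuse_and_classify(eda_feat, music_feat, w, b):
    """Late fusion: concatenate both feature vectors, then linear layer + softmax."""
    fused = np.concatenate([eda_feat, music_feat])
    logits = w @ fused + b
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

rng = np.random.default_rng(0)
eda = rng.standard_normal((3, 200))      # hypothetical channels, e.g. raw, phasic, tonic
music = rng.standard_normal(16)          # hypothetical music feature vector
w1 = rng.standard_normal((2, 3))         # random, untrained weights
w2 = rng.standard_normal((3, 2))
v = rng.standard_normal(3)
w = rng.standard_normal((2, 3 + 16))
b = np.zeros(2)

attended = channel_attention(eda, w1, w2)
eda_feat = temporal_attention_pool(attended, v)
probs = fuse_and_classify(eda_feat, music, w, b)   # two-class probabilities
```

Concatenating the two vectors just before the classifier keeps the EDA branch and the music branch independent up to the last step, which is one common way to read "fused in the classifying layers".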

updated: Thu Dec 02 2021 03:04:51 GMT+0000 (UTC)

published: Sat Aug 22 2020 03:13:20 GMT+0000 (UTC)

arXiv
