Recurrent Transformer Encoders for Vision-based Estimation of Fatigue and Engagement in Cognitive Training Sessions

Yanchen Wang; Yunlong Xu; Feng Vankee Lin; Ehsan Adeli

The effectiveness of computerized cognitive training in slowing cognitive decline and brain aging in dementia is often limited by participants' engagement in the training. Monitoring older users' real-time engagement in the domains of attention, motivation, and affect is crucial to understanding the overall effectiveness of such training. In this paper, we propose to predict engagement in older adults with mild cognitive impairment (MCI) by monitoring their real-time, video-recorded facial gestures during training sessions. Engagement is quantified via an established mental fatigue measure that assesses users' perceived attention, motivation, and affect throughout computerized cognitive training sessions. To achieve this goal, we used computer vision, analyzing video frames every 5 seconds to balance information retention against data size, and developed a novel Recurrent Video Transformer (RVT). Our RVT model, which combines a clip-wise transformer encoder module with a session-wise Recurrent Neural Network (RNN) classifier, achieved the highest balanced accuracy, F1 score, and precision among the state-of-the-art models compared, both for detecting mental fatigue/disengagement cases (binary classification) and for rating the level of mental fatigue (multi-class classification). By leveraging dynamic temporal information, the RVT model demonstrates the potential to accurately predict engagement among computerized cognitive training users, which lays the foundation for future work to modulate the level of engagement in computerized cognitive training interventions. The code will be released.
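Two ideas in the abstract, sampling one video frame every 5 seconds and scoring models by balanced accuracy, can be illustrated with a minimal sketch. This is not the authors' released code: the function names `sample_frame_indices` and `balanced_accuracy`, the 30 fps frame rate, and the toy labels are all assumptions made for illustration. Balanced accuracy is computed here in its standard form, the mean of per-class recall.

```python
from collections import defaultdict


def sample_frame_indices(num_frames, fps, interval_s=5.0):
    """Return indices of frames spaced `interval_s` seconds apart.

    Hypothetical helper: the abstract only states that frames are
    analyzed every 5 seconds, not how they are selected.
    """
    if fps <= 0 or interval_s <= 0:
        raise ValueError("fps and interval_s must be positive")
    # Number of raw frames between two consecutive sampled frames.
    step = max(1, round(fps * interval_s))
    return list(range(0, num_frames, step))


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    total = defaultdict(int)    # samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


if __name__ == "__main__":
    # A 30-second clip at an assumed 30 fps yields six sampled frames.
    print(sample_frame_indices(900, 30))  # [0, 150, 300, 450, 600, 750]

    # Toy binary labels (1 = fatigued): always predicting 0 gives
    # 75% plain accuracy but only 50% balanced accuracy.
    print(balanced_accuracy([0, 0, 0, 1], [0, 0, 0, 0]))  # 0.5
```

The toy example shows why balanced accuracy can be more informative than plain accuracy when one class is rare: a classifier that never predicts the minority class is penalized.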

updated: Mon Apr 24 2023 21:58:14 GMT+0000 (UTC)

published: Mon Apr 24 2023 21:58:14 GMT+0000 (UTC)

arXiv
