Interpreting Depression From Question-wise Long-term Video Recording of SDS Evaluation

Wanqing Xie; Lizhong Liang; Yao Lu; Chen Wang; Jihong Shen; Hui Luo; Xiaofeng Liu

The Self-Rating Depression Scale (SDS) questionnaire is frequently used for efficient preliminary depression screening. However, this uncontrolled, self-administered measure is easily affected by insouciant or deceptive answers, which can produce results that differ from both the clinician-administered Hamilton Depression Rating Scale (HDRS) and the final diagnosis. Clinically, facial expressions (FE) and actions play a vital role in clinician-administered evaluation, whereas they remain underexplored in self-administered evaluation. In this work, we collect a novel dataset of 200 subjects to demonstrate the validity of self-rating questionnaires paired with their corresponding question-wise video recordings. To automatically interpret depression from the SDS evaluation and the paired video, we propose an end-to-end hierarchical framework for long-term, variable-length video that is also conditioned on the questionnaire results and the answering time. Specifically, the hierarchical model uses a 3D CNN to explore local temporal patterns and a redundancy-aware self-attention (RAS) scheme to aggregate global features question by question. Targeting redundant long-term FE video, RAS exploits the correlations among the video clips within each question set to emphasize discriminative information and eliminate redundancy based on pairwise feature affinity. The question-wise video features are then concatenated with the questionnaire scores for final depression detection. Our thorough evaluations show the validity of fusing the SDS evaluation with its video recording, and the superiority of our framework over conventional state-of-the-art temporal modeling methods.
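The question-wise aggregation and fusion steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' RAS module: the clip features are random stand-ins for 3D-CNN outputs, and the residual self-attention, the redundancy score (mean cosine affinity of a clip to the other clips in the same question set), the softmax pooling, and the helper names `redundancy_aware_pool` and `question_feature` are all hypothetical choices made for illustration.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def redundancy_aware_pool(clip_feats):
    """Pool a variable number of clip features (n_clips, d) into one (d,) vector.

    Hypothetical sketch, not the paper's RAS: self-attention refines each clip,
    then clips that are highly similar to the others are down-weighted.
    """
    x = np.asarray(clip_feats, dtype=float)
    n, d = x.shape
    # Scaled dot-product self-attention with queries = keys = values = x.
    attn = softmax(x @ x.T / np.sqrt(d), axis=1)
    refined = attn @ x + x  # residual connection
    # Pairwise cosine affinity between clips of the same question set.
    unit = x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-8)
    affinity = unit @ unit.T
    if n > 1:
        # Mean affinity to the *other* clips (diagonal self-affinity excluded).
        redundancy = (affinity.sum(axis=1) - np.diag(affinity)) / (n - 1)
    else:
        redundancy = np.zeros(1)
    # Less redundant clips receive larger pooling weights.
    weights = softmax(-redundancy)
    return weights @ refined


def question_feature(clip_feats, sds_score, answer_time):
    """Concatenate the pooled video feature with the item score and answering time."""
    pooled = redundancy_aware_pool(clip_feats)
    return np.concatenate([pooled, [sds_score, answer_time]])


# Usage: 7 clips of 16-dim features (random stand-ins) for one SDS item.
rng = np.random.default_rng(0)
clips = rng.normal(size=(7, 16))
f = question_feature(clips, sds_score=3, answer_time=12.5)
print(f.shape)  # (18,)
```

Down-weighting clips by their mean affinity is one simple way to act on pairwise redundancy while handling any number of clips per question; the paper's exact formulation, and the classifier that consumes the concatenated features, are not specified in the abstract.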

updated: Fri Jun 25 2021 02:32:13 GMT+0000 (UTC)

published: Fri Jun 25 2021 02:32:13 GMT+0000 (UTC)

arXiv