Deep DA for Ordinal Regression of Pain Intensity Estimation Using Weakly-Labeled Videos

Gnana Praveen R; Eric Granger; Patrick Cardinal

Automatic estimation of pain intensity from facial expressions in videos has immense potential in health care applications. However, domain adaptation (DA) is needed to alleviate the domain shift that typically occurs between video data captured in the source and target domains. Given the laborious task of collecting and annotating videos, and the subjective bias due to ambiguity among adjacent intensity levels, weakly-supervised learning (WSL) is gaining attention in such applications. Yet most state-of-the-art WSL models are formulated as regression problems, and leverage neither the ordinal relation between intensity levels nor the temporal coherence of multiple consecutive frames. This paper introduces a new deep learning model for weakly-supervised DA with ordinal regression (WSDA-OR), where videos in the target domain have coarse labels provided on a periodic basis. The WSDA-OR model enforces ordinal relationships among the intensity levels assigned to the target sequences, and associates multiple relevant frames (instead of a single frame) with sequence-level labels. In particular, it learns discriminant and domain-invariant feature representations by integrating multiple instance learning with deep adversarial DA, where soft Gaussian labels are used to efficiently represent the weak ordinal sequence-level labels from the target domain. The proposed approach was validated with the RECOLA video dataset as the fully-labeled source domain and the UNBC-McMaster video data as the weakly-labeled target domain. We have also validated WSDA-OR on the BIOVID and Fatigue (private) datasets for sequence-level estimation. Experimental results indicate that our approach can provide a significant improvement over state-of-the-art models and achieve greater localization accuracy.
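To illustrate the idea of soft Gaussian labels mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes that an ordinal level is encoded as a normalized Gaussian centred on that level, so that adjacent levels receive more probability mass than distant ones. The function name `soft_gaussian_label`, the number of levels (6), and the width `sigma` are all hypothetical choices made for illustration; the abstract does not specify them.

```python
import math


def soft_gaussian_label(level, num_levels, sigma=1.0):
    """Encode an ordinal level as a soft label over levels 0..num_levels-1.

    Assumption (not taken from the paper): an unnormalized Gaussian centred
    on `level` is evaluated at every level and then normalized to sum to 1.
    """
    if not 0 <= level < num_levels:
        raise ValueError("level must lie in [0, num_levels)")
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    weights = [
        math.exp(-((k - level) ** 2) / (2.0 * sigma ** 2))
        for k in range(num_levels)
    ]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical example: a sequence-level label of 2 on a 6-level scale.
label = soft_gaussian_label(level=2, num_levels=6, sigma=1.0)
```

Unlike a one-hot target, this encoding penalizes a prediction of level 3 less than a prediction of level 5 when the true level is 2, which is one way to reflect the ordinal relation between intensity levels.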

updated: Tue Nov 09 2021 16:04:43 GMT+0000 (UTC)

published: Wed Oct 28 2020 03:20:34 GMT+0000 (UTC)

arXiv
