Video-based Remote Physiological Measurement via Self-supervised Learning

Zijie Yue; Miaojing Shi; Shuai Ding

Video-based remote physiological measurement aims to estimate remote photoplethysmography (rPPG) signals from human face videos and then measure multiple vital signs (e.g. heart rate, respiration frequency) from those signals. Recent approaches do this by training deep neural networks, which normally requires abundant face videos with synchronously recorded photoplethysmography (PPG) signals for supervision. However, collecting such annotated corpora is difficult in practice. In this paper, we introduce a novel frequency-inspired self-supervised framework that learns to estimate rPPG signals from face videos without the need for ground-truth PPG signals. Given a video sample, we first augment it into multiple positive and negative samples, which contain signal frequencies that are respectively similar and dissimilar to those of the original. Specifically, positive samples are generated using spatial augmentation. Negative samples are generated via a learnable frequency augmentation module, which performs a non-linear signal frequency transformation on the input without excessively changing its visual appearance. Next, we introduce a local rPPG expert aggregation module to estimate rPPG signals from the augmented samples. It encodes complementary pulsation information from different face regions and aggregates it into one rPPG prediction. Finally, we propose a series of frequency-inspired losses, i.e. a frequency contrastive loss, a frequency ratio consistency loss, and a cross-video frequency agreement loss, to optimize the rPPG signals estimated from multiple augmented video samples and across temporally neighboring video samples. We conduct rPPG-based heart rate, heart rate variability, and respiration frequency estimation on four standard benchmarks. The experimental results demonstrate that our method improves the state of the art by a large margin.
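
The abstract does not give the form of the losses, so the following is only an illustrative sketch of the general idea rather than the authors' method. It assumes that signals are compared through their normalized power spectra in a typical heart-rate band, and it swaps in a generic InfoNCE-style contrastive term. The function names, the 0.7 to 4.0 Hz band, the mean-squared spectral distance, and the temperature `tau` are all assumptions made for illustration.

```python
import numpy as np


def power_spectrum(signal, fs, band=(0.7, 4.0)):
    """Return (freqs, normalized power) of `signal` inside `band` (Hz).

    The band limits are an assumed heart-rate range, not taken from the paper.
    """
    signal = np.asarray(signal, dtype=float)
    signal = signal - signal.mean()  # remove the DC component
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    power = np.abs(np.fft.rfft(signal)) ** 2
    mask = (freqs >= band[0]) & (freqs <= band[1])
    power = power[mask]
    return freqs[mask], power / (power.sum() + 1e-12)


def heart_rate_bpm(signal, fs):
    """Estimate heart rate as the dominant in-band frequency, in beats/min."""
    freqs, power = power_spectrum(signal, fs)
    return 60.0 * freqs[np.argmax(power)]


def contrastive_frequency_loss(anchor, positives, negatives, fs, tau=0.08):
    """Hypothetical InfoNCE-style loss on power spectra.

    Pulls the anchor's spectrum toward the positives' spectra and pushes it
    away from the negatives' spectra. All signals must have the same length.
    """
    _, a = power_spectrum(anchor, fs)

    def similarity(x):
        _, p = power_spectrum(x, fs)
        return np.exp(-np.mean((a - p) ** 2) / tau)

    pos = sum(similarity(p) for p in positives)
    neg = sum(similarity(n) for n in negatives)
    return float(-np.log(pos / (pos + neg)))
```

As a usage example, a 10-second synthetic signal sampled at 30 Hz with a 1.2 Hz pulse should yield an estimate of about 72 beats per minute, and the loss should be lower when the positive shares the anchor's frequency than when positive and negative are swapped.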

updated: Fri Oct 28 2022 09:49:10 GMT+0000 (UTC)

published: Thu Oct 27 2022 13:03:23 GMT+0000 (UTC)

arXiv
