Progressive Cross-modal Knowledge Distillation for Human Action Recognition

Jianyuan Ni; Anne H. H. Ngu; Yan Yan



Wearable sensor-based Human Action Recognition (HAR) has achieved remarkable success recently. However, the accuracy of wearable sensor-based HAR still lags far behind that of systems based on visual modalities (e.g., RGB video, skeleton, and depth). Diverse input modalities can provide complementary cues and thus improve HAR accuracy, but how to take advantage of multi-modal data for wearable sensor-based HAR has rarely been explored. Currently, wearable devices such as smartwatches can capture only a limited range of non-visual modality data. This hinders multi-modal HAR, because visual and non-visual modality data cannot be used simultaneously. Another major challenge is how to efficiently utilize multi-modal data on wearable devices, given their limited computational resources. In this work, we propose a novel Progressive Skeleton-to-sensor Knowledge Distillation (PSKD) model that utilizes only time-series data, i.e., accelerometer data, from a smartwatch to solve the wearable sensor-based HAR problem. Specifically, we construct multiple teacher models using data from both the teacher (human skeleton sequence) and student (time-series accelerometer data) modalities. In addition, we propose an effective progressive learning scheme to eliminate the performance gap between the teacher and student models. We also design a novel loss function, called Adaptive-Confidence Semantic (ACS), that allows the student model to adaptively select either one of the teacher models or the ground-truth label to mimic. To demonstrate the effectiveness of the proposed PSKD method, we conduct extensive experiments on the Berkeley-MHAD, UTD-MHAD, and MMAct datasets. The results confirm that PSKD achieves competitive performance compared with previous mono-sensor-based HAR methods.
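The abstract describes the ACS loss only at a high level. To illustrate the general idea of a student adaptively choosing what to mimic, here is a minimal sketch of multi-teacher knowledge distillation under loudly stated assumptions. This is not the authors' ACS formulation. The selection rule is hypothetical: use the most confident teacher that predicts the ground-truth class, otherwise fall back to the ground-truth label. The soft-target term is the standard temperature-scaled KL distillation loss of Hinton et al. The names `select_target` and `adaptive_distillation_loss` and the default temperature are illustrative only.

```python
import math
from typing import List, Optional, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with an optional distillation temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions; zero-probability terms of p are skipped."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def select_target(teacher_logits: Sequence[Sequence[float]], label: int,
                  temperature: float) -> Optional[List[float]]:
    """Hypothetical rule (an assumption, not the paper's ACS): among teachers whose
    top-1 prediction equals the label, return the softened distribution of the one
    with the highest probability on the label. Return None if no teacher is correct."""
    best, best_conf = None, -1.0
    for logits in teacher_logits:
        probs = softmax(logits, temperature)
        pred = max(range(len(probs)), key=lambda c: probs[c])
        if pred == label and probs[label] > best_conf:
            best, best_conf = probs, probs[label]
    return best


def adaptive_distillation_loss(student_logits: Sequence[float],
                               teacher_logits: Sequence[Sequence[float]],
                               label: int, temperature: float = 2.0) -> float:
    target = select_target(teacher_logits, label, temperature)
    if target is None:
        # No teacher is correct: mimic the ground-truth label with plain cross-entropy.
        return -math.log(softmax(student_logits)[label])
    # Mimic the chosen teacher; the T^2 factor keeps gradient scale comparable across T.
    student_probs = softmax(student_logits, temperature)
    return temperature ** 2 * kl_divergence(target, student_probs)
```

In a real setting the teacher logits would come from the skeleton-based (or mixed-modality) teacher networks and the student logits from the accelerometer model; here they are plain lists so the selection logic is easy to inspect.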

updated: Wed Aug 17 2022 06:06:03 GMT+0000 (UTC)

published: Wed Aug 17 2022 06:06:03 GMT+0000 (UTC)

arXiv
