VPN++: Rethinking Video-Pose embeddings for understanding Activities of Daily Living

Srijan Das; Rui Dai; Di Yang; Francois Bremond

Many attempts have been made to combine RGB and 3D poses for recognizing Activities of Daily Living (ADL). ADL may look very similar to one another, and distinguishing them often requires modeling fine-grained details. Because recent 3D ConvNets are too rigid to capture the subtle visual patterns across an action, this research direction is dominated by methods that combine RGB and 3D poses. However, computing 3D poses from an RGB stream is costly in the absence of appropriate sensors, which limits the use of these approaches in real-world applications requiring low latency. How, then, can 3D poses best be exploited for recognizing ADL? To this end, we propose an extension of a pose-driven attention mechanism, the Video-Pose Network (VPN), exploring two distinct directions. One transfers pose knowledge into RGB through feature-level distillation; the other mimics pose-driven attention through attention-level distillation. Finally, these two approaches are integrated into a single model, which we call VPN++. We show that VPN++ is not only effective but also provides a large speed-up and high resilience to noisy poses. VPN++, with or without 3D poses, outperforms the representative baselines on 4 public datasets. Code is available at https://github.com/srijandas07/vpnplusplus.
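The two distillation directions named in the abstract can be illustrated with a generic sketch. This is not the authors' implementation: the abstract does not specify the loss functions, so the mean-squared-error feature loss, the KL-divergence attention loss, the weights `alpha` and `beta`, and all function names below are assumptions chosen only to show how a pose-based teacher could supervise an RGB-only student.

```python
import math


def log_softmax(scores):
    """Numerically stable log-softmax over a list of raw attention scores."""
    m = max(scores)
    log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_norm for s in scores]


def feature_distillation_loss(rgb_feat, pose_feat):
    """Assumed form: mean squared error between the student (RGB) feature
    vector and the teacher (pose) feature vector."""
    if len(rgb_feat) != len(pose_feat):
        raise ValueError("feature vectors must have the same length")
    return sum((r - p) ** 2 for r, p in zip(rgb_feat, pose_feat)) / len(rgb_feat)


def attention_distillation_loss(student_scores, teacher_scores):
    """Assumed form: KL(teacher || student) between softmax-normalized
    attention weights, so the student mimics the pose-driven attention."""
    if len(student_scores) != len(teacher_scores):
        raise ValueError("attention score lists must have the same length")
    log_p = log_softmax(teacher_scores)  # teacher: pose-driven attention
    log_q = log_softmax(student_scores)  # student: RGB-only attention
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def total_loss(cls_loss, rgb_feat, pose_feat, student_scores, teacher_scores,
               alpha=1.0, beta=1.0):
    """Classification loss plus the two distillation terms.
    alpha and beta are hypothetical weighting hyperparameters."""
    return (cls_loss
            + alpha * feature_distillation_loss(rgb_feat, pose_feat)
            + beta * attention_distillation_loss(student_scores, teacher_scores))
```

Under these assumptions, the pose branch is needed only while training; at inference the RGB student runs alone, which is consistent with the abstract's claim that the model can operate without 3D poses.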

updated: Mon May 17 2021 20:19:47 GMT+0000 (UTC)

published: Mon May 17 2021 20:19:47 GMT+0000 (UTC)

arXiv
