Spatial-temporal Transformer-guided Diffusion based Data Augmentation for Efficient Skeleton-based Action Recognition

Yifan Jiang; Han Chen; Hanseok Ko

Skeleton-based human action recognition has recently become an active research topic, since the compact representation of the human skeleton has brought fresh momentum to this research domain. As a result, researchers have come to recognize the value of using RGB or other sensors to analyze human actions by extracting skeleton information. Leveraging the rapid development of deep learning (DL), many skeleton-based human action recognition approaches with carefully designed DL architectures have been proposed in recent years. However, a well-trained DL model demands sufficient high-quality data, which is hard to obtain without considerable expense and human labor. In this paper, we introduce a novel data augmentation method for skeleton-based action recognition that can effectively generate high-quality and diverse action sequences. To obtain natural and realistic action sequences, we propose denoising diffusion probabilistic models (DDPMs) that generate synthetic action sequences, with the generation process guided by a spatial-temporal transformer (ST-Trans). Experimental results show that our method outperforms state-of-the-art (SOTA) motion generation approaches on several naturalness and diversity metrics. The results also show that the high-quality synthetic data can be used effectively with existing action recognition models, yielding significant performance improvements.

updated: Tue Jul 25 2023 02:24:04 GMT+0000 (UTC)

published: Sun Feb 26 2023 23:02:33 GMT+0000 (UTC)

arXiv
