Hierarchical Contrast for Unsupervised Skeleton-based Action Representation Learning

Jianfeng Dong; Shengkai Sun; Zhonglin Liu; Shujie Chen; Baolong Liu; Xun Wang

This paper targets unsupervised skeleton-based action representation learning and proposes a new Hierarchical Contrast (HiCo) framework. Unlike existing contrastive solutions, which typically represent an input skeleton sequence as a single instance-level feature and perform contrast holistically, HiCo represents the input as features at multiple levels and performs contrast hierarchically. Specifically, given a human skeleton sequence, we represent it as multiple feature vectors of different granularities from both the temporal and spatial domains via sequence-to-sequence (S2S) encoders and unified downsampling modules. The hierarchical contrast is then conducted at four levels: instance level, domain level, clip level, and part level. Moreover, HiCo is orthogonal to the choice of S2S encoder, which allows us to flexibly adopt state-of-the-art S2S encoders. Extensive experiments on four datasets, i.e., NTU-60, NTU-120, PKU-MMD I, and PKU-MMD II, show that HiCo achieves a new state of the art for unsupervised skeleton-based action representation learning on two downstream tasks, action recognition and action retrieval, and that its learned action representation transfers well. We also show that our framework is effective for semi-supervised skeleton-based action recognition. Our code is available at https://github.com/HuiGuanLab/HiCo.
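The idea of contrasting at several granularities can be illustrated with a small sketch. This is not the authors' implementation (see the repository linked above for that). It assumes that per-frame features from some encoder are already available, stands in max pooling over time for the downsampling module, and uses an InfoNCE-style loss, which the abstract does not name. All function names (`max_pool_time`, `multi_granularity`, `info_nce`, `hierarchical_contrast`, `random_sequence`) are hypothetical, and only the temporal domain is shown.

```python
import math
import random


def max_pool_time(seq, factor=2):
    """Halve temporal resolution with an element-wise max over
    non-overlapping windows of `factor` frames (assumed downsampling)."""
    return [[max(col) for col in zip(*seq[i:i + factor])]
            for i in range(0, len(seq) - factor + 1, factor)]


def unit_mean(seq):
    """Average the frame features over time and L2-normalize the result."""
    mean = [sum(col) / len(seq) for col in zip(*seq)]
    norm = math.sqrt(sum(x * x for x in mean)) or 1.0
    return [x / norm for x in mean]


def multi_granularity(seq, levels=3):
    """Return one unit-length vector per granularity, fine to coarse."""
    feats = []
    for _ in range(levels):
        feats.append(unit_mean(seq))
        if len(seq) >= 2:
            seq = max_pool_time(seq)
    return feats


def info_nce(query, keys, pos_index, temperature=0.1):
    """InfoNCE loss: negative log-softmax score of the positive key."""
    logits = [sum(q * k for q, k in zip(query, key)) / temperature
              for key in keys]
    m = max(logits)  # subtract the max for numerical stability
    log_sum = math.log(sum(math.exp(l - m) for l in logits))
    return log_sum + (m - logits[pos_index])


def hierarchical_contrast(query_seq, key_seqs, pos_index,
                          levels=3, temperature=0.1):
    """Sum the contrastive loss over all granularity levels."""
    q_feats = multi_granularity(query_seq, levels)
    k_feats = [multi_granularity(k, levels) for k in key_seqs]
    return sum(
        info_nce(q_feats[lvl], [k[lvl] for k in k_feats],
                 pos_index, temperature)
        for lvl in range(levels)
    )


def random_sequence(rng, frames, dim):
    """Toy stand-in for encoder output: `frames` vectors of size `dim`."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(frames)]
```

Calling `hierarchical_contrast(anchor, [other, anchor], pos_index=1)` with two toy sequences yields a lower loss than the same call with `pos_index=0`, since the query matches its positive at every level. A fuller version would apply the same pattern to spatial (part-level) features and to cross-domain pairs.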

Updated: 5 December 2022, 07:52:23 UTC

Published: 5 December 2022, 07:52:23 UTC

arXiv

