Language Knowledge-Assisted Representation Learning for Skeleton-Based Action Recognition

Haojun Xu; Yan Gao; Zheng Hui; Jie Li; Xinbo Gao

How humans understand and recognize the actions of others is a complex neuroscientific problem that involves a combination of cognitive mechanisms and neural networks. Research has shown that humans have brain areas dedicated to action recognition that process top-down attentional information, such as the temporoparietal association area. Humans also have brain regions dedicated to understanding the minds of others and analyzing their intentions, such as the medial prefrontal cortex. Skeleton-based action recognition maps the complex relationships between human skeletal movement patterns and behaviors. Although existing studies have achieved good results by encoding meaningful node relationships and synthesizing action representations for classification, few have considered incorporating a priori knowledge to aid representation learning and improve performance. We propose LA-GCN, a graph convolutional network assisted by knowledge from large language models (LLMs). First, the LLM knowledge is mapped into an a priori global relationship (GPR) topology and an a priori category relationship (CPR) topology between nodes. The GPR guides the generation of new "bone" representations, aiming to emphasize essential node information at the data level. The CPR mapping simulates category prior knowledge in human brain regions; it is encoded by the PC-AC module and used to add extra supervision, forcing the model to learn class-distinguishable features. In addition, to improve the efficiency of information transfer in topology modeling, we propose multi-hop attention graph convolution, which aggregates each node's k-order neighbors simultaneously to speed up model convergence. LA-GCN achieves state-of-the-art performance on the NTU RGB+D, NTU RGB+D 120, and NW-UCLA datasets.
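The abstract describes multi-hop attention graph convolution only at a high level. As an illustration of what "aggregating each node's k-order neighbors simultaneously" can mean, here is a minimal sketch, assuming the multi-hop operator is a softmax-weighted sum of powers of a row-normalized skeleton adjacency matrix (with self-loops), applied to per-joint features. The function names, the `hop_scores` logits, and the normalization scheme are hypothetical choices made for this example, not LA-GCN's published implementation; learnable projections and temporal modeling are omitted.

```python
import math


def normalize(adj):
    """Row-normalize A + I so each joint averages over itself and its 1-hop neighbours."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    return [[v / sum(row) for v in row] for row in a_hat]


def matmul(a, b):
    """Plain-Python matrix product of a (n x m) and b (m x p)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def softmax(scores):
    """Numerically stable softmax over a list of logits."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multi_hop_aggregate(adj, features, hop_scores):
    """Hypothetical multi-hop aggregation: H' = sum_k alpha_k * A_norm^k @ H.

    hop_scores holds K+1 logits (hops 0..K); softmax turns them into hop weights
    alpha_k, so one call mixes information from up to K hops away instead of
    stacking K separate 1-hop layers.
    """
    n = len(adj)
    a_norm = normalize(adj)
    alphas = softmax(hop_scores)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # A_norm^0 = I
    operator = [[0.0] * n for _ in range(n)]
    for alpha in alphas:
        # Accumulate alpha_k * A_norm^k into the combined operator.
        for i in range(n):
            for j in range(n):
                operator[i][j] += alpha * power[i][j]
        power = matmul(power, a_norm)  # advance to A_norm^(k+1)
    return matmul(operator, features)


# Usage: a 4-joint chain 0-1-2-3 with a signal only on joint 0 and K = 2 hops.
chain = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
out = multi_hop_aggregate(chain, [[1.0], [0.0], [0.0], [0.0]], [0.0, 0.0, 0.0])
print([round(row[0], 4) for row in out])
```

With K = 2, joint 2 (two hops from joint 0) already receives part of joint 0's signal after a single aggregation step, while joint 3 (three hops away) receives none. Because every A_norm^k is row-stochastic and the hop weights sum to 1, the combined operator also keeps each row summing to 1.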

updated: Sun May 21 2023 08:29:16 GMT+0000 (UTC)

published: Sun May 21 2023 08:29:16 GMT+0000 (UTC)

arXiv
