Context-Sensitive Temporal Feature Learning for Gait Recognition

Xiaohu Huang; Duowang Zhu; Xinggang Wang; Hao Wang; Bo Yang; Botao He; Wenyu Liu; Bin Feng

Although gait recognition has drawn increasing research attention recently, learning discriminative temporal representations remains challenging, since silhouette differences are quite subtle in the spatial domain. Inspired by the observation that humans can distinguish the gaits of different subjects by adaptively focusing on temporal clips of different time scales, we propose a context-sensitive temporal feature learning (CSTL) network for gait recognition. CSTL produces temporal features at three scales and adaptively aggregates them according to contextual information from local and global perspectives. Specifically, CSTL contains an adaptive temporal aggregation module that performs local relation modeling followed by global relation modeling to fuse the multi-scale features. In addition, to remedy the spatial feature corruption caused by temporal operations, CSTL incorporates a salient spatial feature learning (SSFL) module that selects groups of discriminative spatial features. In particular, we use transformers to implement the global relation modeling and the SSFL module. To the best of our knowledge, this is the first work to adopt transformers in gait recognition. Extensive experiments on three datasets demonstrate state-of-the-art performance. Concretely, we achieve rank-1 accuracies of 98.7%, 96.2%, and 88.7% under normal-walking, bag-carrying, and coat-wearing conditions on CASIA-B, 97.5% on OU-MVLP, and 50.6% on GREW.
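The rank-1 accuracies quoted above follow the usual retrieval-style definition: a probe sequence counts as correct when its nearest gallery sequence belongs to the same subject. The sketch below illustrates only that definition. It assumes Euclidean distance between feature vectors, and the function name `rank1_accuracy` and the toy data are hypothetical, not taken from the paper. Each benchmark's evaluation protocol adds details that this sketch omits.

```python
import math

def rank1_accuracy(probe_feats, probe_ids, gallery_feats, gallery_ids):
    """Fraction of probes whose nearest gallery feature shares their identity."""
    correct = 0
    for feat, pid in zip(probe_feats, probe_ids):
        # Index of the gallery sample closest to this probe
        # (Euclidean distance is an assumption of this sketch).
        nearest = min(range(len(gallery_feats)),
                      key=lambda j: math.dist(feat, gallery_feats[j]))
        correct += gallery_ids[nearest] == pid
    return correct / len(probe_feats)

# Toy data, made up for illustration only (not from any dataset).
gallery_feats = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
gallery_ids = ["A", "B", "C"]
probe_feats = [[0.1, -0.1], [0.9, 1.2], [2.0, 2.0], [4.8, 5.1]]
probe_ids = ["A", "B", "C", "C"]

accuracy = rank1_accuracy(probe_feats, probe_ids, gallery_feats, gallery_ids)
print(f"rank-1 accuracy: {accuracy:.2f}")  # prints "rank-1 accuracy: 0.75"
```

In this toy run, the third probe lies closer to subject B's gallery feature than to its own subject C, so three of the four probes are matched correctly.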

updated: Fri Apr 08 2022 05:24:59 GMT+0000 (UTC)

published: Thu Apr 07 2022 07:47:21 GMT+0000 (UTC)

arXiv
