Parsing is All You Need for Accurate Gait Recognition in the Wild

Jinkai Zheng; Xinchen Liu; Shuai Wang; Lihao Wang; Chenggang Yan; Wu Liu

Binary silhouettes and keypoint-based skeletons have dominated human gait recognition studies for decades because they are easy to extract from video frames. Despite their success in gait recognition in lab environments, they usually fail in real-world scenarios because of their low information entropy as gait representations. To achieve accurate gait recognition in the wild, this paper presents a novel gait representation, named Gait Parsing Sequence (GPS). GPSs are sequences of fine-grained human segmentation, i.e., human parsing, extracted from video frames, so they have much higher information entropy to encode the shapes and dynamics of fine-grained human parts during walking. Moreover, to effectively explore the capability of the GPS representation, we propose a novel human parsing-based gait recognition framework, named ParsingGait. ParsingGait contains a Convolutional Neural Network (CNN)-based backbone and two lightweight heads. The first head extracts global semantic features from GPSs, while the second learns mutual information of part-level features through Graph Convolutional Networks to model the detailed dynamics of human walking. Furthermore, due to the lack of suitable datasets, we build the first parsing-based dataset for gait recognition in the wild, named Gait3D-Parsing, by extending the large-scale and challenging Gait3D dataset. Based on Gait3D-Parsing, we comprehensively evaluate our method and existing gait recognition methods. The experimental results show a significant improvement in accuracy brought by the GPS representation and the superiority of ParsingGait. The code and dataset are available at https://gait3d.github.io/gait3d-parsing-hp.
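The contrast between a binary silhouette and a parsing frame can be made concrete with a toy example. The sketch below is only an illustration, not the authors' code: it assumes "information entropy" is read as the Shannon entropy of the per-pixel label histogram, which may differ from the sense intended in the abstract. The five part labels and the 6x4 frame are hypothetical, since the abstract does not list the label set used in Gait3D-Parsing.

```python
import math
from collections import Counter

# Hypothetical label ids, for illustration only; the actual label set of
# Gait3D-Parsing is not given in the abstract.
BACKGROUND, HEAD, TORSO, ARM, LEG = 0, 1, 2, 3, 4

# A toy 6x4 parsing map for a single frame (rows of per-pixel part labels).
parsing_frame = [
    [BACKGROUND, HEAD, HEAD, BACKGROUND],
    [ARM, TORSO, TORSO, ARM],
    [ARM, TORSO, TORSO, ARM],
    [BACKGROUND, LEG, LEG, BACKGROUND],
    [BACKGROUND, LEG, LEG, BACKGROUND],
    [BACKGROUND, LEG, LEG, BACKGROUND],
]


def to_silhouette(frame):
    """Collapse every non-background label to 1, giving a binary silhouette."""
    return [[1 if label != BACKGROUND else 0 for label in row] for row in frame]


def label_entropy(frame):
    """Shannon entropy, in bits, of the per-pixel label histogram."""
    counts = Counter(label for row in frame for label in row)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


silhouette_frame = to_silhouette(parsing_frame)
print(f"silhouette entropy: {label_entropy(silhouette_frame):.3f} bits")
print(f"parsing entropy:    {label_entropy(parsing_frame):.3f} bits")
```

Under this reading, a binary silhouette can never exceed 1 bit per pixel, while a parsing map with K labels can reach up to log2(K) bits. A GPS is then a sequence of such parsing frames, one per video frame.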

updated: Thu Aug 31 2023 13:57:38 GMT+0000 (UTC)

published: Thu Aug 31 2023 13:57:38 GMT+0000 (UTC)

arXiv
