Deep Dual Consecutive Network for Human Pose Estimation

Zhenguang Liu; Haoming Chen; Runyang Feng; Shuang Wu; Shouling Ji; Bailin Yang; Xun Wang

Multi-frame human pose estimation in complicated situations is challenging. Although state-of-the-art human joint detectors have demonstrated remarkable results on static images, their performance falls short when these models are applied to video sequences. Prevalent shortcomings include the failure to handle motion blur, video defocus, or pose occlusions, which arises from an inability to capture the temporal dependency among video frames. On the other hand, directly employing conventional recurrent neural networks incurs empirical difficulties in modeling spatial contexts, especially when dealing with pose occlusions. In this paper, we propose a novel multi-frame human pose estimation framework that leverages abundant temporal cues between video frames to facilitate keypoint detection. Three modular components are designed in our framework. A Pose Temporal Merger encodes keypoint spatiotemporal context to generate effective search scopes, while a Pose Residual Fusion module computes weighted pose residuals in dual directions. These are then processed by our Pose Correction Network to efficiently refine the pose estimates. Our method ranks No. 1 in the Multi-frame Person Pose Estimation Challenge on the large-scale benchmark datasets PoseTrack2017 and PoseTrack2018. We have released our code, hoping to inspire future research.
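The abstract names the three components but does not give their equations. The Python sketch below is a hypothetical, non-learned illustration of the data flow only.

- `temporal_merge` stands in for the Pose Temporal Merger. It computes a normalized weighted sum of neighboring frames' keypoint heatmaps.
- `dual_residuals` stands in for the Pose Residual Fusion module. It takes the previous-to-current and current-to-next heatmap differences and divides each by its frame distance.

The function names, the flattened-list heatmap representation, and the inverse-distance weighting are all assumptions, not the authors' formulation. In the paper, these components are learned networks.

```python
from typing import List, Sequence

# Assumption: one frame's keypoint heatmaps flattened into a single list (K*H*W values).
Heatmap = List[float]


def temporal_merge(frames: Sequence[Heatmap], weights: Sequence[float]) -> Heatmap:
    """Hypothetical stand-in for the Pose Temporal Merger.

    Returns a normalized weighted sum of per-frame heatmaps. The merged map acts as
    a coarse search scope around each keypoint. The real module is learned.
    """
    if len(frames) != len(weights):
        raise ValueError("one weight per frame is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    merged = [0.0] * len(frames[0])
    for frame, w in zip(frames, weights):
        for i, v in enumerate(frame):
            merged[i] += w * v / total
    return merged


def dual_residuals(prev: Heatmap, cur: Heatmap, nxt: Heatmap,
                   d_prev: int = 1, d_next: int = 1) -> Heatmap:
    """Hypothetical stand-in for the Pose Residual Fusion module.

    Computes pose residuals in both temporal directions. Each residual is scaled by
    the inverse of its frame distance (an assumed weighting). The two residuals are
    concatenated, mimicking channel-wise stacking.
    """
    forward = [(c - p) / d_prev for p, c in zip(prev, cur)]
    backward = [(n - c) / d_next for c, n in zip(cur, nxt)]
    return forward + backward


if __name__ == "__main__":
    prev, cur, nxt = [0.0, 1.0], [1.0, 1.0], [3.0, 1.0]
    print(temporal_merge([prev, cur, nxt], [1.0, 2.0, 1.0]))  # [1.25, 1.0]
    print(dual_residuals(prev, cur, nxt))                     # [1.0, 0.0, 2.0, 0.0]
```

In the described framework, a Pose Correction Network would then consume the merged search scope and the dual residuals to refine the current frame's pose. That step is a trained network and is deliberately not sketched here.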

Published on arXiv: Fri Mar 12 2021 13:11:27 GMT+0000 (UTC)

Updated: Mon Mar 15 2021 02:24:12 GMT+0000 (UTC)
