SmoothNet: A Plug-and-Play Network for Refining Human Poses in Videos

Ailing Zeng; Lei Yang; Xuan Ju; Jiefeng Li; Jianyi Wang; Qiang Xu

When analyzing human motion videos, the output jitter of existing pose estimators is highly unbalanced, with estimation errors varying across frames. Most frames in a video are relatively easy to estimate and suffer only slight jitter. In contrast, for rarely seen or occluded actions, the estimated positions of multiple joints deviate substantially from the ground-truth values over a consecutive sequence of frames, producing significant jitter. To tackle this problem, we propose SmoothNet, a dedicated temporal-only refinement network that attaches to existing pose estimators to mitigate jitter. Unlike existing learning-based solutions that employ spatio-temporal models to co-optimize per-frame precision and temporal smoothness at all joints, SmoothNet models the natural smoothness of body movements by learning the long-range temporal relations of every joint without considering the noisy correlations among joints. With a simple yet effective motion-aware fully-connected network, SmoothNet significantly improves the temporal smoothness of existing pose estimators and, as a side effect, enhances the estimation accuracy on those challenging frames. Moreover, as a temporal-only model, a unique advantage of SmoothNet is its strong transferability across various types of estimators and datasets. Comprehensive experiments on five datasets with eleven popular backbone networks across 2D and 3D pose estimation and body recovery tasks demonstrate the efficacy of the proposed solution. Code is available at https://github.com/cure-lab/SmoothNet.
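
The temporal-only idea in the abstract can be illustrated with a rough sketch. The code below is not the authors' implementation (see the repository linked above for that); it is a minimal NumPy example assuming a hypothetical two-layer fully-connected network applied along the time axis of a fixed-length window, with the same weights shared by every joint coordinate. The function names, layer sizes, ReLU activation, and residual connection are all illustrative assumptions. A simple second-difference measure of jitter is included as well, again as an assumption rather than the paper's stated metric.

```python
import numpy as np


def init_params(window, hidden, rng):
    """Weights for a hypothetical two-layer temporal MLP (sizes are illustrative)."""
    return {
        "w1": rng.normal(0.0, 0.1, (window, hidden)),
        "b1": np.zeros(hidden),
        "w2": rng.normal(0.0, 0.1, (hidden, window)),
        "b2": np.zeros(window),
    }


def refine_window(params, poses):
    """Refine a window of poses along time only.

    poses: array of shape (window, channels), where channels is the number of
    joints times the coordinate dimension. Each channel is treated as an
    independent time series, so no information is mixed across joints.
    """
    x = poses.T  # (channels, window): one row per joint coordinate
    h = np.maximum(x @ params["w1"] + params["b1"], 0.0)  # ReLU (assumed)
    delta = h @ params["w2"] + params["b2"]
    return (x + delta).T  # residual connection is an assumption of this sketch


def mean_acceleration(poses):
    """Mean absolute second-order finite difference over time, as a jitter proxy."""
    acc = poses[2:] - 2.0 * poses[1:-1] + poses[:-2]
    return float(np.abs(acc).mean())
```

Because the weights act only on the time axis, perturbing one joint coordinate leaves the refined output of every other coordinate unchanged, and the same weights apply regardless of how many joints an estimator produces. That is one way to read the transferability claim; in practice the network would be trained on pairs of noisy estimator outputs and ground-truth poses.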

updated: Thu Jul 21 2022 17:15:06 GMT+0000 (UTC)

published: Mon Dec 27 2021 14:53:30 GMT+0000 (UTC)

arXiv
