SmoothNet: A Plug-and-Play Network for Refining Human Poses in Videos

Ailing Zeng; Lei Yang; Xuan Ju; Jiefeng Li; Jianyi Wang; Qiang Xu

When analyzing human motion videos, the output jitter from existing pose estimators is highly unbalanced. Most frames suffer only slight jitter, while significant jitter occurs in frames with occlusion or poor image quality. Such difficult poses often persist across a video, producing runs of consecutive frames with poor estimates and large jitter. Existing pose smoothing solutions based on temporal convolutional networks, recurrent neural networks, or low-pass filters cannot handle this long-term jitter, because they do not account for the significant and persistent errors within the jittering video segment. Motivated by this observation, we propose a novel plug-and-play refinement network, SMOOTHNET, which can be attached to any existing pose estimator to improve its temporal smoothness and its per-frame precision simultaneously. In particular, SMOOTHNET is a simple yet effective data-driven fully-connected network with large receptive fields, which mitigates the impact of long-term jitter on unreliable estimation results. We conduct extensive experiments on twelve backbone networks with seven datasets across 2D and 3D pose estimation, body recovery, and downstream tasks. Our results demonstrate that the proposed SMOOTHNET consistently outperforms existing solutions, especially on clips with high errors and long-term jitter.
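
To make the idea of a fully-connected refinement stage with a long temporal window concrete, the following is a minimal sketch, not the authors' implementation. The abstract gives no layer sizes, window length, or training procedure, so everything here is an assumption: the window length `T = 8`, the single dense layer, the sharing of one set of temporal weights across all joint coordinates, the averaging of overlapping windows, and the helper names `temporal_fc`, `refine_sequence`, and `mean_acceleration`. Uniform averaging weights stand in for learned parameters, so the sketch only shows how data flows through such a stage and how jitter might be measured.

```python
import random

def temporal_fc(window, weights, bias):
    """Map one channel's T input values to T output values with a dense layer."""
    return [sum(w * x for w, x in zip(row, window)) + b
            for row, b in zip(weights, bias)]

def refine_sequence(poses, weights, bias):
    """Refine a sequence shaped [frames][channels], one sliding window at a time.

    The same temporal weights are shared by every channel (joint coordinate),
    and the outputs of overlapping windows are averaged per frame.
    Assumes the sequence has at least as many frames as the window length.
    """
    T = len(weights)
    n, c = len(poses), len(poses[0])
    total = [[0.0] * c for _ in range(n)]
    count = [0] * n
    for start in range(n - T + 1):
        for ch in range(c):
            window = [poses[start + t][ch] for t in range(T)]
            out = temporal_fc(window, weights, bias)
            for t in range(T):
                total[start + t][ch] += out[t]
        for t in range(T):
            count[start + t] += 1
    return [[v / count[i] for v in total[i]] for i in range(n)]

def mean_acceleration(poses):
    """Average absolute second difference over time, a simple jitter proxy."""
    n, c = len(poses), len(poses[0])
    acc = [abs(poses[i + 1][ch] - 2 * poses[i][ch] + poses[i - 1][ch])
           for i in range(1, n - 1) for ch in range(c)]
    return sum(acc) / len(acc)

# Placeholder weights (assumption): uniform averaging instead of learned values.
T = 8
weights = [[1.0 / T] * T for _ in range(T)]
bias = [0.0] * T

# Synthetic input: 32 frames, 2 channels, with small Gaussian jitter added.
rng = random.Random(0)
clean = [[0.1 * i, 1.0] for i in range(32)]
noisy = [[v + rng.gauss(0.0, 0.05) for v in frame] for frame in clean]
refined = refine_sequence(noisy, weights, bias)
```

Because the stage consumes only a sequence of estimated poses and returns a sequence of the same shape, it could in principle sit behind any pose estimator, which is the plug-and-play property the abstract describes. With trained weights, each output frame would be a learned combination of all frames in the window rather than a plain average.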

updated: Mon Dec 27 2021 14:53:30 GMT+0000 (UTC)

published: Mon Dec 27 2021 14:53:30 GMT+0000 (UTC)

arXiv
