DistilPose: Tokenized Pose Regression with Heatmap Distillation

Suhang Ye; Yingyi Zhang; Jie Hu; Liujuan Cao; Shengchuan Zhang; Lei Shen; Jun Wang; Shouhong Ding; Rongrong Ji


In the field of human pose estimation, regression-based methods dominate in terms of speed, while heatmap-based methods are far ahead in terms of performance. How to take advantage of both schemes remains a challenging problem. In this paper, we propose a novel human pose estimation framework termed DistilPose, which bridges the gap between heatmap-based and regression-based methods. Specifically, DistilPose maximizes the transfer of knowledge from the teacher model (heatmap-based) to the student model (regression-based) through a Token-distilling Encoder (TDE) and Simulated Heatmaps. TDE aligns the feature spaces of heatmap-based and regression-based models by introducing tokenization, while Simulated Heatmaps transfer explicit guidance (distribution and confidence) from teacher heatmaps into student models. Extensive experiments show that the proposed DistilPose significantly improves the performance of regression-based models while maintaining efficiency. Specifically, on the MSCOCO validation dataset, DistilPose-S obtains 71.6% mAP with 5.36M parameters, 2.38 GFLOPs and 40.2 FPS; compared with its teacher model, it uses 12.95x fewer parameters and 7.16x fewer GFLOPs and runs 4.9x faster, with a performance drop of only 0.9 points. Furthermore, DistilPose-L obtains 74.4% mAP on the MSCOCO validation dataset, achieving a new state of the art among predominant regression-based models.
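The efficiency ratios quoted above can be turned back into approximate teacher-model figures. The following is a minimal arithmetic sketch, not part of the paper: it assumes that the 12.95x and 7.16x savings refer to parameter count and GFLOPs respectively (the order in which the abstract lists them), and the dictionary keys and the `implied_teacher` helper are hypothetical names chosen for illustration.

```python
# Student (DistilPose-S) figures quoted in the abstract.
student = {"params_m": 5.36, "gflops": 2.38, "fps": 40.2, "map": 71.6}

# Ratios quoted in the abstract. Mapping the first two to parameters and
# GFLOPs is an assumption based on the order in which they are listed.
ratios = {"params": 12.95, "gflops": 7.16, "speedup": 4.9, "map_drop": 0.9}


def implied_teacher(student, ratios):
    """Back out the teacher figures implied by the student numbers and ratios."""
    return {
        # Teacher is larger and costlier: multiply by the savings factor.
        "params_m": student["params_m"] * ratios["params"],
        "gflops": student["gflops"] * ratios["gflops"],
        # Student is faster: divide its FPS by the speedup.
        "fps": student["fps"] / ratios["speedup"],
        # Student loses 0.9 mAP points relative to the teacher.
        "map": student["map"] + ratios["map_drop"],
    }


teacher = implied_teacher(student, ratios)
for name, value in teacher.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions the teacher comes out at roughly 69.4M parameters, 17.0 GFLOPs, 8.2 FPS and 72.5% mAP. Because the ratios in the abstract are rounded, these values are approximate.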

updated: Wed Mar 08 2023 05:44:22 GMT+0000 (UTC)

published: Sat Mar 04 2023 16:56:29 GMT+0000 (UTC)

arXiv
