Invariant Representation Learning for Infant Pose Estimation with Small Data

Xiaofei Huang; Nihang Fu; Shuangjun Liu; Sarah Ostadabbas

Infant motion analysis is critically important in early childhood development studies. However, although human pose estimation is applied ever more broadly, models trained on large-scale adult pose datasets rarely succeed at estimating infant poses, because infants differ significantly from adults in body proportions and in the variety of their poses. Moreover, privacy and security considerations limit the availability of the infant pose data needed to train a robust model from scratch. To address this problem, this paper presents (1) a publicly released hybrid synthetic and real infant pose (SyRIP) dataset, which combines a small yet diverse set of real infant images with generated synthetic infant poses, and (2) a multi-stage invariant representation learning strategy that transfers knowledge from the adjacent domains of adult poses and synthetic infant images into our fine-tuned domain-adapted infant pose (FiDIP) estimation model. In our ablation study, with identical network structures, models trained on the SyRIP dataset show noticeable improvement over those trained on the only other public infant pose dataset. Integrated with pose estimation backbone networks of varying complexity, FiDIP consistently outperforms the fine-tuned versions of those models. One of our best-performing infant pose estimators, built on the state-of-the-art DarkPose model, achieves a mean average precision (mAP) of 93.6.

updated: Sun May 30 2021 01:45:50 GMT+0000 (UTC)

published: Tue Oct 13 2020 01:10:14 GMT+0000 (UTC)

arXiv
