KTN: Knowledge Transfer Network for Learning Multi-person 2D-3D Correspondences

Xuanhan Wang; Lianli Gao; Yixuan Zhou; Jingkuan Song; Meng Wang

Human densepose estimation, which aims to establish dense correspondences between the 2D pixels of a human body and a 3D human body template, is a key technique for enabling machines to understand people in images. It remains challenging in practical scenarios, where real-world scenes are complex and only partial annotations are available, leading to incomplete or false estimations. In this work, we present a novel framework to detect the densepose of multiple people in an image. The proposed method, which we refer to as the Knowledge Transfer Network (KTN), tackles two main problems: 1) how to refine the image representation to alleviate incomplete estimations, and 2) how to reduce false estimations caused by low-quality training labels (i.e., limited annotations and class-imbalanced labels). Unlike existing works that directly propagate the pyramidal features of regions for densepose estimation, the KTN refines the pyramidal representation so that it simultaneously maintains feature resolution and suppresses background pixels; this strategy results in a substantial increase in accuracy. Moreover, the KTN enhances 3D-based body parsing with external knowledge: it casts 2D-based body parsers trained from sufficient annotations as a 3D-based body parser through a structural body knowledge graph. In this way, it significantly reduces the adverse effects of low-quality annotations. The effectiveness of the KTN is demonstrated by its superior performance over state-of-the-art methods on the DensePose-COCO dataset. Extensive ablation studies and experimental results on representative tasks (e.g., human body segmentation, human part segmentation, and keypoint detection) and two popular densepose estimation pipelines (i.e., RCNN and fully-convolutional frameworks) further indicate the generalizability of the proposed method.
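The abstract names two ideas without giving their formulation: suppressing background pixels in region features, and casting 2D body parsers as a 3D body parser through a body knowledge graph. The toy sketch below illustrates one plausible reading of each and is not the authors' implementation. The function names, the sigmoid foreground gate, and the row-normalized relation matrix are all assumptions made for illustration.

```python
import math


def suppress_background(features, fg_logits):
    """Scale each channel of a C x H x W feature map by a per-pixel
    foreground probability (sigmoid of hypothetical mask-head logits)."""
    prob = [[1.0 / (1.0 + math.exp(-z)) for z in row] for row in fg_logits]
    return [
        [[v * p for v, p in zip(f_row, p_row)]
         for f_row, p_row in zip(channel, prob)]
        for channel in features
    ]


def transfer_classifiers(w2d, relation):
    """Form each 3D-surface part classifier as a normalized, graph-weighted
    mix of 2D part classifiers.

    w2d:      K2 x D weights of 2D part classifiers.
    relation: K3 x K2 non-negative edge weights (hypothetical body graph).
    """
    w3d = []
    for edges in relation:
        total = sum(edges) or 1.0  # avoid dividing by zero for isolated nodes
        w3d.append([
            sum(e / total * w[d] for e, w in zip(edges, w2d))
            for d in range(len(w2d[0]))
        ])
    return w3d


# Toy inputs: 2 channels on a 2 x 2 region; only the top-left pixel is foreground.
features = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
fg_logits = [[20.0, -20.0], [-20.0, -20.0]]
refined = suppress_background(features, fg_logits)

# Toy graph: 2 coarse 2D parts feeding 3 surface parts (made-up edge weights).
w2d = [[1.0, 0.0], [0.0, 1.0]]
relation = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
w3d = transfer_classifiers(w2d, relation)
print(w3d)
```

In this reading, a surface part with few densepose annotations inherits its classifier from the better-supervised 2D parts it is connected to, which is one way limited or class-imbalanced labels could be compensated.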

updated: Tue Jun 21 2022 03:11:37 GMT+0000 (UTC)

published: Tue Jun 21 2022 03:11:37 GMT+0000 (UTC)

arXiv
