Super Resolution in Human Pose Estimation: Pixelated Poses to a Resolution Result?

Peter Hardy; Srinandan Dasmahapatra; Hansung Kim

The results obtained from state-of-the-art human pose estimation (HPE) models degrade rapidly when evaluating people at low resolution, but can super resolution (SR) be used to mitigate this effect? Using various SR approaches, we enhanced two low-resolution datasets and evaluated the change in performance of both an object detector and a keypoint detector, as well as end-to-end HPE results. We make the following observations. First, keypoint detection performance for low-resolution people improved once SR was applied. Second, the performance gain depends on the person's initial resolution (segmentation area in pixels) in the original image: keypoint detection improved when SR was applied to people with a small initial segmentation area, but degraded as this area became larger. To address this, we introduced a novel Mask-RCNN approach that uses a segmentation area threshold to decide when to use SR during the keypoint detection step. This approach achieved the best results for each of our HPE performance metrics.
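The threshold idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `area_threshold` and `factor` parameters, and the nearest-neighbour upscaler standing in for a real SR model are all assumptions made for illustration. The abstract gives no threshold value, so none is supplied here.

```python
import numpy as np


def segmentation_area(mask):
    """Count the foreground pixels of a binary person mask."""
    return int(np.count_nonzero(mask))


def upscale_nearest(image, factor):
    """Hypothetical stand-in for an SR model: nearest-neighbour upscaling."""
    return np.repeat(np.repeat(image, factor, axis=0), factor, axis=1)


def detect_keypoints_with_optional_sr(image, mask, keypoint_detector,
                                      area_threshold, sr_fn=upscale_nearest,
                                      factor=4):
    """Run the keypoint detector, applying SR only to small people.

    keypoint_detector is any callable mapping an image to a sequence of
    (x, y) coordinates. area_threshold is an assumed, caller-chosen value
    in pixels. Returns the keypoints in original-image coordinates and a
    flag telling whether SR was used.
    """
    if segmentation_area(mask) < area_threshold:
        enlarged = sr_fn(image, factor)
        keypoints = np.asarray(keypoint_detector(enlarged), dtype=float)
        # Map coordinates from the enlarged image back to the original.
        return keypoints / factor, True
    keypoints = np.asarray(keypoint_detector(image), dtype=float)
    return keypoints, False
```

In practice `sr_fn` would wrap a trained SR network and `keypoint_detector` the keypoint head of a detector such as Mask-RCNN; the threshold would need to be chosen empirically.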

updated: Mon Jul 05 2021 16:06:55 GMT+0000 (UTC)

published: Mon Jul 05 2021 16:06:55 GMT+0000 (UTC)

arXiv
