Tracking People by Predicting 3D Appearance, Location & Pose

Jathushan Rajasegaran; Georgios Pavlakos; Angjoo Kanazawa; Jitendra Malik

In this paper, we present an approach for tracking people in monocular videos by predicting their future 3D representations. To achieve this, we first lift people to 3D from a single frame in a robust way. This lifting includes information about the 3D pose of the person, their location in 3D space, and their 3D appearance. As we track a person, we collect 3D observations over time in a tracklet representation. Given the 3D nature of our observations, we build a temporal model for each of these attributes. We use these models to predict the future state of the tracklet, including 3D location, 3D appearance, and 3D pose. For a future frame, we compute the similarity between the predicted state of a tracklet and the single-frame observations in a probabilistic manner. Association is solved with simple Hungarian matching, and the matches are used to update the respective tracklets. We evaluate our approach on various benchmarks and report state-of-the-art results.
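The predict-then-associate loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses 3D location only, a constant-velocity extrapolation stands in for the temporal models, and Euclidean distance stands in for the probabilistic similarity. The names `predict_location`, `hungarian`, `associate`, and the `max_dist` gate are all hypothetical.

```python
import math


def predict_location(history):
    """Extrapolate the next 3D location of a tracklet.
    Assumption: constant velocity from the last two observations."""
    last = history[-1]
    if len(history) < 2:
        return tuple(last)
    prev = history[-2]
    return tuple(2 * a - b for a, b in zip(last, prev))


def hungarian(cost):
    """Minimum-cost assignment (Hungarian / Kuhn-Munkres algorithm) for an
    n x m cost matrix with n <= m. Returns match[i] = column for row i."""
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0.0] * (n + 1)   # row potentials (1-indexed)
    v = [0.0] * (m + 1)   # column potentials (1-indexed)
    p = [0] * (m + 1)     # p[j] = row currently matched to column j, 0 = none
    way = [0] * (m + 1)   # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0 != 0:  # flip the matching along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    match = [-1] * n
    for j in range(1, m + 1):
        if p[j] != 0:
            match[p[j] - 1] = j - 1
    return match


def associate(tracklets, detections, max_dist=1.0):
    """Match tracklets to single-frame detections.
    Returns sorted (tracklet_index, detection_index) pairs; pairs whose
    distance exceeds max_dist (a hypothetical gate) are dropped."""
    if not tracklets or not detections:
        return []
    preds = [predict_location(t) for t in tracklets]
    cost = [[math.dist(p, d) for d in detections] for p in preds]
    if len(preds) <= len(detections):
        pairs = list(enumerate(hungarian(cost)))
    else:
        # More tracklets than detections: solve the transposed problem.
        transposed = [list(col) for col in zip(*cost)]
        pairs = [(t, d) for d, t in enumerate(hungarian(transposed))]
    return sorted((t, d) for t, d in pairs if cost[t][d] <= max_dist)


tracklets = [[(0, 0, 5), (1, 0, 5)], [(4, 0, 6), (4, 1, 6)]]
detections = [(4.1, 2.0, 6.0), (2.1, 0.0, 5.0)]
matches = associate(tracklets, detections)
```

In a full tracker, each matched detection would then be appended to its tracklet's history, and unmatched detections would typically start new tracklets. Appearance and pose terms would enter through the cost matrix in the same way the location distance does here.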

updated: Wed Dec 08 2021 18:57:15 GMT+0000 (UTC)

published: Wed Dec 08 2021 18:57:15 GMT+0000 (UTC)

arXiv
