LPFormer: LiDAR Pose Estimation Transformer with Multi-Task Network

Dongqiangzi Ye; Yufei Xie; Weijia Chen; Zixiang Zhou; Hassan Foroosh

In this technical report, we present the 1st place solution for the 2023 Waymo Open Dataset Pose Estimation challenge. Because large-scale 3D human keypoint annotations are difficult to acquire, previous methods have commonly relied on 2D image features and 2D sequential annotations for 3D human pose estimation. In contrast, our proposed method, named LPFormer, uses only LiDAR as its input, along with the corresponding 3D annotations. LPFormer consists of two stages: the first stage detects human bounding boxes and extracts multi-level feature representations, while the second stage employs a transformer-based network to regress the human keypoints from these features. Experimental results on the Waymo Open Dataset demonstrate top performance, with improvements even over previous multi-modal solutions.
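The two-stage data flow described in the abstract can be illustrated with a toy sketch. This is not the LPFormer implementation: the axis-aligned box crop stands in for the first-stage detector, a single cross-attention step with random, untrained weights stands in for the second-stage transformer, and every name here (`crop_points`, `regress_keypoints`, `NUM_KEYPOINTS`, `FEAT_DIM`) is a hypothetical choice made for illustration. It only shows how points gathered for one detected human could be turned into a fixed set of 3D keypoint predictions.

```python
import numpy as np

NUM_KEYPOINTS = 14  # assumed keypoint count; adjust to the annotation schema
FEAT_DIM = 16       # hypothetical feature width


def crop_points(points, center, size):
    """Stage-1 stand-in: keep the points inside an axis-aligned box."""
    half = np.asarray(size) / 2.0
    mask = np.all(np.abs(points - np.asarray(center)) <= half, axis=1)
    return points[mask]


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def regress_keypoints(points, center, rng):
    """Stage-2 stand-in: keypoint queries cross-attend to point tokens."""
    local = points - center                        # box-relative coordinates
    w_embed = rng.normal(size=(3, FEAT_DIM))       # random, untrained weights
    tokens = local @ w_embed                       # (N, D) point tokens
    queries = rng.normal(size=(NUM_KEYPOINTS, FEAT_DIM))
    attn = softmax(queries @ tokens.T / np.sqrt(FEAT_DIM), axis=-1)  # (K, N)
    attended = attn @ tokens                       # (K, D) per-keypoint features
    w_head = rng.normal(size=(FEAT_DIM, 3)) * 0.01
    return center + attended @ w_head              # (K, 3) keypoint coordinates


# Usage on synthetic data: a random cloud and one hand-placed "detection".
rng = np.random.default_rng(0)
cloud = rng.uniform(-2.0, 2.0, size=(2000, 3))
box_center = np.array([1.0, 0.5, 0.0])
box_size = (1.0, 1.0, 2.0)
human_points = crop_points(cloud, box_center, box_size)
keypoints = regress_keypoints(human_points, box_center, rng)
print(keypoints.shape)  # (14, 3)
```

In a real system, the crop would come from a learned 3D detector, the point tokens would carry the multi-level features that the first stage extracts, and the attention and regression weights would be trained against 3D keypoint annotations.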

updated: Wed Jun 21 2023 19:20:15 GMT+0000 (UTC)

published: Wed Jun 21 2023 19:20:15 GMT+0000 (UTC)

arXiv

