FeatER: An Efficient Network for Human Reconstruction via Feature Map-Based TransformER

Ce Zheng; Matias Mendieta; Taojiannan Yang; Guo-Jun Qi; Chen Chen


Recently, vision transformers have shown great success in a set of human reconstruction tasks such as 2D human pose estimation (2D HPE), 3D human pose estimation (3D HPE), and human mesh reconstruction (HMR). In these tasks, feature map representations of human structural information are often first extracted from the image by a CNN (such as HRNet) and then further processed by a transformer to predict heatmaps (each of which encodes a joint's location as a feature map with a Gaussian distribution) for HPE or HMR. However, existing transformer architectures cannot process these feature map inputs directly, forcing an unnatural flattening of the location-sensitive human structural information. Furthermore, much of the performance gain in recent HPE and HMR methods has come at the cost of ever-increasing computation and memory requirements. To address these problems simultaneously, we propose FeatER, a novel transformer design that preserves the inherent structure of feature map representations when modeling attention while reducing memory and computational costs. Taking advantage of FeatER, we build an efficient network for a set of human reconstruction tasks including 2D HPE, 3D HPE, and HMR. A feature map reconstruction module is applied to improve the performance of the estimated human pose and mesh. Extensive experiments demonstrate the effectiveness of FeatER on various human pose and mesh datasets. For instance, FeatER outperforms the SOTA method MeshGraphormer on the Human3.6M and 3DPW datasets while requiring only 5% of the Params and 16% of the MACs. The project webpage is https://zczcwh.github.io/feater_page/.
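The heatmap representation mentioned above can be illustrated with a minimal sketch in plain Python. It assumes a common convention (one H x W map per joint, an unnormalized Gaussian with sigma = 2.0, integer argmax decoding); the grid size, sigma, and all function names are hypothetical choices for illustration, not FeatER's actual configuration. The `flatten_tokens` helper shows the kind of flattening the abstract says standard transformers force on these 2D maps.

```python
import math


def gaussian_heatmap(x, y, width, height, sigma=2.0):
    """Encode one joint at (x, y) as a height x width Gaussian heatmap.

    Hypothetical helper: the peak value 1.0 sits at pixel (x, y).
    """
    return [
        [math.exp(-((i - x) ** 2 + (j - y) ** 2) / (2 * sigma ** 2))
         for i in range(width)]
        for j in range(height)
    ]


def encode_joints(joints, width, height, sigma=2.0):
    """Stack one heatmap per joint, giving shape (num_joints, height, width)."""
    return [gaussian_heatmap(x, y, width, height, sigma) for x, y in joints]


def decode_heatmap(heatmap):
    """Recover an integer (x, y) joint location by taking the argmax."""
    best_val, best_xy = -1.0, (0, 0)
    for j, row in enumerate(heatmap):
        for i, v in enumerate(row):
            if v > best_val:
                best_val, best_xy = v, (i, j)
    return best_xy


def flatten_tokens(heatmaps):
    """Flatten each H x W map into a 1-D vector, discarding its 2D layout."""
    return [[v for row in hm for v in row] for hm in heatmaps]


joints = [(10, 5), (3, 20)]  # hypothetical (x, y) pixel locations
maps = encode_joints(joints, width=32, height=24)
print([decode_heatmap(m) for m in maps])  # [(10, 5), (3, 20)]
print(len(flatten_tokens(maps)[0]))       # 768 (= 24 * 32)
```

After flattening, neighboring rows of a map are no longer adjacent in the vector, which is the loss of spatial structure that a feature-map-based transformer is meant to avoid.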

updated: Thu Mar 23 2023 15:48:05 GMT+0000 (UTC)

published: Mon May 30 2022 22:09:57 GMT+0000 (UTC)

arXiv
