HeatER: An Efficient and Unified Network for Human Reconstruction via Heatmap-based TransformER

Ce Zheng; Matias Mendieta; Taojiannan Yang; Chen Chen

Recently, vision transformers have shown great success in 2D human pose estimation (2D HPE), 3D human pose estimation (3D HPE), and human mesh reconstruction (HMR) tasks. In these tasks, heatmap representations of the human structural information are often extracted first from the image by a CNN, and then further processed with a transformer architecture to provide the final HPE or HMR estimation. However, existing transformer architectures are not able to process these heatmap inputs directly, forcing an unnatural flattening of the features prior to input. Furthermore, much of the performance benefit in recent HPE and HMR methods has come at the cost of ever-increasing computation and memory needs. Therefore, to address these problems simultaneously, we propose HeatER, a novel transformer design that preserves the inherent structure of heatmap representations when modeling attention while reducing memory and computational costs. Taking advantage of HeatER, we build a unified and efficient network for 2D HPE, 3D HPE, and HMR tasks. A heatmap reconstruction module is applied to improve the robustness of the estimated human pose and mesh. Extensive experiments demonstrate the effectiveness of HeatER on various human pose and mesh datasets. For instance, HeatER outperforms the state-of-the-art (SOTA) method MeshGraphormer on the Human3.6M and 3DPW datasets while requiring only 5% of the parameters (Params) and 16% of the multiply-accumulate operations (MACs). Code will be publicly available.
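The flattening step that the abstract describes as unnatural can be made concrete with a small sketch. This is not HeatER itself, only an illustration of how a conventional transformer pipeline might turn per-joint heatmaps into tokens; the function names and the 17-joint, 64 x 48 heatmap size are assumptions chosen for illustration and do not come from the abstract.

```python
# Illustrative only: names and shapes below are assumptions, not from the paper.

def flatten_heatmaps(heatmaps):
    """Turn K heatmaps (each H x W, as nested lists) into K tokens of length H*W.

    After this step the row/column layout of each heatmap is no longer
    explicit: every token is just a 1D vector.
    """
    return [[value for row in heatmap for value in row] for heatmap in heatmaps]


def attention_score_macs(num_tokens, token_dim):
    """Multiply-accumulates needed to form the Q @ K^T score matrix alone."""
    return num_tokens * num_tokens * token_dim


# Hypothetical input: 17 joints, each with a 64 x 48 heatmap.
K, H, W = 17, 64, 48
heatmaps = [[[0.0] * W for _ in range(H)] for _ in range(K)]

tokens = flatten_heatmaps(heatmaps)
print(len(tokens), len(tokens[0]))      # number of tokens, length of each token
print(attention_score_macs(K, H * W))   # cost of one attention score matrix
```

Because each token's length is H*W, the embedding dimension (and with it the size of every projection layer) grows with heatmap resolution, which is one way the computation and memory costs mentioned above can arise.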

updated: Mon May 30 2022 22:09:57 GMT+0000 (UTC)

published: Mon May 30 2022 22:09:57 GMT+0000 (UTC)

arXiv

