ConvFormer: Parameter Reduction in Transformer Models for 3D Human Pose Estimation by Leveraging Dynamic Multi-Headed Convolutional Attention

Alec Diaz-Arias; Dmitriy Shin

Recently, fully transformer-based architectures have replaced the de facto convolutional architectures for the 3D human pose estimation task. In this paper, we propose ConvFormer, a novel convolutional transformer that leverages a new dynamic multi-headed convolutional self-attention mechanism for monocular 3D human pose estimation. We designed spatial and temporal convolutional transformers to comprehensively model human joint relations within individual frames and globally across the motion sequence. Moreover, we introduce a novel notion of a temporal joints profile for our temporal ConvFormer, which immediately fuses complete temporal information for a local neighborhood of joint features. We have quantitatively and qualitatively validated our method on three common benchmark datasets: Human3.6M, MPI-INF-3DHP, and HumanEva. Extensive experiments were conducted to identify the optimal hyper-parameter set. These experiments demonstrate that we achieve a significant parameter reduction relative to prior transformer models while attaining state-of-the-art (SOTA) or near-SOTA results on all three datasets. Additionally, we achieved SOTA for Protocol III on Human3.6M (H36M) for both ground-truth (GT) and CPN-detected inputs. Finally, we obtained SOTA on all three metrics for the MPI-INF-3DHP dataset and for all three subjects on HumanEva under Protocol II.
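The abstract does not detail how convolution enters the self-attention. The following Python sketch illustrates only the general idea. It is not ConvFormer's actual dynamic mechanism or its temporal joints profile. It assumes that the queries, keys, and values of standard multi-head self-attention are produced by a 1D convolution along the token axis (for example, frames) instead of a per-token linear projection. The helper names (`conv1d_same`, `conv_multihead_attention`, `random_kernel`) and all sizes are hypothetical.

```python
import math
import random

# Illustrative sketch only (assumption): convolutional Q/K/V projections feeding
# ordinary multi-head attention; NOT ConvFormer's published dynamic mechanism.


def conv1d_same(x, weight):
    """1D convolution over the token (frame) axis with zero 'same' padding.

    x: T tokens, each a list of C_in features.
    weight: k kernel taps (k odd), each a C_out x C_in matrix (list of rows).
    Returns T tokens, each a list of C_out features.
    """
    T, k = len(x), len(weight)
    pad = k // 2
    c_out = len(weight[0])
    out = []
    for t in range(T):
        y = [0.0] * c_out
        for j in range(k):
            s = t + j - pad  # neighbouring token seen by kernel tap j
            if 0 <= s < T:  # zero padding outside the sequence
                for o in range(c_out):
                    y[o] += sum(w * v for w, v in zip(weight[j][o], x[s]))
        out.append(y)
    return out


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in row]
    total = sum(e)
    return [v / total for v in e]


def conv_multihead_attention(x, wq, wk, wv, num_heads):
    """Scaled dot-product multi-head self-attention with convolutional Q/K/V."""
    q, keys, v = conv1d_same(x, wq), conv1d_same(x, wk), conv1d_same(x, wv)
    T, d = len(q), len(q[0])
    assert d % num_heads == 0, "embedding size must split evenly across heads"
    dh = d // num_heads
    out = [[0.0] * d for _ in range(T)]
    for h in range(num_heads):
        lo, hi = h * dh, (h + 1) * dh  # feature slice owned by head h
        for i in range(T):
            scores = [
                sum(a * b for a, b in zip(q[i][lo:hi], keys[j][lo:hi])) / math.sqrt(dh)
                for j in range(T)
            ]
            attn = softmax(scores)
            for j in range(T):
                for c in range(lo, hi):
                    out[i][c] += attn[j] * v[j][c]
    return out


def random_kernel(k, c_out, c_in, rng):
    return [[[rng.gauss(0.0, 0.1) for _ in range(c_in)] for _ in range(c_out)] for _ in range(k)]


# Hypothetical sizes: 9 frames, 8-dim token embeddings, 2 heads, kernel size 3.
rng = random.Random(0)
T, d, heads, ksize = 9, 8, 2, 3
x = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(T)]
wq, wk, wv = (random_kernel(ksize, d, d, rng) for _ in range(3))
y = conv_multihead_attention(x, wq, wk, wv, heads)
print(len(y), len(y[0]))  # 9 8
```

With a kernel size of 1, the convolutional projections reduce to ordinary bias-free per-token linear projections. A larger kernel lets each query, key, and value mix in neighbouring tokens before attention is computed. This naive version has more projection weights than a linear layer, so it does not by itself illustrate the parameter savings reported for ConvFormer.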

updated: Tue Apr 04 2023 22:23:50 GMT+0000 (UTC)

published: Tue Apr 04 2023 22:23:50 GMT+0000 (UTC)

arXiv