ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation

Yufei Xu; Jing Zhang; Qiming Zhang; Dacheng Tao

Recently, customized vision transformers have been adapted for human pose estimation and have achieved superior performance with elaborate structures. However, it is still unclear whether plain vision transformers can facilitate pose estimation. In this paper, we take the first step toward answering this question with ViTPose, which pairs a plain, non-hierarchical vision transformer with simple deconvolution decoders for human pose estimation. We demonstrate that a plain vision transformer with MAE pretraining can obtain superior performance after finetuning on human pose estimation datasets. ViTPose scales well with model size and is flexible with respect to input resolution and token number. Moreover, it can be easily pretrained using unlabeled pose data without the need for large-scale upstream ImageNet data. Our biggest ViTPose model, based on the ViTAE-G backbone with 1 billion parameters, obtains the best 80.9 mAP on the MS COCO test-dev set, while the ensemble models further set a new state of the art for human pose estimation, i.e., 81.1 mAP. The source code and models will be released at https://github.com/ViTAE-Transformer/ViTPose.
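To make the abstract's point about input resolution and token number concrete, the following is a minimal shape-bookkeeping sketch, not the authors' implementation. It assumes a patch size of 16, two stride-2 deconvolution layers, a 256x192 input, and 17 keypoints; none of these values are stated in the abstract, and the function names `token_grid` and `heatmap_shape` are hypothetical.

```python
# Illustrative shape bookkeeping only. Every default value below is an
# assumption for the sake of the example, not a figure from the abstract.

def token_grid(height, width, patch_size=16):
    """Return the (rows, cols) grid of patch tokens for a plain ViT.

    A non-hierarchical transformer keeps this grid size through all layers.
    """
    if height % patch_size or width % patch_size:
        raise ValueError("input size must be divisible by the patch size")
    return height // patch_size, width // patch_size


def heatmap_shape(height, width, patch_size=16, num_deconv=2, num_keypoints=17):
    """Return (keypoints, rows, cols) after `num_deconv` stride-2 deconvolutions."""
    rows, cols = token_grid(height, width, patch_size)
    scale = 2 ** num_deconv  # each stride-2 deconvolution doubles the resolution
    return num_keypoints, rows * scale, cols * scale


if __name__ == "__main__":
    rows, cols = token_grid(256, 192)
    print("token grid:", (rows, cols), "tokens:", rows * cols)
    print("heatmap shape:", heatmap_shape(256, 192))
```

Under these assumptions, changing the input resolution or the patch size changes only the number of tokens the backbone processes, while the decoder still upsamples the resulting grid by a fixed factor.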

updated: Tue Apr 26 2022 17:55:04 GMT+0000 (UTC)

published: Tue Apr 26 2022 17:55:04 GMT+0000 (UTC)

arXiv
