Human MotionFormer: Transferring Human Motions with Vision Transformers

Hongyu Liu; Xintong Han; ChengBin Jin; Huawei Wei; Zhe Lin; Faqiang Wang; Haoye Dong; Yibing Song; Jia Xu; Qifeng Chen

Human motion transfer aims to transfer motion from a dynamic target person to a static source person for motion synthesis. Accurately matching the source person to the target motion, across both large and subtle motion changes, is vital for improving the quality of the transferred motion. In this paper, we propose Human MotionFormer, a hierarchical ViT framework that leverages global and local perception to capture large and subtle motion matching, respectively. It consists of two ViT encoders that extract input features (from a target motion image and a source human image) and a ViT decoder with several cascaded blocks for feature matching and motion transfer. In each block, we set the target motion feature as the Query and the source person feature as the Key and Value, and compute cross-attention maps to perform global feature matching. Further, we introduce a convolutional layer to improve local perception after the global cross-attention computation. This matching process is implemented in both the warping and generation branches to guide the motion transfer. During training, we propose a mutual learning loss that enables co-supervision between the warping and generation branches for better motion representations. Experiments show that Human MotionFormer sets a new state of the art both qualitatively and quantitatively. Project page: https://github.com/KumapowerLIU/Human-MotionFormer
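The matching step described in the abstract (target motion features as Query, source person features as Key and Value, followed by a convolution for local perception) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes a single attention head, no learned Q/K/V projections, a depthwise 3x3 convolution, a residual connection, and random toy inputs. The names `cross_attention`, `depthwise_conv3x3`, and `decoder_block` are hypothetical and chosen only for this example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(motion_feat, person_feat):
    """Global matching: motion tokens (N, C) query person tokens (M, C).

    Assumption: no learned projections; person_feat serves as both Key and Value.
    """
    d = motion_feat.shape[-1]
    scores = motion_feat @ person_feat.T / np.sqrt(d)  # (N, M) similarity
    attn = softmax(scores, axis=-1)                    # each row sums to 1
    return attn @ person_feat, attn

def depthwise_conv3x3(feat_map, kernel):
    """Local perception: per-channel 3x3 filter with zero padding.

    feat_map: (H, W, C), kernel: (3, 3, C). Computed as cross-correlation,
    which is what deep learning frameworks call convolution.
    """
    h, w, _ = feat_map.shape
    padded = np.pad(feat_map, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(feat_map)
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + h, dx:dx + w, :] * kernel[dy, dx, :]
    return out

def decoder_block(motion_feat, person_feat, kernel, h, w):
    # Global cross-attention first, then a local convolution on the token grid.
    matched, attn = cross_attention(motion_feat, person_feat)
    local = depthwise_conv3x3(matched.reshape(h, w, -1), kernel)
    # Residual connection is an assumption of this sketch.
    return matched + local.reshape(h * w, -1), attn

# Toy inputs: a 4x4 grid of tokens with 8 channels each (arbitrary sizes).
rng = np.random.default_rng(0)
H, W, C = 4, 4, 8
motion = rng.standard_normal((H * W, C))
person = rng.standard_normal((H * W, C))
kernel = rng.standard_normal((3, 3, C)) * 0.1
out, attn = decoder_block(motion, person, kernel, H, W)
print(out.shape, attn.shape)
```

Each row of `attn` is a distribution over source person tokens for one target motion token, which is the global matching; the convolution then mixes only neighboring positions, which is the local refinement. In the paper this process runs in both the warping and generation branches, which the sketch does not model.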

updated: Wed Feb 22 2023 11:42:44 GMT+0000 (UTC)

published: Wed Feb 22 2023 11:42:44 GMT+0000 (UTC)

arXiv
