A Dual-Source Attention Transformer for Multi-Person Pose Tracking

Andreas Doering; Juergen Gall

Multi-person pose tracking is an important element of many applications and requires estimating the human poses of all persons in a video and tracking them over time. The association of poses across frames remains an open research problem, in particular for online tracking methods, due to motion blur, crowded scenes, and occlusions. To tackle the association challenge, we propose a Dual-Source Attention Transformer that incorporates three core aspects: i) To re-identify persons that have been occluded, we propose a pose-conditioned re-identification network that provides an initial embedding and allows persons to be matched even if the number of visible joints differs between frames. ii) We incorporate edge embeddings based on temporal pose similarity, and the impact of appearance and pose similarity is adapted automatically. iii) We propose an attention-based matching layer for pose-to-track association and duplicate removal. We evaluate our approach on Market1501, PoseTrack 2018, and PoseTrack21.
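To make the two sources of evidence concrete, the following is a minimal sketch of pose-to-track association that mixes appearance similarity with pose similarity. It is not the method from the paper: the learned transformer and attention-based matching layer are replaced here by a hand-written affinity and plain greedy matching, and the mixing weight is fixed rather than adapted automatically. All names (`cosine_similarity`, `pose_similarity`, `associate`, `alpha`, `scale`, `threshold`) and the data layout are assumptions made for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two appearance embeddings (lists of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def pose_similarity(pose_a, pose_b, scale=1.0):
    """Mean Gaussian similarity over joints visible in BOTH poses.

    A pose is a list of (x, y) tuples; a joint that is not visible is None.
    Using only the shared joints lets poses with different numbers of
    visible joints still be compared.
    """
    shared = [(p, q) for p, q in zip(pose_a, pose_b)
              if p is not None and q is not None]
    if not shared:
        return 0.0
    sims = [
        math.exp(-((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) / (2 * scale ** 2))
        for p, q in shared
    ]
    return sum(sims) / len(sims)


def associate(tracks, detections, alpha=0.5, scale=1.0, threshold=0.5):
    """Greedily assign detections to tracks by a mixed affinity score.

    alpha is a fixed, hypothetical weight between appearance and pose
    similarity (assumption: the paper adapts this balance automatically).
    Returns (matches, unmatched), where matches maps a detection index
    to a track index and unmatched lists the leftover detection indices.
    """
    scored = []
    for t, track in enumerate(tracks):
        for d, det in enumerate(detections):
            appearance = cosine_similarity(track["embedding"], det["embedding"])
            pose = pose_similarity(track["pose"], det["pose"], scale)
            scored.append((alpha * appearance + (1 - alpha) * pose, t, d))

    scored.sort(reverse=True)  # best-scoring pairs first
    matches, used_tracks, used_dets = {}, set(), set()
    for score, t, d in scored:
        if score < threshold:
            break  # all remaining pairs are too dissimilar
        if t in used_tracks or d in used_dets:
            continue
        matches[d] = t
        used_tracks.add(t)
        used_dets.add(d)

    unmatched = [d for d in range(len(detections)) if d not in used_dets]
    return matches, unmatched


# Toy example: two tracks, three detections, one detection partly occluded.
tracks = [
    {"embedding": [1.0, 0.0], "pose": [(0.0, 0.0), (1.0, 1.0)]},
    {"embedding": [0.0, 1.0], "pose": [(10.0, 10.0), (11.0, 11.0)]},
]
detections = [
    {"embedding": [0.0, 1.0], "pose": [(10.0, 10.0), None]},  # second joint hidden
    {"embedding": [1.0, 0.0], "pose": [(0.0, 0.0), (1.0, 1.0)]},
    {"embedding": [-1.0, 0.0], "pose": [(100.0, 100.0), (100.0, 100.0)]},
]
matches, unmatched = associate(tracks, detections)
```

In this toy setup the partly occluded detection is still matched to the second track because only the shared joint enters the pose term, and the third detection stays unmatched, where it could start a new track.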

updated: Fri Jun 09 2023 10:44:44 GMT+0000 (UTC)

published: Fri Jun 09 2023 10:44:44 GMT+0000 (UTC)

arXiv
