BKinD-3D: Self-Supervised 3D Keypoint Discovery from Multi-View Videos

Jennifer J. Sun; Pierre Karashchuk; Amil Dravid; Serim Ryou; Sonia Fereidooni; John Tuthill; Aggelos Katsaggelos; Bingni W. Brunton; Georgia Gkioxari; Ann Kennedy; Yisong Yue; Pietro Perona

Quantifying motion in 3D is important for studying the behavior of humans and other animals, but manual pose annotations are expensive and time-consuming to obtain. Self-supervised keypoint discovery is a promising strategy for estimating 3D poses without annotations. However, current keypoint discovery approaches commonly process single 2D views and do not operate in 3D space. We propose a new method to perform self-supervised keypoint discovery in 3D from multi-view videos of behaving agents, without any keypoint or bounding box supervision in 2D or 3D. Our method uses an encoder-decoder architecture with a 3D volumetric heatmap, trained to reconstruct spatiotemporal differences across multiple views, in addition to joint length constraints on a learned 3D skeleton of the subject. In this way, we discover keypoints in videos of humans and rats without requiring manual supervision, demonstrating the potential of 3D keypoint discovery for studying behavior.
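
To make two ingredients named in the abstract more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that a keypoint is read out of a volumetric heatmap as the softmax-weighted mean of voxel coordinates (a common soft-argmax readout), and that a joint length constraint can be approximated by penalizing how much each bone length varies over time. The function names `soft_argmax_3d` and `joint_length_penalty`, the array shapes, and the variance-based penalty are all illustrative assumptions; the abstract does not specify the exact loss or readout.

```python
import numpy as np


def soft_argmax_3d(logits, grid):
    """Expected 3D coordinate under a softmax over a voxel volume.

    Hypothetical helper, not taken from the paper.
    logits: (K, D, H, W) unnormalized scores, one volume per keypoint.
    grid:   (D, H, W, 3) world coordinates of each voxel centre.
    Returns a (K, 3) array of keypoint coordinates.
    """
    k = logits.shape[0]
    flat = logits.reshape(k, -1)
    # Subtract the per-keypoint maximum for numerical stability.
    flat = flat - flat.max(axis=1, keepdims=True)
    probs = np.exp(flat)
    probs /= probs.sum(axis=1, keepdims=True)
    # Probability-weighted average of voxel coordinates.
    return probs @ grid.reshape(-1, 3)


def joint_length_penalty(keypoints, edges):
    """Variance of each bone length over time, averaged over bones.

    Assumed stand-in for a joint length constraint.
    keypoints: (T, K, 3) keypoints over T frames.
    edges:     list of (i, j) keypoint index pairs forming the skeleton.
    """
    i, j = np.array(edges).T
    # Bone lengths per frame and per edge, shape (T, E).
    lengths = np.linalg.norm(keypoints[:, i] - keypoints[:, j], axis=-1)
    return lengths.var(axis=0).mean()
```

Under these assumptions, a skeleton that only translates rigidly between frames incurs a penalty of essentially zero, while one whose bones stretch or shrink is penalized, which is the intuition behind constraining joint lengths on a learned skeleton.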

updated: Wed Dec 14 2022 18:34:29 GMT+0000 (UTC)

published: Wed Dec 14 2022 18:34:29 GMT+0000 (UTC)

arXiv
