Neural Interactive Keypoint Detection

Jie Yang; Ailing Zeng; Feng Li; Shilong Liu; Ruimao Zhang; Lei Zhang

This work proposes Click-Pose, an end-to-end neural interactive keypoint detection framework that reduces the labeling cost of 2D keypoint annotation by more than 10 times compared with manual-only annotation. Click-Pose explores how user feedback can cooperate with a neural keypoint detector to correct predicted keypoints interactively, for a faster and more effective annotation process. Specifically, we design a pose error modeling strategy that feeds the ground-truth pose, combined with four typical pose errors, into the decoder and trains the model to reconstruct the correct poses, which enhances the model's self-correction ability. Then, we attach an interactive human-feedback loop that receives user clicks correcting one or several predicted keypoints and iteratively uses the decoder to update all other keypoints, minimizing the number of clicks (NoC) needed for efficient annotation. We validate Click-Pose in in-domain and out-of-domain scenes, and on a new task of keypoint adaptation. For annotation, Click-Pose needs only 1.97 and 6.45 NoC@95 (at precision 95%) on COCO and Human-Art, reducing effort by 31.4% and 36.3%, respectively, compared with the SOTA model (ViTPose) with manual correction. In addition, without user clicks, Click-Pose surpasses the previous end-to-end model by 1.4 AP on COCO and 3.0 AP on Human-Art. The code is available at https://github.com/IDEA-Research/Click-Pose.
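To make the number-of-clicks idea concrete, the following is a minimal sketch of a simulated annotation loop, not the Click-Pose implementation. Everything named here is an assumption for illustration: `fraction_correct` is a simple stand-in for the precision metric the paper uses, `identity_refine` is a placeholder for the decoder that would update unclicked keypoints, and `clicks_to_target`, `target`, and `tol` are hypothetical names and values. A simulated user repeatedly clicks the keypoint with the largest error until the target precision is reached.

```python
import numpy as np

def fraction_correct(pred, gt, tol):
    """Stand-in precision (assumption): share of keypoints within `tol` pixels of ground truth."""
    return float(np.mean(np.linalg.norm(pred - gt, axis=1) <= tol))

def identity_refine(pose, clicked):
    """Placeholder for the decoder (assumption); a real model would update unclicked keypoints."""
    return pose

def clicks_to_target(pred, gt, refine=identity_refine, target=0.95, tol=5.0):
    """Count simulated clicks needed until `fraction_correct` reaches `target`."""
    pose = pred.astype(float).copy()
    clicked = np.zeros(len(pose), dtype=bool)
    clicks = 0
    while fraction_correct(pose, gt, tol) < target and not clicked.all():
        errors = np.linalg.norm(pose - gt, axis=1)
        errors[clicked] = -1.0          # never click the same keypoint twice
        worst = int(np.argmax(errors))
        pose[worst] = gt[worst]         # simulated click: snap keypoint to ground truth
        clicked[worst] = True
        clicks += 1
        pose = refine(pose, clicked)    # model may update the remaining keypoints
        pose[clicked] = gt[clicked]     # clicked keypoints stay fixed after refinement
    return clicks
```

With the placeholder `identity_refine`, each click fixes exactly one keypoint, which corresponds to purely manual correction. Passing a `refine` function that also improves unclicked keypoints lowers the click count, which is the effect the abstract attributes to reusing the decoder after each click.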

updated: Sun Aug 20 2023 06:36:49 GMT+0000 (UTC)

published: Sun Aug 20 2023 06:36:49 GMT+0000 (UTC)

arXiv
