VideoPose: Estimating 6D object pose from videos

Apoorva Beedu; Zhile Ren; Varun Agrawal; Irfan Essa

We introduce a simple yet effective algorithm that uses convolutional neural networks to directly estimate object poses from videos. Our approach leverages the temporal information in a video sequence, and is computationally efficient and robust enough to support robotics and AR applications. Our proposed network takes the output of a pre-trained 2D object detector as input, and aggregates visual features through a recurrent neural network to make a prediction at each frame. Experimental evaluation on the YCB-Video dataset shows that our approach is on par with state-of-the-art algorithms. Further, at 30 fps, it is also more efficient than the state of the art, and is therefore applicable to a variety of applications that require real-time object pose estimation.
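The per-frame aggregation described in the abstract can be illustrated with a minimal sketch. It is not the authors' network: the abstract does not give the architecture, so everything here is an assumption, including the plain tanh recurrent cell, the random untrained weights, the feature and hidden sizes, the unit-quaternion-plus-translation pose parameterization, and the name `RecurrentPoseHead`. The input features stand in for whatever a CNN would extract from each detected object crop.

```python
import math
import random


def make_matrix(rows, cols, rng, scale=0.1):
    """Random weight matrix; stands in for trained parameters (assumption)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class RecurrentPoseHead:
    """Hypothetical recurrent head: per-frame features in, per-frame 6D pose out."""

    def __init__(self, feat_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        self.w_in = make_matrix(hidden_dim, feat_dim, rng)     # feature -> hidden
        self.w_rec = make_matrix(hidden_dim, hidden_dim, rng)  # hidden -> hidden
        self.w_out = make_matrix(7, hidden_dim, rng)           # hidden -> 4 quat + 3 trans

    def step(self, feature, hidden):
        # Mix the current frame's feature with the state carried from earlier frames.
        pre = [a + b for a, b in zip(matvec(self.w_in, feature),
                                     matvec(self.w_rec, hidden))]
        hidden = [math.tanh(p) for p in pre]
        raw = matvec(self.w_out, hidden)
        quat = raw[:4]
        quat[0] += 1.0  # bias toward the identity rotation (illustrative choice)
        norm = max(math.sqrt(sum(q * q for q in quat)), 1e-12)
        quat = [q / norm for q in quat]  # unit quaternion encodes the rotation
        translation = raw[4:]
        return quat, translation, hidden

    def run(self, features):
        """Process a sequence of per-frame features; return one pose per frame."""
        hidden = [0.0] * self.hidden_dim
        poses = []
        for feature in features:
            quat, translation, hidden = self.step(feature, hidden)
            poses.append((quat, translation))
        return poses


if __name__ == "__main__":
    rng = random.Random(1)
    # Five frames of 16-dimensional placeholder features.
    frames = [[rng.random() for _ in range(16)] for _ in range(5)]
    for quat, translation in RecurrentPoseHead(16, 8).run(frames):
        print(quat, translation)
```

Because the hidden state is carried across frames, the pose predicted for a frame depends on the frames before it, which is the sense in which such a model uses temporal information rather than treating each frame independently.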

updated: Sat Nov 20 2021 20:57:45 GMT+0000 (UTC)

published: Sat Nov 20 2021 20:57:45 GMT+0000 (UTC)

arXiv
