Reconstructing and grounding narrated instructional videos in 3D

Dimitri Zhukov; Ignacio Rocco; Ivan Laptev; Josef Sivic; Johannes L. Schönberger; Bugra Tekin; Marc Pollefeys

Narrated instructional videos often show and describe manipulations of similar objects, e.g., repairing a particular model of a car or laptop. In this work we aim to reconstruct such objects and to localize the associated narrations in 3D. In contrast to the standard scenario of instance-level 3D reconstruction, where identical objects or scenes are present in all views, objects in different instructional videos may vary greatly in appearance because of differing conditions and versions of the same product. Narrations may also vary widely in their natural language expressions. We address these challenges with three contributions. First, we propose an approach for correspondence estimation that combines learnt local features and dense flow. Second, we design a two-step divide-and-conquer reconstruction approach in which the initial 3D reconstructions of individual videos are combined into a 3D alignment graph. Finally, we propose an unsupervised approach to ground natural language in the obtained 3D reconstructions. We demonstrate the effectiveness of our approach in the domain of car maintenance. Given raw instructional videos and no manual supervision, our method successfully reconstructs engines of different car models and associates textual descriptions with the corresponding objects in 3D.

updated: Thu Sep 09 2021 16:49:10 GMT+0000 (UTC)

published: Thu Sep 09 2021 16:49:10 GMT+0000 (UTC)

arXiv
