Unsupervised Discovery of Actions in Instructional Videos

AJ Piergiovanni; Anelia Angelova; Michael S. Ryoo; Irfan Essa

In this paper we address the problem of automatically discovering atomic actions from instructional videos in an unsupervised manner. Instructional videos contain complex activities and are a rich source of information for intelligent agents, such as autonomous robots or virtual assistants, which can, for example, automatically 'read' the steps from an instructional video and execute them. However, videos are rarely annotated with atomic activities, their boundaries, or their durations. We present an unsupervised approach to learning atomic actions of structured human tasks from a variety of instructional videos. We propose a sequential stochastic autoregressive model for temporal segmentation of videos, which learns to represent and discover the sequential relationships between the different atomic actions of a task, and which provides automatic, unsupervised self-labeling of videos. Our approach outperforms state-of-the-art unsupervised methods by a large margin. We will open-source the code.

updated: Mon Jun 28 2021 14:05:01 GMT+0000 (UTC)

published: Mon Jun 28 2021 14:05:01 GMT+0000 (UTC)

arXiv
