Anticipating Next Active Objects for Egocentric Videos

Sanket Thakur; Cigdem Beyan; Pietro Morerio; Vittorio Murino; Alessio Del Bue

This paper addresses the problem of anticipating, for a given egocentric video clip, the future location of the next active object, where contact might happen, before any action takes place. The problem is considerably hard because we aim to estimate the position of such objects in a scenario where the observed clip and the action segment are separated by the so-called "time to contact" (TTC) segment. Many methods have been proposed to anticipate a person's action based on previous hand movements and interactions with the surroundings. However, there have been no attempts to investigate the next possible interactable object and its future location with respect to the first-person motion and the field-of-view drift during the TTC window. We define this as the task of Anticipating the Next ACTive Object (ANACTO). To this end, we propose a transformer-based self-attention framework to identify and locate the next active object in an egocentric clip. We benchmark our method on three datasets: EpicKitchens-100, EGTEA+ and Ego4D. We also provide annotations for the first two datasets. Our approach performs best compared to relevant baseline methods. We also conduct ablation studies to understand the effectiveness of the proposed and baseline methods under varying conditions. Code and ANACTO task annotations will be made available upon paper acceptance.

updated: Tue Oct 31 2023 15:42:42 GMT+0000 (UTC)

published: Mon Feb 13 2023 13:44:52 GMT+0000 (UTC)

arXiv

