Multi-Modal Temporal Convolutional Network for Anticipating Actions in Egocentric Videos

Olga Zatsarynna; Yazan Abu Farha; Juergen Gall

Anticipating human actions is an important task for the development of reliable intelligent agents, such as self-driving cars or robot assistants. While the ability to predict the future with high accuracy is crucial when designing anticipation approaches, the speed at which inference is performed is no less important. Methods that are accurate but not sufficiently fast introduce high latency into the decision process and thereby increase the reaction time of the underlying system. This poses a problem for domains such as autonomous driving, where reaction time is crucial. In this work, we propose a simple and effective multi-modal architecture based on temporal convolutions. Our approach stacks a hierarchy of temporal convolutional layers and, to ensure fast prediction, does not rely on recurrent layers. We further introduce a multi-modal fusion mechanism that captures the pairwise interactions between the RGB, flow, and object modalities. Results on two large-scale datasets of egocentric videos, EPIC-Kitchens-55 and EPIC-Kitchens-100, show that our approach achieves performance comparable to state-of-the-art approaches while being significantly faster.
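The two ideas named in the abstract, a stack of temporal convolutions with no recurrent state and a fusion step over modality pairs, can be illustrated with a small toy sketch. This is not the authors' implementation: the abstract gives no kernel sizes, layer counts, or fusion operator, so the helper names (`temporal_conv`, `conv_hierarchy`, `pairwise_fusion`), the scalar-per-timestep features, the ReLU non-linearity, and the element-wise product used for pairwise interaction are all assumptions made only for illustration.

```python
from itertools import combinations


def temporal_conv(seq, kernel):
    """Valid 1D convolution over time for a sequence with one scalar per step."""
    k = len(kernel)
    return [
        sum(w * seq[t + i] for i, w in enumerate(kernel))
        for t in range(len(seq) - k + 1)
    ]


def conv_hierarchy(seq, kernels):
    """Apply convolution layers one after another, each followed by a ReLU.

    Every layer sees the whole output of the previous layer at once, so no
    hidden state is carried from one time step to the next (no recurrence).
    """
    for kernel in kernels:
        seq = [max(0.0, v) for v in temporal_conv(seq, kernel)]
    return seq


def pairwise_fusion(features):
    """Sum the element-wise products of every pair of modality vectors.

    The product is an assumed stand-in for the paper's fusion operator.
    """
    names = sorted(features)
    dim = len(features[names[0]])
    fused = [0.0] * dim
    for a, b in combinations(names, 2):
        for i in range(dim):
            fused[i] += features[a][i] * features[b][i]
    return fused


# Two stacked averaging layers shorten a 5-step sequence to 3 steps.
summary = conv_hierarchy([1.0, 2.0, 3.0, 4.0, 5.0], [[0.5, 0.5], [0.5, 0.5]])

# Hypothetical 2-dimensional features for the three modalities.
fused = pairwise_fusion({"rgb": [1.0, 2.0], "flow": [3.0, 4.0], "obj": [5.0, 6.0]})
```

Because each layer is a fixed function of the previous layer's full output, all time steps within a layer can be computed in parallel, which is the general reason convolutional stacks tend to be faster at inference than recurrent ones.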

updated: Sun Jul 18 2021 16:21:35 GMT+0000 (UTC)

published: Sun Jul 18 2021 16:21:35 GMT+0000 (UTC)

arXiv
