SlowFast Rolling-Unrolling LSTMs for Action Anticipation in Egocentric Videos

Nada Osman; Guglielmo Camporese; Pasquale Coscia; Lamberto Ballan

Action anticipation in egocentric videos is a difficult task due to the inherently multi-modal nature of human actions. Additionally, some actions happen faster or slower than others depending on the actor or the surrounding context, which can vary each time and lead to different predictions. Based on this idea, we build upon the RULSTM architecture, which is specifically designed for anticipating human actions, and propose a novel attention-based technique that simultaneously evaluates slow and fast features extracted from three different modalities, namely RGB, optical flow, and extracted objects. Two branches process information at different time scales, i.e., frame rates, and several fusion schemes are considered to improve prediction accuracy. We perform extensive experiments on the EpicKitchens-55 and EGTEA Gaze+ datasets and demonstrate that our technique systematically improves on the results of the RULSTM architecture in terms of Top-5 accuracy at different anticipation times.
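The ideas named in the abstract (two branches at different frame rates, attention-weighted fusion, Top-5 accuracy) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of plain temporal subsampling to obtain a slower frame rate, and the fixed attention logits (which would presumably be learned in a real model) are all assumptions made for illustration.

```python
import math

def subsample(frames, stride):
    """Keep every `stride`-th frame; a larger stride means a slower frame rate.
    (Assumed way of forming the slow branch input, for illustration only.)"""
    return frames[::stride]

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_fuse(branch_scores, attention_logits):
    """Fuse per-branch class scores with a weighted sum.
    The weights are softmax(attention_logits); here the logits are given
    by hand, whereas a trained model would predict them (assumption)."""
    weights = softmax(attention_logits)
    num_classes = len(branch_scores[0])
    return [
        sum(w * scores[c] for w, scores in zip(weights, branch_scores))
        for c in range(num_classes)
    ]

def top_k_accuracy(all_scores, labels, k=5):
    """Fraction of samples whose true label is among the k highest-scored classes."""
    hits = 0
    for scores, label in zip(all_scores, labels):
        ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
        hits += label in ranked[:k]
    return hits / len(labels)

# Hypothetical usage: 12 frame indices, slow branch sees every 4th frame.
frames = list(range(12))
slow_input = subsample(frames, 4)
fast_input = subsample(frames, 1)

# Hypothetical per-branch class scores for one sample with 3 classes.
slow_scores = [0.2, 0.7, 0.1]
fast_scores = [0.6, 0.3, 0.1]
fused = attention_fuse([slow_scores, fast_scores], attention_logits=[0.0, 0.0])
```

With equal attention logits the fusion reduces to a plain average of the two branches; unequal logits let one time scale dominate, which is the intuition behind weighting slow and fast features per prediction.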

updated: Thu Sep 02 2021 10:20:18 GMT+0000 (UTC)

published: Thu Sep 02 2021 10:20:18 GMT+0000 (UTC)

arXiv
