Delving into 3D Action Anticipation from Streaming Videos

Hongsong Wang; Jiashi Feng

Action anticipation, which aims to recognize an action from a partial observation, has become increasingly popular due to its wide range of applications. In this paper, we investigate the problem of 3D action anticipation from streaming videos, with the goal of understanding best practices for solving it. We first introduce several complementary evaluation metrics and present a basic model based on frame-wise action classification. To achieve better performance, we then investigate two important factors: the length of the training clip and the clip sampling method. We also explore multi-task learning strategies that incorporate auxiliary information from two sources: the full action representation and the class-agnostic action label. Our comprehensive experiments uncover best practices for 3D action anticipation, and based on them we propose a novel method with a multi-task loss. The proposed method considerably outperforms recent methods and achieves state-of-the-art performance on standard benchmarks.
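The abstract names three ingredients of the multi-task loss (frame-wise action classification, a class-agnostic action label, and the full action representation) without giving its form. The sketch below shows one plausible way such terms could be combined. It is an illustration under stated assumptions, not the authors' formulation: the function names, the weights `w_act` and `w_repr`, and the choice of cross-entropy, binary cross-entropy, and mean squared error for the three terms are all hypothetical.

```python
import math

def cross_entropy(logits, label):
    """Softmax cross-entropy for one frame (numerically stable log-sum-exp)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[label]

def binary_cross_entropy(logit, target):
    """Sigmoid cross-entropy from a raw logit, in its stable form."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))

def mse(a, b):
    """Mean squared error between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def multi_task_loss(frame_logits, action_label,
                    actionness_logits, actionness_targets,
                    partial_repr, full_repr,
                    w_act=0.5, w_repr=0.5):
    """Hypothetical combination of the three terms named in the abstract.

    frame_logits: per-frame class scores for the observed clip
    actionness_*: per-frame class-agnostic "is an action happening" signal
    partial_repr / full_repr: clip feature vs. feature of the full action
    w_act, w_repr: assumed loss weights (not taken from the paper)
    """
    n = len(frame_logits)
    cls = sum(cross_entropy(f, action_label) for f in frame_logits) / n
    act = sum(binary_cross_entropy(z, t)
              for z, t in zip(actionness_logits, actionness_targets)) / n
    rep = mse(partial_repr, full_repr)
    return cls + w_act * act + w_repr * rep

# Usage: two observed frames, three action classes, uninformative logits.
loss = multi_task_loss(
    frame_logits=[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
    action_label=1,
    actionness_logits=[0.0, 0.0],
    actionness_targets=[1.0, 1.0],
    partial_repr=[0.2, 0.4],
    full_repr=[0.2, 0.4],
)
```

With uninformative logits and identical representations, the classification term reduces to ln 3, the class-agnostic term to ln 2, and the representation term to zero, which makes the contribution of each weight easy to check by hand.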

updated: Thu Jun 15 2023 00:09:45 GMT+0000 (UTC)

published: Sat Jun 15 2019 10:30:29 GMT+0000 (UTC)

arXiv
