ABN: Agent-Aware Boundary Networks for Temporal Action Proposal Generation

Khoa Vo; Kashu Yamazaki; Sang Truong; Minh-Triet Tran; Akihiro Sugimoto; Ngan Le

Temporal action proposal generation (TAPG) aims to estimate the temporal intervals of actions in untrimmed videos. It is a challenging task that nonetheless plays an important role in many video analysis and understanding tasks. Despite the great progress in TAPG, most existing works apply a deep learning model as a black box to untrimmed videos to extract a visual representation, and thereby ignore the human perception of interactions between agents and the surrounding environment. Capturing these interactions between agents and the environment should therefore be beneficial and could potentially improve TAPG performance. In this paper, we propose a novel framework named Agent-Aware Boundary Network (ABN), which consists of two sub-networks: (i) an Agent-Aware Representation Network, which captures both agent-agent and agent-environment relationships in the video representation, and (ii) a Boundary Generation Network, which estimates the confidence scores of temporal intervals. In the Agent-Aware Representation Network, the interactions between agents are expressed through a local pathway, which operates at a local level to focus on the motions of agents, whereas the overall perception of the surroundings is expressed through a global pathway, which operates at a global level to perceive agent-environment effects. Comprehensive evaluations on the 20-action THUMOS-14 and 200-action ActivityNet-1.3 datasets with different backbone networks (i.e., C3D, SlowFast, and Two-Stream) show that the proposed ABN robustly outperforms state-of-the-art TAPG methods regardless of the backbone network employed. We further examine proposal quality by feeding the proposals generated by our method into temporal action detection (TAD) frameworks and evaluating their detection performance. The source code is available at https://github.com/vhvkhoa/TAPG-AgentEnvNetwork.git.
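The two-stage structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the mean pooling of agent features, the concatenation used for fusion, the exhaustive interval enumeration, and every name below (`mean_pool`, `fuse_snippet`, `score_intervals`, `scorer`, `max_len`) are assumptions made purely for illustration. The abstract does not specify how the local and global pathways are combined or how the Boundary Generation Network scores intervals.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def mean_pool(vectors: Sequence[Vector], dim: int) -> Vector:
    """Average a set of feature vectors; return zeros if the set is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def fuse_snippet(agent_feats: Sequence[Vector], env_feat: Vector,
                 agent_dim: int) -> Vector:
    """Combine local (agent) and global (environment) features of one snippet.

    Assumption: agents are mean-pooled and the result is concatenated with
    the environment feature. The paper may use a different fusion.
    """
    local = mean_pool(agent_feats, agent_dim)  # stand-in for the local pathway
    return local + list(env_feat)              # stand-in for the fusion step


def score_intervals(snippets: Sequence[Vector],
                    scorer: Callable[[Sequence[Vector]], float],
                    max_len: int) -> List[Tuple[int, int, float]]:
    """Score every candidate interval [start, end) of at most max_len snippets.

    `scorer` is a hypothetical placeholder for a learned confidence model.
    Returns (start, end, score) tuples sorted by descending score.
    """
    n = len(snippets)
    proposals = []
    for start in range(n):
        for end in range(start + 1, min(start + max_len, n) + 1):
            proposals.append((start, end, scorer(snippets[start:end])))
    return sorted(proposals, key=lambda p: p[2], reverse=True)
```

In this sketch each video snippet would first pass through `fuse_snippet`, and the resulting sequence would then be handed to `score_intervals` together with a scoring function, mirroring the representation-then-boundary order given in the abstract.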

updated: Wed Mar 16 2022 21:06:34 GMT+0000 (UTC)

published: Wed Mar 16 2022 21:06:34 GMT+0000 (UTC)

arXiv
