Hierarchical Graph-RNNs for Action Detection of Multiple Activities

Sovan Biswas; Yaser Souri; Juergen Gall

In this paper, we propose an approach that spatially localizes the activities in a video frame where each person can perform multiple activities at the same time. Our approach takes into account the temporal scene context as well as the relations between the actions of detected persons. While the temporal context is modeled by a temporal recurrent neural network (RNN), the relations of the actions are modeled by a graph RNN. Both networks are trained together, and the proposed approach achieves state-of-the-art results on the AVA dataset.
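The abstract names two components, a temporal RNN for scene context and a graph RNN over detected persons, but gives no equations. The following is only a minimal sketch of how such a pairing could be wired together, not the authors' architecture. Everything in it is an assumption made for illustration: the Elman-style updates, mean aggregation over the other persons, two message-passing rounds, the weight names (`W_hh`, `W_xh`, `W_in`, `W_msg`, `W_ctx`, `W_out`), and the random untrained weights. Per-action sigmoid scores are used so that one person can receive several activity labels at once.

```python
import math
import random


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def add(*vecs):
    """Element-wise sum of equally sized vectors."""
    return [sum(vals) for vals in zip(*vecs)]


def tanh_vec(v):
    return [math.tanh(x) for x in v]


def sigmoid_vec(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]


def make_params(feat_dim, hidden, num_actions, seed=0):
    """Random, untrained weights (hypothetical names, illustration only)."""
    rng = random.Random(seed)

    def rand(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        "W_hh": rand(hidden, hidden), "W_xh": rand(hidden, feat_dim),
        "W_in": rand(hidden, feat_dim), "W_msg": rand(hidden, hidden),
        "W_ctx": rand(hidden, hidden), "W_out": rand(num_actions, hidden),
    }


def temporal_step(h_prev, frame_feat, p):
    """Assumed Elman-style update of the scene context across frames."""
    return tanh_vec(add(matvec(p["W_hh"], h_prev), matvec(p["W_xh"], frame_feat)))


def graph_step(states, person_feats, context, p):
    """One message-passing round: each person node reads the mean state of
    the other persons plus the temporal scene context (assumed scheme)."""
    new_states = []
    for i, feat in enumerate(person_feats):
        others = [s for j, s in enumerate(states) if j != i]
        if others:
            msg = [sum(vals) / len(others) for vals in zip(*others)]
        else:
            msg = [0.0] * len(states[i])  # a lone person receives no message
        new_states.append(tanh_vec(add(
            matvec(p["W_in"], feat),
            matvec(p["W_msg"], msg),
            matvec(p["W_ctx"], context),
        )))
    return new_states


def score_actions(frames, p, hidden, rounds=2):
    """frames: list of (frame_feature, [person_feature, ...]) pairs.
    Returns, per frame and per person, independent per-action scores."""
    context = [0.0] * hidden
    results = []
    for frame_feat, person_feats in frames:
        context = temporal_step(context, frame_feat, p)
        states = [[0.0] * hidden for _ in person_feats]
        for _ in range(rounds):
            states = graph_step(states, person_feats, context, p)
        results.append([sigmoid_vec(matvec(p["W_out"], s)) for s in states])
    return results
```

Thresholding each score separately (for example at 0.5) would give a multi-label prediction per detected person. In a trained system both sets of weights would be learned jointly, as the abstract states; this sketch only shows the data flow.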

updated: Thu Jan 21 2021 12:50:02 GMT+0000 (UTC)

published: Thu Jan 21 2021 12:50:02 GMT+0000 (UTC)

arXiv
