Zeus: Efficiently Localizing Actions in Videos using Reinforcement Learning

Pramod Chunduri; Jaeho Bang; Yao Lu; Joy Arulraj

Zeus：強化学習を使用してビデオ内のアクションを効率的にローカライズする

ビデオ内のアクションの検出とローカリゼーションは、実際には重要な問題です。交通アナリストは、特定の交差点で車両が移動するパターンの調査に関心があるかもしれません。最先端のビデオ分析システムは、そのようなアクションクエリに効率的かつ効果的に答えることができません。その理由は3つあります。まず、アクションの検出とローカリゼーションのタスクには、計算コストの高いディープニューラルネットワークが必要です。第二に、行動はしばしばまれな出来事です。第三に、アクションは一連のフレームに分散されます。クエリに効果的に答えるためには、フレームのシーケンス全体をコンテキストに取り込むことが重要です。アクションクエリに効率的に回答するには、動画の無関係な部分をすばやく確認することが重要です。このホワイトペーパーでは、アクションクエリに回答するために調整されたビデオ分析システムであるZeusを紹介します。深層強化学習エージェントを使用して、これらのクエリに効率的に答えるための新しい手法を提案します。 Zeusは、入力ビデオセグメントをアクション分類ネットワークに適応的に変更することを学習するエージェントをトレーニングします。エージェントは、サンプリングレート、セグメント長、解像度の3つの次元に沿って入力セグメントを変更します。 Zeusは、効率に加えて、精度を意識した報酬関数に基づいてエージェントをトレーニングするクエリオプティマイザーを使用して、ユーザー指定のターゲット精度でクエリに回答することができます。新しいアクションローカリゼーションデータセットでのZeusの評価は、Zeusが最先端のフレームベースおよびウィンドウベースの手法をそれぞれ最大1.4倍および3倍上回っていることを示しています。さらに、フレームベースの手法とは異なり、フレームベースの手法よりも最大2倍高い精度で、すべてのクエリにわたってユーザー指定のターゲット精度を満たします。

Detection and localization of actions in videos is an important problem in practice. A traffic analyst might be interested in studying the patterns in which vehicles move at a given intersection. State-of-the-art video analytics systems are unable to efficiently and effectively answer such action queries. The reasons are threefold. First, action detection and localization tasks require computationally expensive deep neural networks. Second, actions are often rare events. Third, actions are spread across a sequence of frames. It is important to take the entire sequence of frames into context for effectively answering the query. It is critical to quickly skim through the irrelevant parts of the video to answer the action query efficiently. In this paper, we present Zeus, a video analytics system tailored for answering action queries. We propose a novel technique for efficiently answering these queries using a deep reinforcement learning agent. Zeus trains an agent that learns to adaptively modify the input video segments to an action classification network. The agent alters the input segments along three dimensions -- sampling rate, segment length, and resolution. Besides efficiency, Zeus is capable of answering the query at a user-specified target accuracy using a query optimizer that trains the agent based on an accuracy-aware reward function. Our evaluation of Zeus on a novel action localization dataset shows that it outperforms the state-of-the-art frame- and window-based techniques by up to 1.4x and 3x, respectively. Furthermore, unlike the frame-based technique, it satisfies the user-specified target accuracy across all the queries, at up to 2x higher accuracy, than frame-based methods.

updated: Mon Apr 19 2021 03:20:48 GMT+0000 (UTC)

published: Tue Apr 06 2021 16:38:31 GMT+0000 (UTC)

arXiv

参考文献 (このサイトで利用可能なもの) / References (only if available on this site)

被参照文献 (このサイトで利用可能なものを新しい順に) / Citations (only if available on this site, in order of most recent)

Amazon.co.jpアソシエイト