Adaptive Focus for Efficient Video Recognition

Yulin Wang; Zhaoxi Chen; Haojun Jiang; Shiji Song; Yizeng Han; Gao Huang

In this paper, we explore the spatial redundancy in video recognition with the aim of improving computational efficiency. We observe that the most informative region in each frame of a video is usually a small image patch, which shifts smoothly across frames. We therefore model patch localization as a sequential decision task and propose a reinforcement-learning-based approach for efficient spatially adaptive video recognition (AdaFocus). Specifically, a lightweight ConvNet first processes the full video sequence quickly, and its features are used by a recurrent policy network to localize the most task-relevant regions. The selected patches are then processed by a high-capacity network to produce the final prediction. During offline inference, once the informative patch sequence has been generated, the bulk of the computation can be done in parallel, which is efficient on modern GPU devices. In addition, we demonstrate that the proposed method can be easily extended to also exploit temporal redundancy, e.g., by dynamically skipping less valuable frames. Extensive experiments on five benchmark datasets, i.e., ActivityNet, FCVID, Mini-Kinetics, and Something-Something V1&V2, demonstrate that our method is significantly more efficient than competitive baselines. Code will be available at https://github.com/blackfeather-wang/AdaFocus.
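
The three stages described in the abstract (a cheap glance at the full frame, a recurrent choice of patch location, then heavier processing of the selected patches) can be illustrated with a toy sketch. This is not the authors' implementation: every function name here is hypothetical, the "policy" is a hand-written brightest-pixel heuristic with momentum rather than a network trained by reinforcement learning, and the "high-capacity network" is replaced by a mean-intensity stand-in. It only shows the data flow, assuming grayscale frames stored as nested lists.

```python
from typing import List, Tuple

Frame = List[List[float]]  # H x W grayscale frame; a stand-in for a real image tensor


def brightest_location(frame: Frame) -> Tuple[float, float]:
    """Stand-in for the lightweight global network (assumption, not the paper's model):
    return the normalised (row, col) centre of the brightest pixel."""
    h, w = len(frame), len(frame[0])
    _, r, c = max((frame[r][c], r, c) for r in range(h) for c in range(w))
    return ((r + 0.5) / h, (c + 0.5) / w)


def crop_patch(frame: Frame, center: Tuple[float, float], size: int) -> Frame:
    """Crop a size x size patch around a centre given in normalised [0, 1] coordinates."""
    h, w = len(frame), len(frame[0])
    # Convert the centre to a top-left corner and clamp it so the patch stays inside the frame.
    top = min(max(int(round(center[0] * h - size / 2)), 0), h - size)
    left = min(max(int(round(center[1] * w - size / 2)), 0), w - size)
    return [row[left:left + size] for row in frame[top:top + size]]


def select_patches(video: List[Frame], size: int, momentum: float = 0.5) -> List[Frame]:
    """Toy 'recurrent policy': the previous centre is the only state carried across
    frames, so the selected patch shifts smoothly over time."""
    center = (0.5, 0.5)  # start in the middle of the frame
    patches = []
    for frame in video:
        target = brightest_location(frame)
        center = (
            momentum * center[0] + (1 - momentum) * target[0],
            momentum * center[1] + (1 - momentum) * target[1],
        )
        patches.append(crop_patch(frame, center, size))
    return patches


def process_patches(patches: List[Frame]) -> List[float]:
    """Stand-in for the high-capacity network: mean intensity per patch.
    All patches are known up front, so a real model could batch this step."""
    return [sum(sum(row) for row in p) / (len(p) * len(p[0])) for p in patches]
```

Because `select_patches` finishes before `process_patches` starts, the expensive step sees the whole patch sequence at once, which mirrors the abstract's point that the bulk of the computation can run in parallel during offline inference.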

updated: Fri May 07 2021 13:24:47 GMT+0000 (UTC)

published: Fri May 07 2021 13:24:47 GMT+0000 (UTC)

arXiv
