Attention Distillation for Learning Video Representations

Miao Liu; Xin Chen; Yun Zhang; Yin Li; James M. Rehg

We address the challenging problem of learning motion representations using deep models for video recognition. To this end, we make use of attention modules that learn to highlight regions in the video and aggregate features for recognition. Specifically, we propose to leverage output attention maps as a vehicle to transfer the learned representation from a motion (flow) network to an RGB network. We systematically study the design of attention modules, and develop a novel method for attention distillation. Our method is evaluated on major action benchmarks, and consistently improves the performance of the baseline RGB network by a significant margin. Moreover, we demonstrate that our attention maps can leverage motion cues in learning to identify the location of actions in video frames. We believe our method provides a step towards learning motion-aware representations in deep models. Our project page is available at https://aptx4869lm.github.io/AttentionDistillation/
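
The abstract does not spell out the distillation objective, so the following is only a minimal sketch of one way attention maps could carry information from a flow network to an RGB network. It assumes each network produces a 2D grid of attention scores, that the scores are normalized with a spatial softmax, and that the RGB (student) map is pulled toward the flow (teacher) map with a KL divergence. The function names, the choice of KL divergence, and the smoothing constant `eps` are illustrative assumptions, not the authors' implementation.

```python
import math


def spatial_softmax(scores):
    """Normalize a 2D grid of attention scores into a probability map."""
    flat = [s for row in scores for s in row]
    peak = max(flat)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in flat]
    total = sum(exps)
    probs = [e / total for e in exps]
    width = len(scores[0])
    return [probs[i:i + width] for i in range(0, len(probs), width)]


def attention_distillation_loss(teacher_map, student_map, eps=1e-8):
    """Smoothed KL(teacher || student) summed over spatial locations.

    Assumption: both maps are non-negative, sum to 1, and share a shape.
    """
    loss = 0.0
    for t_row, s_row in zip(teacher_map, student_map):
        for t, s in zip(t_row, s_row):
            loss += t * math.log((t + eps) / (s + eps))
    return loss


# Hypothetical 2x2 attention scores: the flow (teacher) network attends
# to the top-right cell, while the RGB (student) network is still uniform.
flow_attention = spatial_softmax([[0.0, 2.0], [0.0, 0.0]])
rgb_attention = spatial_softmax([[0.0, 0.0], [0.0, 0.0]])
distill_loss = attention_distillation_loss(flow_attention, rgb_attention)
```

In a training loop, a term like `distill_loss` would typically be added to the RGB network's recognition loss with a weighting factor, so that the RGB network learns both to classify the action and to attend where the motion stream attends. The weighting scheme is likewise an assumption here.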

updated: Fri Aug 14 2020 19:42:37 GMT+0000 (UTC)

published: Fri Apr 05 2019 19:43:08 GMT+0000 (UTC)

arXiv
