Faster and Accurate Compressed Video Action Recognition Straight from the Frequency Domain

Samuel Felipe dos Santos; Jurandy Almeida

Human action recognition has become one of the most active fields of research in computer vision due to its wide range of applications, such as surveillance, medical settings, industrial environments, and smart homes. Recently, deep learning has been successfully used to learn powerful and interpretable features for recognizing human actions in videos. Most existing deep learning approaches are designed to process video as sequences of RGB images. Because video data are usually stored in a compressed format, these approaches require a preliminary decoding step, and decoding a video demands substantial computation and memory. To overcome this problem, we propose a deep neural network capable of learning straight from compressed video. Our approach was evaluated on two public benchmarks, UCF-101 and HMDB-51, achieving recognition performance comparable to state-of-the-art methods while running up to 2 times faster at inference.
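Codecs built on the discrete cosine transform, such as JPEG and the intra-coded frames of MPEG-4 Part 2, store image data as quantized DCT coefficients of 8x8 blocks. The sketch below illustrates one common way to present such frequency-domain data to a CNN: each 8x8 block becomes one spatial position and its 64 coefficients become channels. It is an illustration under stated assumptions, not the authors' implementation. For self-containment it computes the DCT from pixels; a pipeline working on compressed video would instead read the (dequantized) coefficients from the bitstream and skip the inverse DCT. The block size, the row-major (u, v) channel ordering, and the helper names `dct_matrix` and `blocks_to_dct_channels` are hypothetical.

```python
import numpy as np

# Assumption: 8x8 blocks, as in JPEG and MPEG-4 Part 2 intra coding.
BLOCK = 8


def dct_matrix(n: int = BLOCK) -> np.ndarray:
    """Orthonormal DCT-II basis matrix C, so that C @ x is the 1-D DCT of x."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    i = np.arange(n)[None, :]  # sample index (columns)
    c = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] *= 1 / np.sqrt(2)  # DC row gets the extra 1/sqrt(2) factor
    return c * np.sqrt(2 / n)


def blocks_to_dct_channels(plane: np.ndarray) -> np.ndarray:
    """Turn an (H, W) plane into an (H/8, W/8, 64) tensor of block DCT coefficients.

    Hypothetical layout: each 8x8 block becomes one spatial position and its
    64 coefficients become channels, ordered row-major by (u, v) frequency.
    """
    h, w = plane.shape
    assert h % BLOCK == 0 and w % BLOCK == 0, "plane must tile into 8x8 blocks"
    c = dct_matrix()
    # Split into blocks: (H/8, 8, W/8, 8) -> (H/8, W/8, 8, 8)
    blocks = plane.reshape(h // BLOCK, BLOCK, w // BLOCK, BLOCK).transpose(0, 2, 1, 3)
    # 2-D DCT-II of every block at once via broadcasting: C @ B @ C^T
    coeffs = c @ blocks @ c.T
    return coeffs.reshape(h // BLOCK, w // BLOCK, BLOCK * BLOCK)


if __name__ == "__main__":
    frame = np.random.default_rng(0).integers(0, 256, size=(224, 224)).astype(np.float64)
    x = blocks_to_dct_channels(frame - 128)  # level shift, as JPEG does before the DCT
    print(x.shape)  # (28, 28, 64)
```

Under these assumptions, a 224x224 plane becomes a 28x28x64 tensor, so the network starts at 1/8 of the pixel resolution. This is one plausible reason a frequency-domain model can skip both full decoding and some early downsampling layers, though the exact architecture is not specified here.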

updated: Sat Dec 26 2020 12:43:53 GMT+0000 (UTC)

published: Sat Dec 26 2020 12:43:53 GMT+0000 (UTC)

arXiv