AZTR: Aerial Video Action Recognition with Auto Zoom and Temporal Reasoning

Xijun Wang; Ruiqi Xian; Tianrui Guan; Celso M. de Melo; Stephen M. Nogar; Aniket Bera; Dinesh Manocha

We propose a novel approach for aerial video action recognition. Our method is designed for videos captured by UAVs and can run on edge or mobile devices. We present a learning-based approach that uses a customized auto zoom to automatically identify the human target and scale it appropriately, which makes the key features easier to extract and reduces the computational overhead. We also present an efficient temporal reasoning algorithm that captures action information along the spatial and temporal domains within a controllable computational cost. Our approach has been implemented and evaluated both on a desktop with high-end GPUs and on the low-power Robotics RB5 Platform for robots and drones. In practice, we achieve a 6.1-7.4% improvement over the state of the art (SOTA) in Top-1 accuracy on the RoCoG-v2 dataset, an 8.3-10.4% improvement on the UAV-Human dataset, and a 3.2% improvement on the Drone Action dataset.
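To make the auto zoom idea concrete, the following is a minimal sketch of only the geometric step: given a bounding box for the human target, compute a square crop window that can then be rescaled to the recognizer's input size. It is not the authors' learned method. The function name `zoom_window`, the `margin` parameter, and the assumption that some detector has already produced the box are all hypothetical.

```python
def zoom_window(box, frame_w, frame_h, margin=0.2):
    """Return a square crop (left, top, right, bottom) around a person box.

    box is (x0, y0, x1, y1) in pixels, assumed to come from some detector.
    margin (hypothetical parameter) enlarges the box by a fraction of its
    longer side so that some context around the person is kept.
    """
    x0, y0, x1, y1 = box
    # Use the longer side so the crop is square, then add the margin.
    side = max(x1 - x0, y1 - y0) * (1.0 + margin)
    # The crop can never be larger than the frame itself.
    side = min(side, frame_w, frame_h)
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    # Center the window on the person, then shift it back inside the frame.
    left = min(max(cx - side / 2.0, 0.0), frame_w - side)
    top = min(max(cy - side / 2.0, 0.0), frame_h - side)
    return (round(left), round(top), round(left + side), round(top + side))


# A small person in a 200x100 frame: the crop hugs the target.
window = zoom_window((90, 40, 110, 60), frame_w=200, frame_h=100, margin=0.0)
```

Cropping to such a window before resizing means the person occupies most of the input pixels, which is one plausible reading of why zooming can both ease feature extraction and reduce computation.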

updated: Thu Mar 02 2023 21:24:19 GMT+0000 (UTC)

published: Thu Mar 02 2023 21:24:19 GMT+0000 (UTC)

arXiv
