MITFAS: Mutual Information based Temporal Feature Alignment and Sampling for Aerial Video Action Recognition

Ruiqi Xian; Xijun Wang; Dinesh Manocha

We present a novel approach for action recognition in UAV videos. Our formulation is designed to handle occlusion and viewpoint changes caused by the movement of a UAV. We use the concept of mutual information to compute and align the regions corresponding to human action or motion in the temporal domain. This enables our recognition model to learn from the key features associated with the motion. We also propose a novel frame sampling method that uses joint mutual information to acquire the most informative frame sequence in UAV videos. We have integrated our approach with X3D and evaluated the performance on multiple datasets. In practice, we achieve an 18.9% improvement in Top-1 accuracy over current state-of-the-art methods on UAV-Human (Li et al., 2021), a 7.3% improvement on Drone-Action (Perera et al., 2019), and a 7.16% improvement on NEC Drones (Choi et al., 2020). We will release the code at the time of publication.
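
The abstract does not specify how mutual information is estimated or how regions are aligned, so the following is only a minimal sketch of the general idea, assuming grayscale frames stored as NumPy arrays. It uses a plug-in histogram estimate of mutual information and an exhaustive search for the patch in a frame that shares the most information with a reference region. This is classic MI-based template matching, not the authors' MITFAS alignment; `mutual_information`, `align_patch`, and the `bins` and `stride` parameters are hypothetical names chosen for illustration.

```python
import numpy as np

# Hypothetical helpers illustrating MI-based region matching; NOT the MITFAS implementation.


def mutual_information(a: np.ndarray, b: np.ndarray, bins: int = 16) -> float:
    """Plug-in histogram estimate of I(A;B), in nats, between two equally shaped arrays."""
    # Joint histogram of co-located pixel intensities.
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)  # marginal of A, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)  # marginal of B, shape (1, bins)
    nz = pxy > 0  # skip empty cells, whose contribution p*log(p) tends to 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px * py)[nz])))


def align_patch(template: np.ndarray, frame: np.ndarray, stride: int = 1):
    """Return the top-left (row, col) in `frame` whose patch has maximal MI with `template`."""
    h, w = template.shape
    best_mi, best_pos = -np.inf, (0, 0)
    # Exhaustive sliding-window search over all valid patch positions.
    for y in range(0, frame.shape[0] - h + 1, stride):
        for x in range(0, frame.shape[1] - w + 1, stride):
            mi = mutual_information(template, frame[y:y + h, x:x + w])
            if mi > best_mi:
                best_mi, best_pos = mi, (y, x)
    return best_pos, best_mi


# Usage: recover where a 16x16 reference region sits inside a 64x64 frame.
rng = np.random.default_rng(0)
frame = rng.random((64, 64))
template = frame[10:26, 20:36].copy()
pos, score = align_patch(template, frame)
print(pos, round(score, 3))
```

The exhaustive search costs one MI evaluation per candidate position; a larger `stride` trades localization precision for speed. Because the histogram edges for the template are fixed, the estimate is bounded by the template's binned entropy, which is reached at the exact match.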

updated: Sun Mar 05 2023 04:05:17 GMT+0000 (UTC)

published: Sun Mar 05 2023 04:05:17 GMT+0000 (UTC)

arXiv