E^2TAD: An Energy-Efficient Tracking-based Action Detector

Xin Hu; Zhenyu Wu; Hao-Yu Miao; Siqi Fan; Taiyu Long; Zhenyu Hu; Pengcheng Pi; Yi Wu; Zhou Ren; Zhangyang Wang; Gang Hua

Video action detection (spatio-temporal action localization) is usually the starting point for today's human-centric intelligent video analysis. It has high practical impact on applications in robotics, security, healthcare, and other fields. The two-stage paradigm of Faster R-CNN from object detection has inspired a standard paradigm for video action detection: first generate person proposals, then classify their actions. However, none of the existing solutions provides fine-grained action detection at the "who-when-where-what" level. This paper presents a tracking-based solution that accurately and efficiently localizes predefined key actions both spatially (by predicting the associated target IDs and locations) and temporally (by predicting the time as exact frame indices). This solution won first place in the UAV-Video Track of the 2021 Low-Power Computer Vision Challenge (LPCVC).
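The abstract does not describe the pipeline internals, so the following is only a minimal sketch of the "who-when-where-what" output it targets. It assumes a tracker has already assigned an ID and a box to each person in each frame, and that some classifier has produced per-frame action scores for each track. The names `ActionEvent` and `localize_key_actions`, the 0.5 threshold, the "first frame above threshold" rule, and the action labels "catch" and "throw" are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical box convention: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


@dataclass(frozen=True)
class ActionEvent:
    """One "who-when-where-what" record (hypothetical structure)."""
    track_id: int     # who: identity assigned by the tracker
    frame_index: int  # when: exact frame index of the key action
    box: Box          # where: the target's location in that frame
    action: str       # what: a predefined key-action label


def localize_key_actions(
    tracks: Dict[int, Dict[int, Box]],
    action_scores: Dict[int, Dict[int, Dict[str, float]]],
    threshold: float = 0.5,
) -> List[ActionEvent]:
    """Hypothetical post-processing step, not the authors' method.

    tracks:        track_id -> {frame_index -> box}
    action_scores: track_id -> {frame_index -> {action -> score}}
    For each track and action, report only the first frame whose score
    reaches `threshold`, together with the tracked box in that frame.
    """
    events: List[ActionEvent] = []
    for track_id, boxes in tracks.items():
        scores_by_frame = action_scores.get(track_id, {})
        seen_actions = set()
        for frame_index in sorted(boxes):
            frame_scores = scores_by_frame.get(frame_index, {})
            for action, score in sorted(frame_scores.items()):
                if score >= threshold and action not in seen_actions:
                    seen_actions.add(action)
                    events.append(
                        ActionEvent(track_id, frame_index, boxes[frame_index], action)
                    )
    # Order events chronologically, then by track ID and label.
    events.sort(key=lambda e: (e.frame_index, e.track_id, e.action))
    return events


if __name__ == "__main__":
    tracks = {
        1: {0: (10, 10, 50, 90), 1: (12, 10, 52, 90), 2: (14, 11, 54, 91)},
        2: {1: (100, 20, 140, 100), 2: (101, 20, 141, 100)},
    }
    action_scores = {
        1: {0: {"catch": 0.1}, 1: {"catch": 0.8}, 2: {"catch": 0.9}},
        2: {2: {"throw": 0.7}},
    }
    for event in localize_key_actions(tracks, action_scores):
        print(event)
```

In this toy input, track 1 yields a single "catch" event at frame 1, the first frame whose score passes the threshold. Track 2 yields a "throw" event at frame 2. Tying each event to a track ID, rather than to an isolated per-frame detection, is what lets the output answer "who" as well as "when" and "where".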

updated: Sun Jun 05 2022 22:47:02 GMT+0000 (UTC)

published: Sat Apr 09 2022 07:52:11 GMT+0000 (UTC)

arXiv
