Post-Processing Temporal Action Detection

Sauradip Nag; Xiatian Zhu; Yi-Zhe Song; Tao Xiang

Existing Temporal Action Detection (TAD) methods typically apply a pre-processing step that converts a variable-length input video into a fixed-length sequence of snippet representations before temporal boundary estimation and action classification. This step temporally downsamples the video, which reduces the inference resolution and hampers detection performance at the original temporal resolution. In essence, the cause is a temporal quantization error introduced during resolution downsampling and recovery. This error can hurt TAD performance, yet it is largely ignored by existing methods. To address this problem, we introduce a novel model-agnostic post-processing method that requires neither model redesign nor retraining. Specifically, we model the start and end points of action instances with a Gaussian distribution, which enables temporal boundary inference at the sub-snippet level. We further introduce an efficient Taylor-expansion-based approximation, dubbed Gaussian Approximated Post-processing (GAP). Extensive experiments demonstrate that GAP consistently improves a wide variety of pre-trained off-the-shelf TAD models on the challenging ActivityNet (+0.2% to +0.7% in average mAP) and THUMOS (+0.2% to +0.5% in average mAP) benchmarks. Such performance gains are already significant and highly comparable to those achieved by novel model designs. GAP can also be integrated with model training for further performance gains. Importantly, GAP enables lower temporal resolutions for more efficient inference, facilitating low-resource applications. The code will be available at https://github.com/sauradip/GAP
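To make the idea of Gaussian modelling with a Taylor expansion concrete, the following is a minimal sketch and not the authors' released implementation. It assumes that a TAD model outputs a per-snippet boundary score sequence, that the scores near the peak are roughly Gaussian-shaped, and that the sub-snippet location can be recovered from a second-order Taylor expansion of the log-score around the discrete maximum m, giving mu ≈ m - D'(m) / D''(m). The function name refine_boundary, the eps parameter, and the finite-difference derivative estimates are hypothetical choices made for illustration.

```python
import numpy as np


def refine_boundary(scores, eps=1e-10):
    """Estimate a sub-snippet boundary location from per-snippet scores.

    Hypothetical helper: takes a 1-D sequence of non-negative boundary
    scores (one per snippet) and returns a fractional snippet index.
    """
    scores = np.asarray(scores, dtype=float)
    m = int(np.argmax(scores))  # discrete (quantized) peak location

    # At the sequence edges there are not enough neighbours for
    # central differences, so fall back to the discrete peak.
    if m == 0 or m == len(scores) - 1:
        return float(m)

    # Under the Gaussian assumption the log-score is quadratic in x.
    log_s = np.log(np.maximum(scores, eps))

    # Finite-difference estimates of the first and second derivatives at m.
    d1 = 0.5 * (log_s[m + 1] - log_s[m - 1])
    d2 = log_s[m + 1] - 2.0 * log_s[m] + log_s[m - 1]

    # A flat peak (d2 == 0) gives no curvature to refine with.
    if d2 >= 0:
        return float(m)

    # Stationary point of the second-order Taylor expansion.
    return float(m - d1 / d2)


if __name__ == "__main__":
    x = np.arange(20)
    true_mu, sigma = 10.3, 2.0
    scores = np.exp(-((x - true_mu) ** 2) / (2 * sigma ** 2))
    print(int(np.argmax(scores)), refine_boundary(scores))
```

On a synthetic, noise-free Gaussian the log-score is exactly quadratic, so the refined estimate recovers the true centre, whereas the plain argmax is limited to integer snippet indices. A refined index would still need to be mapped back to seconds using the snippet duration of the model in question.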

updated: Fri Mar 03 2023 07:23:20 GMT+0000 (UTC)

published: Sun Nov 27 2022 19:50:37 GMT+0000 (UTC)

arXiv
