Semi-Supervised Temporal Action Detection with Proposal-Free Masking

Sauradip Nag; Xiatian Zhu; Yi-Zhe Song; Tao Xiang

Existing temporal action detection (TAD) methods rely on large amounts of training data with segment-level annotations. Collecting and annotating such a training set is therefore highly expensive and does not scale. Semi-supervised TAD (SS-TAD) alleviates this problem by leveraging unlabeled videos, which are freely available at scale. However, SS-TAD is a much more challenging problem than supervised TAD and, consequently, remains under-studied. Prior SS-TAD methods directly combine an existing proposal-based TAD method with a semi-supervised learning (SSL) method. Because of their sequential design, in which localization (e.g., proposal generation) precedes classification, they are prone to proposal error propagation. To overcome this limitation, in this work we propose a novel Semi-supervised Temporal action detection model based on PropOsal-free Temporal mask (SPOT) with a parallel localization (mask generation) and classification architecture. This design effectively eliminates the dependence between localization and classification by cutting off the route for error propagation between them. We further introduce an interaction mechanism between classification and localization for prediction refinement, and a new pretext task for self-supervised model pre-training. Extensive experiments on two standard benchmarks show that SPOT outperforms state-of-the-art alternatives, often by a large margin. The PyTorch implementation of SPOT is available at https://github.com/sauradip/SPOT.
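The abstract does not say how a proposal-free temporal mask is turned into detected action segments. As a minimal sketch, assuming a single 1-D foreground mask per video (a deliberate simplification; SPOT's actual mask head, scoring and post-processing may differ), the function below thresholds per-snippet foreground probabilities into contiguous runs and labels each run with class scores pooled over it. The names `mask_to_segments`, `fg_prob`, `cls_prob` and `threshold` are hypothetical. The point it illustrates is that the mask and class scores come from two parallel heads and are only combined at this final step, so no proposal stage sits in front of classification.

```python
from typing import List, Optional, Tuple

# (start, end_exclusive, class_id, score); hypothetical output format
Segment = Tuple[int, int, int, float]


def mask_to_segments(
    fg_prob: List[float],
    cls_prob: List[List[float]],
    threshold: float = 0.5,
) -> List[Segment]:
    """Convert a per-snippet foreground mask plus per-snippet class scores
    into action segments.

    Hypothetical sketch: fg_prob and cls_prob stand in for the outputs of two
    parallel heads that share snippet features but do not feed each other.
    """
    if len(fg_prob) != len(cls_prob):
        raise ValueError("mask and class scores must cover the same snippets")
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must lie in (0, 1]")

    segments: List[Segment] = []
    start: Optional[int] = None
    # A trailing 0.0 sentinel closes any run that reaches the last snippet.
    for t, p in enumerate(list(fg_prob) + [0.0]):
        if p >= threshold and start is None:
            start = t  # a foreground run begins at snippet t
        elif p < threshold and start is not None:
            length = t - start
            num_classes = len(cls_prob[start])
            # Average each class score over the snippets of the run.
            mean = [
                sum(cls_prob[i][c] for i in range(start, t)) / length
                for c in range(num_classes)
            ]
            label = max(range(num_classes), key=lambda c: mean[c])
            # Confidence: mean mask value times mean score of the chosen class.
            mask_conf = sum(fg_prob[start:t]) / length
            segments.append((start, t, label, mask_conf * mean[label]))
            start = None
    return segments
```

For example, with `fg = [0.1, 0.8, 0.9, 0.2, 0.7, 0.6]` and two-class scores per snippet, `mask_to_segments(fg, cls)` returns two segments, snippets 1 to 3 and snippets 4 to 6, each labeled with the class whose averaged score is highest over that run. Because the boundaries come only from the mask branch, an error in the class scores cannot shift a segment boundary, and vice versa.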

updated: Thu Jul 14 2022 16:58:47 GMT+0000 (UTC)

published: Thu Jul 14 2022 16:58:47 GMT+0000 (UTC)

arXiv