Weakly-guided Self-supervised Pretraining for Temporal Activity Detection

Kumara Kahatapitiya; Zhou Ren; Haoxiang Li; Zhenyu Wu; Michael S. Ryoo; Gang Hua

Temporal activity detection aims to predict activity classes per frame, in contrast to the video-level predictions of activity classification (i.e., activity recognition). Because detection requires expensive frame-level annotations, detection datasets are limited in scale. Previous work on temporal activity detection therefore commonly resorts to fine-tuning a classification model pretrained on a large-scale classification dataset (e.g., Kinetics-400). However, such pretrained models are not ideal for downstream detection, because of the disparity between the pretraining task and the downstream fine-tuning task. In this work, we propose a novel 'weakly-guided self-supervised' pretraining method for detection. We leverage weak labels (classification) to introduce a self-supervised pretext task (detection) by generating frame-level pseudo labels, multi-action frames, and action segments. Simply put, we design a detection task similar to the downstream one on large-scale classification data, without extra annotations. We show that models pretrained with the proposed weakly-guided self-supervised detection task outperform prior work on multiple challenging activity detection benchmarks, including Charades and MultiTHUMOS. Our extensive ablations further provide insights into when and how to use the proposed models for activity detection. Code is available at https://github.com/kkahatapitiya/SSDet.
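
The abstract does not spell out how the frame-level pseudo labels, multi-action frames, and action segments are generated, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes each classification clip is a list of per-frame feature vectors with a single video-level label; a segment is cut from each clip, the segments are concatenated, every frame inherits its clip's label, and frames at segment boundaries are averaged so that they carry two labels. The function names, `seg_len`, `overlap`, and the 50/50 blending are all hypothetical choices made for illustration.

```python
import random


def one_hot(label, num_classes):
    """Return a multi-hot target with a single active class."""
    return [1.0 if c == label else 0.0 for c in range(num_classes)]


def build_pseudo_detection_sample(clips, num_classes, seg_len=4, overlap=2, seed=0):
    """Build one detection-style sample from classification clips (illustrative only).

    clips: list of (frames, label); frames is a list of feature vectors
           (lists of floats) and label is the clip's video-level class index.
    Returns (frames, frame_labels), where frame_labels[t] is a multi-hot list.
    """
    if not 0 <= overlap <= seg_len // 2:
        raise ValueError("overlap must be between 0 and seg_len // 2")
    rng = random.Random(seed)
    out_frames, out_labels = [], []
    for clip_frames, clip_label in clips:
        if len(clip_frames) < seg_len:
            raise ValueError("every clip needs at least seg_len frames")
        # Action segment: a random contiguous window of the clip.
        start = rng.randint(0, len(clip_frames) - seg_len)
        segment = clip_frames[start:start + seg_len]
        # Frame-level pseudo label: each frame inherits the video-level label.
        target = one_hot(clip_label, num_classes)
        if out_frames and overlap > 0:
            # Multi-action frames: average the boundary frames of both
            # segments and mark both classes as active.
            for k in range(overlap):
                idx = len(out_frames) - overlap + k
                out_frames[idx] = [0.5 * a + 0.5 * b
                                   for a, b in zip(out_frames[idx], segment[k])]
                out_labels[idx] = [max(a, b)
                                   for a, b in zip(out_labels[idx], target)]
            segment = segment[overlap:]
        out_frames.extend(segment)
        out_labels.extend(list(target) for _ in segment)
    return out_frames, out_labels
```

Under these assumptions, two six-frame clips with labels 0 and 2 yield a six-frame sequence whose middle two frames are blended and labeled with both classes. A real pipeline would operate on video tensors and may mix clips quite differently.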

updated: Sun Feb 05 2023 00:12:36 GMT+0000 (UTC)

published: Fri Nov 26 2021 18:59:28 GMT+0000 (UTC)

arXiv
