Proposal-Based Multiple Instance Learning for Weakly-Supervised Temporal Action Localization

Huan Ren; Wenfei Yang; Tianzhu Zhang; Yongdong Zhang

Weakly-supervised temporal action localization aims to localize and recognize actions in untrimmed videos using only video-level category labels during training. Without instance-level annotations, most existing methods follow the Segment-based Multiple Instance Learning (S-MIL) framework, in which segment predictions are supervised by video labels. However, training optimizes segment-level scores while testing produces proposal-level scores, and this mismatch between the two objectives leads to suboptimal results. To address this problem, we propose a novel Proposal-based Multiple Instance Learning (P-MIL) framework that directly classifies candidate proposals in both the training and testing stages. It includes three key designs: 1) a surrounding contrastive feature extraction module that suppresses discriminative short proposals by considering contrastive information from their surroundings; 2) a proposal completeness evaluation module that inhibits low-quality proposals under the guidance of completeness pseudo labels; and 3) an instance-level rank consistency loss that achieves robust detection by leveraging the complementarity of the RGB and FLOW modalities. Extensive experiments on two challenging benchmarks, THUMOS14 and ActivityNet, demonstrate the superior performance of our method.
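
The core P-MIL idea of scoring whole proposals rather than individual segments, then pooling proposal scores into a video-level prediction that the video label can supervise, can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a 1-D class activation sequence for a single class in place of learned features. It models the surrounding contrast as the mean activation inside a proposal minus the mean over a context margin on each side. It uses top-k mean pooling and binary cross-entropy on the pooled logit. The names `proposal_score`, `ctx_ratio`, `video_logit`, and `bce_with_logit` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# A proposal is a half-open range [start, end) of segment indices.
Proposal = Tuple[int, int]


def proposal_score(cas: Sequence[float], prop: Proposal, ctx_ratio: float = 0.25) -> float:
    """Hypothetical surrounding-contrast score: inner mean minus surrounding mean.

    The context margin on each side is ctx_ratio * proposal length (at least one
    segment), clipped to the video boundaries.
    """
    start, end = prop
    if not 0 <= start < end <= len(cas):
        raise ValueError(f"invalid proposal {prop}")
    pad = max(1, round((end - start) * ctx_ratio))
    inner = cas[start:end]
    outer = list(cas[max(0, start - pad):start]) + list(cas[end:end + pad])
    inner_mean = sum(inner) / len(inner)
    if not outer:  # proposal spans the whole video: no context to contrast with
        return inner_mean
    return inner_mean - sum(outer) / len(outer)


def video_logit(cas: Sequence[float], proposals: Sequence[Proposal], k: int = 2) -> float:
    """MIL pooling (assumed top-k mean): video-level logit from proposal scores."""
    if not proposals:
        raise ValueError("need at least one proposal")
    scores = sorted((proposal_score(cas, p) for p in proposals), reverse=True)
    top = scores[:max(1, min(k, len(scores)))]
    return sum(top) / len(top)


def bce_with_logit(logit: float, label: int) -> float:
    """Numerically stable binary cross-entropy on a raw logit (label is 0 or 1)."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


# Toy activation sequence for one class over 8 segments (illustrative data only).
cas: List[float] = [0.0, 0.1, 0.9, 1.0, 0.8, 0.1, 0.0, 0.0]
proposals: List[Proposal] = [(2, 5), (3, 4), (0, 8)]

if __name__ == "__main__":
    for p in proposals:
        print(p, round(proposal_score(cas, p), 4))
    logit = video_logit(cas, proposals, k=2)
    print("video logit:", round(logit, 4), "loss (label=1):", round(bce_with_logit(logit, 1), 4))
```

With this toy sequence, the complete proposal (2, 5) scores about 0.8, while the short proposal (3, 4) scores only about 0.15 because its surrounding segments are almost as active as its interior. The contrast term therefore favors the complete action over a short discriminative fragment, which is the behavior the surrounding contrastive module is meant to encourage.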

updated: Mon May 29 2023 02:48:04 GMT+0000 (UTC)

published: Mon May 29 2023 02:48:04 GMT+0000 (UTC)

arXiv
