Weakly-Supervised Action Localization by Hierarchically-structured Latent Attention Modeling

Guiqin Wang; Peng Zhao; Cong Zhao; Shusen Yang; Jie Cheng; Luziwei Leng; Jianxing Liao; Qinghai Guo

Weakly-supervised action localization aims to recognize and localize action instances in untrimmed videos using only video-level labels. Most existing models rely on multiple instance learning (MIL), where the predictions for unlabeled instances are supervised by classifying labeled bags. MIL-based methods are relatively well studied and achieve strong performance on classification, but not on localization. Generally, they locate temporal regions through video-level classification and overlook the temporal variations of feature semantics. To address this problem, we propose a novel attention-based, hierarchically-structured latent model that learns the temporal variations of feature semantics. Specifically, our model comprises two components. The first is an unsupervised change-point detection module that detects change-points by learning latent representations of video features in a temporal hierarchy based on their rates of change. The second is an attention-based classification model that selects the change-points of the foreground as the boundaries. To evaluate the effectiveness of our model, we conduct extensive experiments on two benchmark datasets, THUMOS-14 and ActivityNet-v1.3. The experiments show that our method outperforms current state-of-the-art methods and even achieves performance comparable to fully-supervised methods.
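As a rough illustration of the two-stage idea described above (detect change-points from how quickly features change, then keep only those that bound the foreground), the following toy sketch may help. It is not the authors' model: the learned hierarchical latent representation is replaced here by a plain L2 distance between consecutive feature vectors, the attention scores are hand-written rather than learned, and all function names and thresholds are hypothetical.

```python
from math import sqrt


def rate_of_change(features):
    """L2 distance between consecutive feature vectors; length is T - 1."""
    return [
        sqrt(sum((b - a) ** 2 for a, b in zip(prev, cur)))
        for prev, cur in zip(features, features[1:])
    ]


def detect_change_points(features, threshold):
    """Return indices t where the features jump between snippet t - 1 and t.

    Assumption: a fixed threshold stands in for the learned, unsupervised
    change-point detector described in the abstract.
    """
    return [t + 1 for t, r in enumerate(rate_of_change(features)) if r > threshold]


def foreground_boundaries(change_points, attention, fg_threshold=0.5):
    """Keep change-points where the foreground attention flips across the jump."""
    kept = []
    for t in change_points:
        before = attention[t - 1] > fg_threshold
        after = attention[t] > fg_threshold
        if before != after:
            kept.append(t)
    return kept


# Toy video: 8 snippets with 2-D features; snippets 3..5 form the "action".
features = [
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
    [1.0, 1.0], [1.1, 1.0], [1.0, 0.9],
    [0.0, 0.1], [0.6, 0.0],
]
# Hypothetical per-snippet foreground attention (would be learned in practice).
attention = [0.1, 0.1, 0.2, 0.9, 0.8, 0.9, 0.2, 0.1]

change_points = detect_change_points(features, threshold=0.5)
boundaries = foreground_boundaries(change_points, attention)
print(change_points)  # [3, 6, 7]
print(boundaries)     # [3, 6]
```

In this toy case the change at index 7 is a background-to-background jump, so the attention filter discards it, leaving the half-open segment [3, 6) as the localized action.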

updated: Sat Aug 19 2023 08:45:49 GMT+0000 (UTC)

published: Sat Aug 19 2023 08:45:49 GMT+0000 (UTC)

arXiv
