Fine-grained Temporal Contrastive Learning for Weakly-supervised Temporal Action Localization

Junyu Gao; Mengyuan Chen; Changsheng Xu

We address the task of weakly-supervised action localization (WSAL), where only video-level action labels are available during model training. Despite recent progress, existing methods mainly follow a localization-by-classification paradigm and overlook the rich fine-grained temporal distinctions between video sequences, and therefore suffer from severe ambiguity in classification learning and in classification-to-localization adaptation. This paper argues that learning by contextually comparing sequence-to-sequence distinctions offers an essential inductive bias in WSAL and helps identify coherent action instances. Specifically, under a differentiable dynamic programming formulation, we design two complementary contrastive objectives: Fine-grained Sequence Distance (FSD) contrasting, which considers the relations among action/background proposals using match, insert, and delete operators, and Longest Common Subsequence (LCS) contrasting, which mines the longest common subsequence between two videos. The two contrasting modules reinforce each other and jointly provide discriminative action-background separation and a reduced task gap between classification and localization. Extensive experiments show that our method achieves state-of-the-art performance on two popular benchmarks. Our code is available at https://github.com/MengyuanChen21/CVPR2022-FTCL.
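The abstract names the two dynamic programs but does not spell out their recurrences. The sketch below is one illustrative reading, not the authors' implementation (the linked repository is the reference for that). It assumes each video is a sequence of L2-normalised snippet features, uses `1 - cosine similarity` as the match cost, and relaxes `min`/`max` with a log-sum-exp so the result stays differentiable in an autodiff framework. The names `smooth_min`, `smooth_max`, `soft_edit_distance`, `soft_lcs`, and the parameters `gamma` and `gap_cost` are hypothetical choices made for this example.

```python
import numpy as np


def smooth_min(values, gamma):
    """Log-sum-exp relaxation of min; approaches min(values) as gamma -> 0."""
    v = np.asarray(values, dtype=float)
    m = v.min()
    # Shift by the minimum so every exponent is <= 0 (numerically stable).
    return m - gamma * np.log(np.exp(-(v - m) / gamma).sum())


def smooth_max(values, gamma):
    """Log-sum-exp relaxation of max, defined through smooth_min."""
    return -smooth_min(-np.asarray(values, dtype=float), gamma)


def soft_edit_distance(x, y, gamma=0.1, gap_cost=1.0):
    """Relaxed edit distance with match, insert, and delete operators.

    x: (n, d) and y: (m, d) arrays of L2-normalised features (assumed).
    """
    cost = 1.0 - x @ y.T  # match cost: 0 for identical unit vectors
    n, m = cost.shape
    d = np.zeros((n + 1, m + 1))
    d[1:, 0] = gap_cost * np.arange(1, n + 1)  # delete every element of x
    d[0, 1:] = gap_cost * np.arange(1, m + 1)  # insert every element of y
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i, j] = smooth_min(
                [
                    d[i - 1, j - 1] + cost[i - 1, j - 1],  # match
                    d[i - 1, j] + gap_cost,                # delete
                    d[i, j - 1] + gap_cost,                # insert
                ],
                gamma,
            )
    return d[n, m]


def soft_lcs(x, y, gamma=0.1):
    """Relaxed longest-common-subsequence score between two sequences.

    With binary similarities and gamma -> 0 this is the classic LCS length.
    """
    sim = x @ y.T
    n, m = sim.shape
    r = np.zeros((n + 1, m + 1))
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            r[i, j] = smooth_max(
                [
                    r[i - 1, j - 1] + sim[i - 1, j - 1],  # extend subsequence
                    r[i - 1, j],                          # skip element of x
                    r[i, j - 1],                          # skip element of y
                ],
                gamma,
            )
    return r[n, m]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.normal(size=(6, 8))
    a /= np.linalg.norm(a, axis=1, keepdims=True)
    b = a[[0, 1, 3, 4, 5]]  # same sequence with one snippet removed
    print("soft edit distance:", soft_edit_distance(a, b))
    print("soft LCS score:", soft_lcs(a, b))
```

In a contrastive setup, scores like these could be computed for pairs of videos that share an action label and pairs that do not, with a loss that encourages a smaller distance (or a larger common-subsequence score) for the former. How pairs are formed and which loss is used are not stated in the abstract, so they are left out here.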

updated: Thu Mar 31 2022 05:13:50 GMT+0000 (UTC)

published: Thu Mar 31 2022 05:13:50 GMT+0000 (UTC)

arXiv
