TimeBalance: Temporally-Invariant and Temporally-Distinctive Video Representations for Semi-Supervised Action Recognition

Ishan Rajendrakumar Dave; Mamshad Nayeem Rizve; Chen Chen; Mubarak Shah

Semi-supervised learning can be more beneficial for the video domain than for images because videos have a higher annotation cost and dimensionality. Moreover, any video understanding task requires reasoning over both spatial and temporal dimensions. To learn both static and motion-related features for semi-supervised action recognition, existing methods rely on hard input inductive biases, such as using two modalities (RGB and optical flow) or two streams with different playback rates. Instead of utilizing unlabeled videos through diverse input streams, we rely on self-supervised video representations; in particular, we utilize temporally-invariant and temporally-distinctive representations. We observe that these representations complement each other depending on the nature of the action. Based on this observation, we propose a student-teacher semi-supervised learning framework, TimeBalance, in which we distill knowledge from a temporally-invariant teacher and a temporally-distinctive teacher. Depending on the nature of the unlabeled video, we dynamically combine the knowledge of these two teachers using a novel temporal similarity-based reweighting scheme. Our method achieves state-of-the-art performance on three action recognition benchmarks: UCF101, HMDB51, and Kinetics400. Code: https://github.com/DAVEISHAN/TimeBalance
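
The abstract does not spell out the reweighting formula, so the following is only a minimal sketch of how two teachers' predictions might be blended per video. It assumes, hypothetically, that the weight is the mean pairwise cosine similarity between clip embeddings of the same video, rescaled to [0, 1]: videos whose clips look alike lean on the temporally-invariant teacher, and videos whose clips differ lean on the temporally-distinctive one. The function names `temporal_similarity` and `combine_teachers` are illustrative and are not taken from the TimeBalance repository.

```python
import numpy as np


def softmax(logits, axis=-1):
    """Numerically stable softmax."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def temporal_similarity(clip_embeddings):
    """Mean off-diagonal cosine similarity between clips, rescaled to [0, 1].

    clip_embeddings: array of shape (num_clips, dim), one row per clip of a
    single unlabeled video. This scoring rule is an assumption of this sketch.
    """
    norms = np.linalg.norm(clip_embeddings, axis=1, keepdims=True)
    normed = clip_embeddings / norms
    sim = normed @ normed.T
    off_diag = sim[~np.eye(sim.shape[0], dtype=bool)]
    return float((off_diag.mean() + 1.0) / 2.0)


def combine_teachers(inv_logits, dist_logits, clip_embeddings):
    """Blend the two teachers' class distributions for one unlabeled video.

    Returns the blended soft target for the student and the weight given to
    the temporally-invariant teacher.
    """
    w = temporal_similarity(clip_embeddings)
    target = w * softmax(inv_logits) + (1.0 - w) * softmax(dist_logits)
    return target, w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clips = rng.normal(size=(4, 16))   # stand-in clip embeddings
    inv = rng.normal(size=5)           # stand-in logits, invariant teacher
    dist = rng.normal(size=5)          # stand-in logits, distinctive teacher
    target, w = combine_teachers(inv, dist, clips)
    print("weight on invariant teacher:", round(w, 3))
    print("soft target sums to:", round(float(target.sum()), 3))
```

In a student-teacher setup, a soft target like this would typically serve as the distillation target for the student on unlabeled videos; the exact loss and similarity measure used by the authors should be taken from the paper and code linked above.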

updated: Tue Mar 28 2023 19:28:54 GMT+0000 (UTC)

published: Tue Mar 28 2023 19:28:54 GMT+0000 (UTC)

arXiv
