Temporal Alignment Prediction for Few-Shot Video Classification

Fei Pan; Chunlei Xu; Jie Guo; Yanwen Guo

The goal of few-shot video classification is to learn a classification model that generalizes well when trained with only a few labeled videos. In such a setting, however, it is difficult to learn discriminative feature representations for videos. In this paper, we propose Temporal Alignment Prediction (TAP), a method based on sequence similarity learning for few-shot video classification. To obtain the similarity of a pair of videos, we use a temporal alignment prediction function to predict the alignment scores between all pairs of temporal positions in the two videos. In addition, the inputs to this function are enriched with context information from the temporal domain. We evaluate TAP on two video classification benchmarks, Kinetics and Something-Something V2. The experimental results verify the effectiveness of TAP and show its superiority over state-of-the-art methods.
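The idea of scoring all pairs of temporal positions can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the form of the alignment prediction function or how context is added, so the sketch assumes cosine similarity as a stand-in for the learned function, a neighbor average as a stand-in for temporal context, and a softmax-weighted average to turn the score matrix into one video-level similarity. All function names and the `radius` parameter are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two feature vectors (stand-in for a learned score)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def add_temporal_context(frames, radius=1):
    """Assumed context step: average each frame with neighbors within `radius`."""
    contextual = []
    for t in range(len(frames)):
        window = frames[max(0, t - radius):min(len(frames), t + radius + 1)]
        contextual.append([sum(col) / len(window) for col in zip(*window)])
    return contextual


def alignment_scores(video_a, video_b):
    """Score every pair of temporal positions (i in A, j in B)."""
    return [[cosine(fa, fb) for fb in video_b] for fa in video_a]


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def video_similarity(video_a, video_b, radius=1):
    """Assumed aggregation: softly align each position of A to B, then average."""
    a = add_temporal_context(video_a, radius)
    b = add_temporal_context(video_b, radius)
    scores = alignment_scores(a, b)
    per_position = []
    for row in scores:
        weights = softmax(row)  # soft alignment of one position in A over B
        per_position.append(sum(w * s for w, s in zip(weights, row)))
    return sum(per_position) / len(per_position)


# Toy per-frame features: 3 temporal positions, 2 feature dimensions each.
video_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
video_c = [[-1.0, 0.0], [0.0, -1.0], [-1.0, -1.0]]
same = video_similarity(video_a, video_a)
different = video_similarity(video_a, video_c)
```

In a few-shot setting, a query video would be assigned the label of the support video (or class) with the highest such similarity. In this toy example `same` is larger than `different`, since every feature in `video_c` points in the opposite direction.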

updated: Mon Jul 26 2021 05:12:27 GMT+0000 (UTC)

published: Mon Jul 26 2021 05:12:27 GMT+0000 (UTC)

arXiv
