A Closer Look at Few-Shot Video Classification: A New Baseline and Benchmark

Zhenxi Zhu; Limin Wang; Sheng Guo; Gangshan Wu

Existing few-shot video classification methods often employ a meta-learning paradigm, designing customized temporal alignment modules for similarity calculation. While significant progress has been made, these methods fail to focus on learning effective representations and rely heavily on ImageNet pre-training, which might be unreasonable for the few-shot recognition setting due to semantic overlap. In this paper, we aim to present an in-depth study of few-shot video classification by making three contributions. First, we perform a consistent comparative study of the existing metric-based methods to identify their limitations in representation learning. Accordingly, we propose a simple classifier-based baseline without any temporal alignment that surprisingly outperforms the state-of-the-art meta-learning-based methods. Second, we discover that there is a high correlation between the novel action classes and the ImageNet object classes, which is problematic in the few-shot recognition setting. Our results show that performance drops significantly when training from scratch, which implies that the existing benchmarks cannot provide enough base data. Finally, we present a new benchmark with more base data to facilitate future few-shot video classification without pre-training. The code will be made available at https://github.com/MCG-NJU/FSL-Video.
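To make the idea of a classifier-based baseline without temporal alignment more concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation (that is expected at the repository linked above). It assumes per-frame feature vectors are already available, collapses each video to a single vector by averaging over time, and fits a softmax linear classifier on the few labeled support videos. The function names (`mean_pool`, `fit_linear_classifier`, `predict`), the toy 2-D features, the learning rate, and the epoch count are all hypothetical choices made for this example.

```python
import math


def mean_pool(frames):
    """Average per-frame feature vectors over time (no temporal alignment)."""
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / len(frames) for d in range(dim)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def scores(weights, biases, x):
    """Linear class scores w_c . x + b_c for every class c."""
    return [sum(w_d * x_d for w_d, x_d in zip(w, x)) + b
            for w, b in zip(weights, biases)]


def fit_linear_classifier(features, labels, n_classes, lr=0.5, epochs=200):
    """Fit a softmax linear classifier with per-sample gradient descent.

    lr and epochs are arbitrary illustrative values, not taken from the paper.
    """
    dim = len(features[0])
    weights = [[0.0] * dim for _ in range(n_classes)]
    biases = [0.0] * n_classes
    for _ in range(epochs):
        for x, y in zip(features, labels):
            probs = softmax(scores(weights, biases, x))
            for c in range(n_classes):
                # Gradient of cross-entropy w.r.t. the logit of class c.
                grad = probs[c] - (1.0 if c == y else 0.0)
                biases[c] -= lr * grad
                for d in range(dim):
                    weights[c][d] -= lr * grad * x[d]
    return weights, biases


def predict(weights, biases, x):
    """Return the index of the highest-scoring class."""
    s = scores(weights, biases, x)
    return max(range(len(s)), key=s.__getitem__)


# Toy 2-way 2-shot episode with hand-made 2-D "frame features" (assumption:
# in practice these would come from a trained video backbone).
support_videos = [
    [[1.0, 0.0], [0.8, 0.2]],  # class 0
    [[0.9, 0.1], [1.0, 0.1]],  # class 0
    [[0.0, 1.0], [0.2, 0.8]],  # class 1
    [[0.1, 0.9], [0.1, 1.0]],  # class 1
]
support_labels = [0, 0, 1, 1]

support_features = [mean_pool(video) for video in support_videos]
weights, biases = fit_linear_classifier(support_features, support_labels, n_classes=2)

query_a = mean_pool([[0.9, 0.2], [0.7, 0.1]])
query_b = mean_pool([[0.1, 0.8], [0.2, 0.9]])
pred_a = predict(weights, biases, query_a)
pred_b = predict(weights, biases, query_b)
print(pred_a, pred_b)
```

The point of the sketch is the design choice: frame order is discarded by the averaging step, so classification quality rests entirely on the features rather than on any alignment between query and support frames.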

updated: Sun Oct 24 2021 06:01:46 GMT+0000 (UTC)

published: Sun Oct 24 2021 06:01:46 GMT+0000 (UTC)

arXiv
