TAEN: Temporal Aware Embedding Network for Few-Shot Action Recognition

Rami Ben-Ari; Mor Shpigel; Ophir Azulai; Udi Barzelay; Daniel Rotman

Classifying new class entities requires collecting and annotating hundreds or thousands of samples, which is often prohibitively costly. Few-shot learning aims to learn to classify new classes using just a few examples. Only a small number of studies address the challenge of few-shot learning on spatio-temporal patterns such as videos. In this paper, we present the Temporal Aware Embedding Network (TAEN) for few-shot action recognition, which learns to represent actions as trajectories in a metric space, conveying both short-term semantics and longer-term connectivity between action parts. We demonstrate the effectiveness of TAEN on two few-shot tasks, video classification and temporal action detection, and evaluate our method on the Kinetics-400 and ActivityNet 1.2 few-shot benchmarks. By training just a few fully connected layers, we reach results comparable to prior art on both few-shot video classification and temporal detection tasks, while reaching state-of-the-art in certain scenarios.
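
To make the idea of an action as a trajectory in a metric space more concrete, here is a minimal sketch. It is not the TAEN implementation: it assumes each video has already been embedded as a fixed-length sequence of segment vectors, and it uses a summed segment-wise Euclidean distance and mean-trajectory class prototypes as stand-ins for whatever distance and training objective the paper actually uses. All function names are hypothetical.

```python
import math


def segment_distance(a, b):
    """Euclidean distance between two segment embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def trajectory_distance(traj_a, traj_b):
    """Sum of distances between segments at the same temporal position.

    Assumption: both trajectories have the same number of segments.
    """
    if len(traj_a) != len(traj_b):
        raise ValueError("trajectories must have the same number of segments")
    return sum(segment_distance(a, b) for a, b in zip(traj_a, traj_b))


def class_prototype(support_trajs):
    """Average the support trajectories of one class, segment by segment."""
    n = len(support_trajs)
    return [
        [sum(values) / n for values in zip(*segments)]
        for segments in zip(*support_trajs)
    ]


def classify(query_traj, support):
    """Return the label whose prototype trajectory is closest to the query.

    `support` maps each class label to its few labeled example trajectories.
    """
    prototypes = {label: class_prototype(trajs) for label, trajs in support.items()}
    return min(
        prototypes,
        key=lambda label: trajectory_distance(query_traj, prototypes[label]),
    )


# Toy 2-shot / 1-shot episode with two segments per video and 2-D embeddings.
support = {
    "jump": [[[0.0, 0.0], [1.0, 1.0]], [[0.0, 2.0], [1.0, 3.0]]],
    "run": [[[5.0, 5.0], [6.0, 6.0]]],
}
query = [[0.0, 1.0], [1.0, 2.0]]
predicted = classify(query, support)
```

Because segments are compared position by position, two videos containing the same parts in a different order end up far apart, which is one simple way to reflect the longer-term connectivity between action parts that the abstract describes.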

updated: Sat Jul 17 2021 10:54:57 GMT+0000 (UTC)

published: Tue Apr 21 2020 16:32:10 GMT+0000 (UTC)

arXiv
