t-EVA: Time-Efficient t-SNE Video Annotation

Soroosh Poorgholi; Osman Semih Kayhan; Jan C. van Gemert

Video understanding has received more attention in the past few years due to the availability of several large-scale video datasets. However, annotating large-scale video datasets is cost-intensive. In this work, we propose a time-efficient video annotation method that uses spatio-temporal feature similarity and t-SNE dimensionality reduction to massively speed up the annotation process. Placing clips of the same action from different videos near each other in the two-dimensional space, based on feature similarity, helps the annotator group-label video clips. We evaluate our method on two subsets of ActivityNet (v1.3) and a subset of the Sports-1M dataset. We show that t-EVA can outperform other video annotation tools while maintaining test accuracy on video classification.
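The abstract describes the core workflow: clips are placed in a 2D map by t-SNE over spatio-temporal features, and the annotator labels a whole neighbourhood of similar clips at once instead of one clip at a time. Below is a minimal sketch of only the group-labeling step. It assumes the 2D coordinates have already been computed (for instance with scikit-learn's `sklearn.manifold.TSNE` on per-clip feature vectors). The `Clip` class, the `group_label` function, the rectangular selection, and the toy coordinates and action names are hypothetical illustrations, not the authors' tool or data.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Clip:
    """A video clip and its position in the 2D embedding (hypothetical structure)."""
    clip_id: str
    xy: Tuple[float, float]  # assumed precomputed, e.g. by t-SNE on clip features
    label: Optional[str] = None


def group_label(clips: List[Clip],
                box: Tuple[float, float, float, float],
                label: str) -> int:
    """Give `label` to every still-unlabeled clip inside box = (x_min, y_min, x_max, y_max).

    Returns how many clips were labeled by this single selection.
    """
    x_min, y_min, x_max, y_max = box
    labeled = 0
    for clip in clips:
        x, y = clip.xy
        inside = x_min <= x <= x_max and y_min <= y <= y_max
        if inside and clip.label is None:  # never overwrite an earlier label
            clip.label = label
            labeled += 1
    return labeled


if __name__ == "__main__":
    # Toy 2D points standing in for a t-SNE map: two well-separated clusters.
    clips = [
        Clip("video1_clip0", (1.0, 1.2)),
        Clip("video2_clip3", (0.8, 0.9)),
        Clip("video3_clip1", (1.1, 0.7)),
        Clip("video4_clip2", (5.0, 5.5)),
        Clip("video5_clip0", (5.3, 4.9)),
    ]
    # Two selections (two annotator actions) label all five clips.
    print(group_label(clips, (0.0, 0.0, 2.0, 2.0), "action_A"))  # prints 3
    print(group_label(clips, (4.0, 4.0, 6.0, 6.0), "action_B"))  # prints 2
```

In this sketch each selection labels every unlabeled clip inside it, so the number of annotator actions grows with the number of separable clusters in the map rather than with the number of clips. Already-labeled clips are skipped so that a later, overlapping selection does not overwrite earlier labels.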

updated: Thu Nov 26 2020 09:56:54 GMT+0000 (UTC)

published: Thu Nov 26 2020 09:56:54 GMT+0000 (UTC)

arXiv
