Learning to Localize Temporal Events in Large-scale Video Data

Mikel Bober-Irizar; Miha Skalic; David Austin

We address temporal localization of events in large-scale video data, in the context of the Youtube-8M Segments dataset. This emerging field within video recognition can enable applications to identify the precise time at which a specified event occurs in a video, which has broad implications for video search. To address this, we present two separate approaches: (1) a gradient boosted decision tree model trained on a crafted dataset, and (2) a combination of deep learning models based on frame-level data, video-level data, and a localization model. The combination of these two approaches achieved 5th place in the 3rd Youtube-8M video recognition challenge.
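The abstract does not say how the two approaches are combined. As one common option, here is a minimal sketch of a weighted average of per-segment scores for a single class, followed by ranking of the blended segments. The (video_id, segment_start) key format, the `blend_scores` and `rank_segments` helpers, and the default weight of 0.5 are all hypothetical illustrations, not the authors' method.

```python
from typing import Dict, List, Tuple

# Hypothetical key: (video_id, segment start index). Not taken from the paper.
SegmentKey = Tuple[str, int]


def blend_scores(
    gbdt: Dict[SegmentKey, float],
    deep: Dict[SegmentKey, float],
    weight: float = 0.5,  # hypothetical blend weight given to the GBDT scores
) -> Dict[SegmentKey, float]:
    """Weighted average of two score maps for one class.

    A segment missing from one model is treated as scoring 0.0 there.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    keys = gbdt.keys() | deep.keys()  # union of segments scored by either model
    return {
        k: weight * gbdt.get(k, 0.0) + (1.0 - weight) * deep.get(k, 0.0)
        for k in keys
    }


def rank_segments(
    scores: Dict[SegmentKey, float], top_k: int
) -> List[Tuple[SegmentKey, float]]:
    """Return the top_k segments, highest blended score first."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


if __name__ == "__main__":
    gbdt = {("vidA", 0): 0.9, ("vidA", 5): 0.2}
    deep = {("vidA", 0): 0.7, ("vidB", 0): 0.6}
    print(rank_segments(blend_scores(gbdt, deep), top_k=2))
```

In practice the blend weight would be tuned on held-out data. A rank-based average is an alternative when the two models' scores are on different scales.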

updated: Fri Oct 25 2019 11:40:29 GMT+0000 (UTC)

published: Fri Oct 25 2019 11:40:29 GMT+0000 (UTC)

arXiv