Reliable Shot Identification for Complex Event Detection via Visual-Semantic Embedding

Minnan Luo; Xiaojun Chang; Chen Gong

視覚的意味埋め込みによる複雑なイベント検出のための信頼性の高いショット識別

マルチメディアイベント検出は、Webサイトでユーザーが生成したビデオで関心のある特定のイベントを検出するタスクです。このタスクが直面する最も基本的な課題は、ビデオの品質が大きく異なることと、本質的にイベントの高レベルのセマンティック抽象化にあります。この論文では、ビデオをいくつかのセグメントに分解し、各ビデオを各セグメントがインスタンスと呼ばれるセグメントの「バッグ」として表すことにより、複雑なイベント検出のタスクを複数インスタンスの学習問題として直感的にモデル化します。インスタンスを平等に扱う代わりに、各インスタンスを信頼性変数に関連付けてその重要性を示し、トレーニング用に信頼できるインスタンスを選択します。さまざまなインスタンスの信頼性を正確に測定するために、インスタンスイベントの類似性に基づく高レベルのセマンティック機能とともに視覚情報からの低レベルの機能を活用することにより、視覚セマンティックガイド損失を提案します。カリキュラム学習を動機として、負のエラスティックネット正則化項を導入して、信頼性の高いインスタンスで分類器のトレーニングを開始し、信頼性の低いインスタンスを徐々に考慮します。提案された挑戦的な非凸非平滑問題を解決するために、代替の最適化アルゴリズムが開発されています。標準データセット、つまりTRECVID MEDTest2013およびTRECVIDMEDTest 2014での実験結果は、ベースラインアルゴリズムに対する提案された方法の有効性と優位性を示しています。

Multimedia event detection is the task of detecting a specific event of interest in an user-generated video on websites. The most fundamental challenge facing this task lies in the enormously varying quality of the video as well as the high-level semantic abstraction of event inherently. In this paper, we decompose the video into several segments and intuitively model the task of complex event detection as a multiple instance learning problem by representing each video as a "bag" of segments in which each segment is referred to as an instance. Instead of treating the instances equally, we associate each instance with a reliability variable to indicate its importance and then select reliable instances for training. To measure the reliability of the varying instances precisely, we propose a visual-semantic guided loss by exploiting low-level feature from visual information together with instance-event similarity based high-level semantic feature. Motivated by curriculum learning, we introduce a negative elastic-net regularization term to start training the classifier with instances of high reliability and gradually taking the instances with relatively low reliability into consideration. An alternative optimization algorithm is developed to solve the proposed challenging non-convex non-smooth problem. Experimental results on standard datasets, i.e., TRECVID MEDTest 2013 and TRECVID MEDTest 2014, demonstrate the effectiveness and superiority of the proposed method to the baseline algorithms.

updated: Tue Oct 12 2021 11:46:56 GMT+0000 (UTC)

published: Tue Oct 12 2021 11:46:56 GMT+0000 (UTC)

arXiv

参考文献 (このサイトで利用可能なもの) / References (only if available on this site)

被参照文献 (このサイトで利用可能なものを新しい順に) / Citations (only if available on this site, in order of most recent)

Amazon.co.jpアソシエイト