Multimodal Feature Fusion for Video Advertisements Tagging Via Stacking Ensemble

Qingsong Zhou; Hai Liang; Zhimin Lin; Kele Xu

Automated tagging of video advertisements is a critical yet challenging problem that has drawn increasing interest in recent years, as its applications are evident in many fields. Despite sustained efforts, the tagging task still suffers from several challenges; for example, an efficient feature fusion approach is desirable but under-explored in previous studies. In this paper, we present our approach to Multimodal Video Ads Tagging in the 2021 Tencent Advertising Algorithm Competition. Specifically, we propose a novel multimodal feature fusion framework that aims to combine complementary information from multiple modalities. The framework introduces a stacking-based ensembling approach to reduce the influence of varying levels of noise and of conflicts between different modalities. As a result, our framework improves tagging performance compared with previous methods. To empirically investigate the effectiveness and robustness of the proposed framework, we conduct extensive experiments on the challenge datasets. The results suggest that our framework significantly outperforms related approaches, and our method ranked 1st on the final leaderboard with a Global Average Precision (GAP) of 82.63%. To promote research in this field, we will release our code in the final version.
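The abstract reports results in terms of Global Average Precision (GAP) without defining it. The sketch below shows one common way to compute the metric, assuming the YouTube-8M-style definition: the top-k predictions of every video are pooled, sorted by confidence, and average precision is computed over the pooled list. The function name, the input layout (one tag-to-score dict and one set of true tags per video), and the default `top_k=20` are assumptions made for illustration and are not taken from the paper or the competition rules.

```python
def global_average_precision(predictions, labels, top_k=20):
    """Compute GAP over a set of videos (assumed YouTube-8M-style definition).

    predictions: list of dicts, one per video, mapping tag -> confidence score.
    labels:      list of sets, one per video, holding the ground-truth tags.
    top_k:       number of highest-scoring tags kept per video (assumed value).
    """
    pooled = []           # (score, is_correct) pairs pooled across all videos
    total_positives = 0   # number of ground-truth tags over all videos

    for scores, truth in zip(predictions, labels):
        total_positives += len(truth)
        # Keep only the top_k most confident tags for this video.
        top = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
        pooled.extend((score, tag in truth) for tag, score in top)

    if total_positives == 0:
        return 0.0

    # Rank every pooled prediction by confidence, highest first.
    pooled.sort(key=lambda pair: pair[0], reverse=True)

    hits = 0
    gap = 0.0
    for rank, (_, is_correct) in enumerate(pooled, start=1):
        if is_correct:
            hits += 1
            # Precision at this rank; each hit raises recall by 1 / total_positives.
            gap += hits / rank
    return gap / total_positives


if __name__ == "__main__":
    # Toy data with hypothetical tags "a" and "b".
    preds = [{"a": 0.9, "b": 0.1}, {"a": 0.8, "b": 0.7}]
    truth = [{"a"}, {"b"}]
    print(round(global_average_precision(preds, truth), 4))
```

Because predictions from all videos are ranked in one list, a confident wrong tag on one video lowers the precision credited to correct tags on every video ranked below it, which is why well-calibrated scores across modalities matter for this metric.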

updated: Mon Aug 02 2021 07:26:28 GMT+0000 (UTC)

published: Mon Aug 02 2021 07:26:28 GMT+0000 (UTC)

arXiv
