Boosting Video Captioning with Dynamic Loss Network

Nasib Ullah; Partha Pratim Mohanta

Video captioning is a challenging problem at the intersection of vision and language, with many real-life applications, including video retrieval, video surveillance, assistance for visually impaired people, and human-machine interfaces. Recent deep-learning-based methods have shown promising results, but performance still lags behind that of other vision tasks such as image classification and object detection. A significant drawback of existing video captioning methods is that they are optimized with a cross-entropy loss, which is uncorrelated with the de facto evaluation metrics (BLEU, METEOR, CIDEr, ROUGE). In other words, cross-entropy is not a proper surrogate for the true loss function of video captioning. To mitigate this, methods such as REINFORCE, Actor-Critic, and Minimum Risk Training (MRT) have been applied, but they have limitations and are not very effective. This paper proposes an alternative solution: a dynamic loss network (DLN) that provides an additional feedback signal directly reflecting the evaluation metrics. Our solution proves to be more efficient than the alternatives and can be easily adapted to similar tasks. Our results on the Microsoft Research Video Description Corpus (MSVD) and MSR-Video to Text (MSRVTT) datasets outperform previous methods.
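The mismatch between cross-entropy and n-gram metrics described in the abstract can be illustrated with a toy example. The sketch below is not the paper's method or data: the captions, the per-token probabilities, and the helper names `cross_entropy` and `unigram_precision` are all hypothetical, and clipped unigram precision stands in for the full BLEU score. It shows a case where the model with the lower cross-entropy produces the caption with the lower metric score.

```python
import math
from collections import Counter


def cross_entropy(token_probs):
    """Mean negative log-likelihood of the reference tokens."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


def unigram_precision(candidate, reference):
    """Clipped unigram precision, the 1-gram building block of BLEU."""
    cand = Counter(candidate.split())
    ref = Counter(reference.split())
    overlap = sum(min(count, ref[word]) for word, count in cand.items())
    return overlap / sum(cand.values())


reference = "a man is cooking"

# Hypothetical model A: very confident on three reference tokens,
# but assigns low probability to the last one and decodes a wrong word.
probs_a = [0.9, 0.9, 0.9, 0.05]
caption_a = "a man is singing"

# Hypothetical model B: only moderately confident on every reference
# token, yet each one is still its top choice, so decoding is correct.
probs_b = [0.4, 0.4, 0.4, 0.4]
caption_b = "a man is cooking"

for name, probs, caption in [("A", probs_a, caption_a), ("B", probs_b, caption_b)]:
    ce = cross_entropy(probs)
    score = unigram_precision(caption, reference)
    print(f"model {name}: cross-entropy={ce:.3f}, unigram precision={score:.2f}")
```

In this constructed case, model A has the lower cross-entropy but the worse caption, which is the kind of disagreement that motivates a training signal tied more closely to the evaluation metric.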

updated: Tue Feb 01 2022 19:17:11 GMT+0000 (UTC)

published: Sun Jul 25 2021 01:32:02 GMT+0000 (UTC)

arXiv
