The MSR-Video to Text Dataset with Clean Annotations

Haoran Chen; Jianmin Li; Simone Frintrop; Xiaolin Hu

Video captioning automatically generates a short description of video content, usually in the form of a single sentence. Many methods have been proposed for this task. A large dataset called MSR Video to Text (MSR-VTT) is often used as the benchmark for testing these methods. However, we found that the human annotations, i.e., the descriptions of video content in the dataset, are quite noisy: for example, there are many duplicate captions, and many captions contain grammatical problems. These problems may make it difficult for video captioning models to learn the underlying patterns. We cleaned the MSR-VTT annotations by removing these problems, then tested several typical video captioning models on the cleaned dataset. Experimental results showed that data cleaning improved the performance of the models as measured by popular quantitative metrics. We recruited subjects to evaluate the results of a model trained on the original and cleaned datasets. The human behavior experiment demonstrated that, when trained on the cleaned dataset, the model generated captions that were more coherent and more relevant to the content of the video clips.
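As a rough illustration of one cleaning step mentioned above, removing duplicate captions, the following minimal sketch drops repeated captions per video. It is not the authors' actual procedure: the function names, the dictionary layout (video ID mapped to a list of caption strings), and the normalization rule (lowercasing and collapsing whitespace) are all assumptions made for this example.

```python
import re


def normalize(caption: str) -> str:
    """Lowercase and collapse whitespace (an assumed, illustrative rule)
    so that trivially different copies of a caption compare equal."""
    return re.sub(r"\s+", " ", caption.strip().lower())


def drop_duplicate_captions(captions_by_video):
    """Keep the first occurrence of each caption per video, preserving order.

    `captions_by_video` is a hypothetical layout: {video_id: [caption, ...]}.
    Empty captions are dropped as well.
    """
    cleaned = {}
    for video_id, captions in captions_by_video.items():
        seen = set()
        kept = []
        for caption in captions:
            key = normalize(caption)
            if key and key not in seen:
                seen.add(key)
                kept.append(caption)
        cleaned[video_id] = kept
    return cleaned
```

Exact-match deduplication after light normalization is only the simplest option; near-duplicates and grammatical problems would need additional handling that this sketch does not attempt.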

updated: Sun Feb 25 2024 09:04:32 GMT+0000 (UTC)

published: Fri Feb 12 2021 11:14:56 GMT+0000 (UTC)

arXiv
