What's in a Caption? Dataset-Specific Linguistic Diversity and Its Effect on Visual Description Models and Metrics

David M. Chan; Austin Myers; Sudheendra Vijayanarasimhan; David A. Ross; Bryan Seybold; John F. Canny

While there have been significant gains in the field of automated video description, the generalization performance of automated description models to novel domains remains a major barrier to using these systems in the real world. Most visual description methods are known to capture and exploit patterns in the training data, leading to evaluation metric increases, but what are those patterns? In this work, we examine several popular visual description datasets, and capture, analyze, and understand the dataset-specific linguistic patterns that models exploit but that do not generalize to new domains. At the token level, sample level, and dataset level, we find that caption diversity is a major driving factor behind the generation of generic and uninformative captions. We further show that state-of-the-art models even outperform held-out ground truth captions on modern metrics, and that this effect is an artifact of linguistic diversity in datasets. Understanding this linguistic diversity is key to building strong captioning models. We recommend several methods and approaches for maintaining diversity in the collection of new data, and for dealing with the consequences of limited diversity when using current models and metrics.

updated: Thu May 12 2022 17:55:08 GMT+0000 (UTC)

published: Thu May 12 2022 17:55:08 GMT+0000 (UTC)

arXiv
