Alleviating Noisy Data in Image Captioning with Cooperative Distillation

Pierre Dognin; Igor Melnyk; Youssef Mroueh; Inkit Padhi; Mattia Rigotti; Jarret Ross; Yair Schiff

Image captioning systems have made substantial progress, largely due to the availability of curated datasets such as Microsoft COCO and VizWiz that provide accurate descriptions of their corresponding images. Unfortunately, such cleanly labeled data is scarce, so trained algorithms tend to produce captions that are terse and idiosyncratically specific to details in the image. We propose a new technique, cooperative distillation, that combines clean curated datasets with the web-scale, automatically extracted captions of the Google Conceptual Captions dataset (GCC). Although GCC captions can describe their images poorly, the dataset is abundant in size and therefore provides a rich vocabulary, resulting in more expressive captions.

updated: Mon Dec 21 2020 21:32:28 GMT+0000 (UTC)

published: Mon Dec 21 2020 21:32:28 GMT+0000 (UTC)

arXiv