#PraCegoVer: A Large Dataset for Image Captioning in Portuguese

Gabriel Oliveira dos Santos; Esther Luna Colombini; Sandra Avila

Automatically describing images using natural sentences is an important task to support visually impaired people's inclusion on the Internet. It remains a major challenge because it requires understanding the relations among the objects in an image, their attributes, and the actions they are involved in. Visual interpretation methods are therefore needed, but so are linguistic models that verbally describe the semantic relations. This problem is known as Image Captioning. Although many datasets have been proposed in the literature, the majority contain only English captions, whereas datasets with captions in other languages are scarce. Recently, a movement called PraCegoVer arose on the Internet, encouraging social media users to publish images, tag them with #PraCegoVer, and add a short description of their content. Inspired by this movement, we propose #PraCegoVer, a multi-modal dataset with Portuguese captions based on posts from Instagram. It is the first large dataset for image captioning in Portuguese with freely annotated images. Further, the captions in our dataset bring additional challenges to the problem: first, in contrast to popular datasets such as MS COCO Captions, #PraCegoVer has only one reference caption per image; second, both the mean and the variance of our reference sentence length are significantly greater than those in MS COCO Captions. These two characteristics make our dataset interesting, both linguistically and for the challenges it introduces to the image captioning problem. We publicly share the dataset at https://github.com/gabrielsantosrv/PraCegoVer.
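
The comparison of reference sentence lengths can be illustrated with a minimal sketch. It assumes that caption length is counted in whitespace-separated tokens and that the population variance is used; the abstract does not specify either choice. The function name `caption_length_stats` and the toy captions are hypothetical and are not taken from the dataset.

```python
from statistics import mean, pvariance


def caption_length_stats(captions):
    """Return (mean, variance) of caption lengths.

    Assumption: length is the number of whitespace-separated tokens,
    and variance is the population variance.
    """
    lengths = [len(caption.split()) for caption in captions]
    return mean(lengths), pvariance(lengths)


# Hypothetical toy captions for illustration only (not from #PraCegoVer).
toy_captions = [
    "Foto de um cachorro correndo na praia.",
    "Ilustração com fundo azul e texto branco ao centro.",
    "Mulher sorrindo.",
]

avg, var = caption_length_stats(toy_captions)
print(f"mean length: {avg:.2f} tokens, variance: {var:.2f}")
```

Running the same function over the captions of two datasets would give the two (mean, variance) pairs that such a comparison rests on; a different tokenizer would change the absolute numbers.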

updated: Sun Mar 21 2021 19:55:46 GMT+0000 (UTC)

published: Sun Mar 21 2021 19:55:46 GMT+0000 (UTC)

arXiv
