A Thorough Review on Recent Deep Learning Methodologies for Image Captioning

Ahmed Elhagry; Karima Kadaoui

Image captioning is a task that combines computer vision and natural language processing, with the aim of generating descriptive captions for images. It is a two-fold process that relies on accurate image understanding and on correct language understanding, both syntactically and semantically. The growing volume of work on the topic makes it increasingly difficult to keep up with the latest research and findings in image captioning, and the available review papers do not cover those findings sufficiently. In this paper, we survey the current techniques, datasets, benchmarks and evaluation metrics used in image captioning. Current research in the field focuses mostly on deep learning-based methods, and attention mechanisms, deep reinforcement learning and adversarial learning appear to be at the forefront of this research topic. We review recent methodologies such as UpDown, OSCAR, VIVO and Meta Learning, as well as a model that uses conditional generative adversarial nets (GANs). Although the GAN-based model achieves the highest score, UpDown represents an important basis for image captioning, and OSCAR and VIVO are more useful because they use novel object captioning. This review paper serves as a roadmap for researchers to keep up to date with the latest contributions in the field of image caption generation.

updated: Wed Jul 28 2021 00:54:59 GMT+0000 (UTC)

published: Wed Jul 28 2021 00:54:59 GMT+0000 (UTC)

arXiv
