Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods

Aditya Mogadala; Marimuthu Kalimuthu; Dietrich Klakow

Interest in Artificial Intelligence (AI) and its applications has seen unprecedented growth in the last few years. This success can be partly attributed to advances in sub-fields of AI such as Machine Learning (ML), Computer Vision (CV), and Natural Language Processing (NLP). The largest growth in these fields has been made possible by deep learning, a sub-area of machine learning that uses the principles of artificial neural networks. This has created significant interest in the integration of vision and language, with tasks designed to take full advantage of deep learning. In this survey, we focus on ten prominent tasks that integrate language and vision: we discuss their problem formulations, methods, existing datasets, and evaluation measures, and compare the results obtained with the corresponding state-of-the-art methods. Our efforts go beyond earlier surveys, which are either task-specific or concentrate on only one type of visual content, i.e., image or video. Furthermore, we provide some potential future directions for this field of research, in the hope that this survey brings innovative thoughts and ideas to address existing challenges and build new applications.

updated: Sat Sep 12 2020 13:26:29 GMT+0000 (UTC)

published: Mon Jul 22 2019 14:53:48 GMT+0000 (UTC)

arXiv
