Causal Reasoning Meets Visual Representation Learning: A Prospective Study

Yang Liu; Yushen Wei; Hong Yan; Guanbin Li; Liang Lin

Visual representation learning is ubiquitous in real-world applications, including visual comprehension, video understanding, multi-modal analysis, human-computer interaction, and urban computing. With the emergence of huge amounts of multi-modal heterogeneous spatial, temporal, and spatial-temporal data in the big data era, the lack of interpretability, robustness, and out-of-distribution generalization has become a major challenge for existing visual models. Most existing methods tend to fit the original data/variable distributions and ignore the essential causal relations behind the multi-modal knowledge; as a result, there is no unified guidance or analysis explaining why modern visual representation learning methods easily collapse into data bias and have limited generalization and cognitive abilities. Inspired by the strong inference ability of human-level agents, recent years have therefore witnessed great effort in developing causal reasoning paradigms that realize robust representation and model learning with good cognitive ability. In this paper, we conduct a comprehensive review of existing causal reasoning methods for visual representation learning, covering fundamental theories, models, and datasets. The limitations of current methods and datasets are also discussed. Moreover, we propose some prospective challenges, opportunities, and future research directions for benchmarking causal reasoning algorithms in visual representation learning. This paper aims to provide a comprehensive overview of this emerging field, attract attention, encourage discussion, and bring to the forefront the urgency of developing novel causal reasoning methods, publicly available benchmarks, and consensus-building standards, in order to serve reliable visual representation learning and related real-world applications more efficiently.

updated: Mon May 09 2022 09:19:10 GMT+0000 (UTC)

published: Tue Apr 26 2022 02:22:28 GMT+0000 (UTC)

arXiv
