A Survey on Graph Neural Networks and Graph Transformers in Computer Vision: A Task-Oriented Perspective

Chaoqi Chen; Yushuang Wu; Qiyuan Dai; Hong-Yu Zhou; Mutian Xu; Sibei Yang; Xiaoguang Han; Yizhou Yu

Graph Neural Networks (GNNs) have gained momentum in graph representation learning and boosted the state of the art in a variety of areas, such as data mining (e.g., social network analysis and recommender systems), computer vision (e.g., object detection and point cloud learning), and natural language processing (e.g., relation extraction and sequence learning), to name a few. With the emergence of Transformers in natural language processing and computer vision, graph Transformers embed a graph structure into the Transformer architecture to overcome the limitations of local neighborhood aggregation while avoiding strict structural inductive biases. In this paper, we present a comprehensive review of GNNs and graph Transformers in computer vision from a task-oriented perspective. Specifically, we divide their applications in computer vision into five categories according to the modality of input data, i.e., 2D natural images, videos, 3D data, vision + language, and medical images. In each category, we further divide the applications according to a set of vision tasks. Such a task-oriented taxonomy allows us to examine how each task is tackled by different GNN-based approaches and how well these approaches perform. Based on the necessary preliminaries, we provide the definitions and challenges of the tasks, in-depth coverage of the representative approaches, as well as discussions regarding insights, limitations, and future directions.

updated: Sun Oct 23 2022 09:46:40 GMT+0000 (UTC)

published: Tue Sep 27 2022 08:10:14 GMT+0000 (UTC)

arXiv
