Symbolic Replay: Scene Graph as Prompt for Continual Learning on VQA Task

Stan Weixian Lei; Difei Gao; Jay Zhangjie Wu; Yuxuan Wang; Wei Liu; Mengmi Zhang; Mike Zheng Shou

VQA is an ambitious task that aims to answer any image-related question. In reality, however, it is hard to build such a system once and for all, since users' needs are continuously updated and the system has to implement new functions. Thus, Continual Learning (CL) ability is a must in developing advanced VQA systems. Recently, a pioneering work split a VQA dataset into disjoint answer sets to study this topic. However, CL on VQA involves more than the expansion of label sets (new Answer sets). It is also crucial to study how to answer questions when deploying VQA systems to new environments (new Visual scenes) and how to answer questions requiring new functions (new Question types). Thus, we propose CLOVE, a benchmark for Continual Learning On Visual quEstion answering, which contains scene- and function-incremental settings for the two aforementioned CL scenarios. In terms of methodology, the main difference between CL on VQA and CL on classification is that the former additionally involves expanding reasoning mechanisms and preventing them from being forgotten, while the latter focuses on class representation. Thus, we propose a real-data-free, replay-based method tailored for CL on VQA, named Scene Graph as Prompt for Symbolic Replay. Using a piece of a scene graph as a prompt, it replays pseudo scene graphs that represent past images, along with correlated QA pairs. A unified VQA model is also proposed that uses both the current and the replayed data to enhance its QA ability. Finally, experimental results reveal the challenges in CLOVE and demonstrate the effectiveness of our method. The dataset and code will be available at https://github.com/showlab/CLVQA.
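The replay idea described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the names `Triple`, `ReplaySample`, `ToyReplayer`, and `build_training_batch` are hypothetical, and a simple frequency table stands in for whatever learned generator the paper uses. It only shows the data flow, assuming a scene graph is a list of (subject, relation, object) triples: a short prompt is expanded into a pseudo scene graph with a correlated QA pair, and the result is mixed with current-task samples.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One scene-graph edge, e.g. (cup, on, table). Hypothetical format."""
    subject: str
    relation: str
    obj: str


@dataclass
class ReplaySample:
    """A pseudo scene graph plus one correlated question-answer pair."""
    scene_graph: list
    question: str
    answer: str


class ToyReplayer:
    """Stand-in for a learned generator (assumption, not the paper's model).

    It keeps only per-task triple frequencies and completes a prompt with
    the most frequent triples that the prompt does not already contain.
    """

    def __init__(self):
        self._counts = {}  # task name -> Counter of triples

    def fit(self, task, scene_graphs):
        counter = self._counts.setdefault(task, Counter())
        for graph in scene_graphs:
            counter.update(graph)

    def replay(self, task, prompt, extra=2):
        # The prompt is assumed to be a non-empty list of triples.
        ranked = [
            triple
            for triple, _ in self._counts.get(task, Counter()).most_common()
            if triple not in prompt
        ]
        graph = list(prompt) + ranked[:extra]
        target = graph[-1]  # ask about the last triple in the pseudo graph
        question = f"What is the {target.subject} {target.relation}?"
        return ReplaySample(graph, question, target.obj)


def build_training_batch(current_samples, replayer, old_prompts):
    """Mix current-task samples with samples replayed for earlier tasks."""
    replayed = [replayer.replay(task, prompt) for task, prompt in old_prompts]
    return list(current_samples) + replayed
```

In this toy version the training batch for a new task would be built by calling `build_training_batch` with the new task's samples and a handful of stored prompts from earlier tasks, so that no images from those earlier tasks need to be kept.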

updated: Mon Aug 29 2022 10:22:20 GMT+0000 (UTC)

published: Wed Aug 24 2022 12:00:02 GMT+0000 (UTC)

arXiv
