Supplementing Missing Visions via Dialog for Scene Graph Generations

Zhenghao Zhao; Ye Zhu; Xiaoguang Zhu; Yuzhang Shang; Yan Yan

Most current AI systems rely on the premise that the input visual data are sufficient to achieve competitive performance in various computer vision tasks. However, the classic task setup rarely considers the challenging yet common practical situation in which the complete visual data are inaccessible for various reasons (e.g., a restricted view range or occlusions). To this end, we investigate a computer vision task setting with incomplete visual input data. Specifically, we use the Scene Graph Generation (SGG) task with various levels of missing visual data as input. While insufficient visual input intuitively leads to a performance drop, we propose to supplement the missing visions via natural language dialog interactions to better accomplish the task objective. We design a model-agnostic Supplementary Interactive Dialog (SI-Dial) framework that can be jointly learned with most existing models, endowing current AI systems with the ability to conduct question-answer interactions in natural language. Through extensive experiments and analysis, we demonstrate the feasibility of such a task setting with missing visual input and the effectiveness of our proposed dialog module as a supplementary information source, achieving promising performance improvements over multiple baselines.

updated: Mon Apr 01 2024 16:37:00 GMT+0000 (UTC)

published: Sat Apr 23 2022 21:46:17 GMT+0000 (UTC)

arXiv
