Neuro-Symbolic Visual Dialog

Adnen Abdessaied; Mihai Bâce; Andreas Bulling

We propose Neuro-Symbolic Visual Dialog (NSVD), the first method to combine deep learning and symbolic program execution for multi-round visually-grounded reasoning. NSVD significantly outperforms existing purely-connectionist methods on two key challenges inherent to visual dialog: long-distance co-reference resolution and vanishing question-answering performance. We demonstrate the latter by proposing a more realistic and stricter evaluation scheme in which we use predicted answers for the full dialog history when calculating accuracy. We describe two variants of our model and show that, under this new scheme, our best model achieves an accuracy of 99.72% on CLEVR-Dialog, a relative improvement of more than 10% over the state of the art, while requiring only a fraction of the training data. Moreover, we demonstrate that our neuro-symbolic models have a higher mean first failure round, are more robust against incomplete dialog histories, and generalise better not only to dialogs that are up to three times longer than those seen during training but also to unseen question types and scenes.
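
The stricter evaluation scheme and the first-failure-round statistic mentioned above can be illustrated with a minimal sketch. It is not the authors' code. The names `run_dialog` and `first_failure_round`, the exact-match correctness check, and the choice to return `None` for a dialog with no failures are all assumptions made for illustration.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical interface: a model maps (dialog history, question) to an answer.
History = List[Tuple[str, str]]
Model = Callable[[History, str], str]


def run_dialog(
    model: Model,
    questions: Sequence[str],
    gold_answers: Sequence[str],
    use_predicted_history: bool,
) -> List[bool]:
    """Return per-round correctness for one dialog.

    With use_predicted_history=True the model's own answers are fed back
    as history (the stricter scheme), so early mistakes can propagate.
    With False, ground-truth answers are fed back instead.
    """
    history: History = []
    correct: List[bool] = []
    for question, gold in zip(questions, gold_answers):
        predicted = model(history, question)
        correct.append(predicted == gold)  # assumed exact-match scoring
        history.append((question, predicted if use_predicted_history else gold))
    return correct


def first_failure_round(correct: Sequence[bool]) -> Optional[int]:
    """1-based index of the first wrong round, or None if none failed (assumed convention)."""
    for round_index, ok in enumerate(correct, start=1):
        if not ok:
            return round_index
    return None
```

Comparing the two settings for the same model shows how much of its accuracy depends on being handed correct answers for earlier rounds; averaging `first_failure_round` over dialogs gives one possible reading of a mean first failure round.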

updated: Mon Aug 22 2022 14:29:00 GMT+0000 (UTC)

published: Mon Aug 22 2022 14:29:00 GMT+0000 (UTC)

arXiv
