Two-stage Rule-induction Visual Reasoning on RPMs with an Application to Video Prediction

Wentao He; Jianfeng Ren; Ruibin Bai; Xudong Jiang

Raven's Progressive Matrices (RPMs) are frequently used to evaluate human visual reasoning ability. Researchers have made considerable efforts to develop systems that solve RPM problems automatically, often using a black-box, end-to-end convolutional neural network for both the visual recognition and the logical reasoning tasks. Based on the two intrinsic aspects of the RPM problem, visual recognition and logical reasoning, we propose a Two-stage Rule-Induction Visual Reasoner (TRIVR). It consists of a perception module and a reasoning module, which tackle the challenges of real-world visual recognition and the subsequent logical reasoning, respectively. For the reasoning module, we further propose a "2+1" formulation that models human thinking in solving RPMs and significantly reduces model complexity. It derives a reasoning rule from each RPM sample, which is not feasible for existing methods. As a result, the proposed reasoning module can yield a set of reasoning rules that model how humans solve RPM problems. To validate the proposed method on real-world applications, an RPM-like Video Prediction (RVP) dataset is constructed, in which visual reasoning is conducted on RPMs built from real-world video frames. Experimental results on various RPM-like datasets demonstrate that the proposed TRIVR achieves a significant and consistent performance gain over state-of-the-art models.
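
To make the idea of inducing a rule from an RPM sample concrete, here is a toy sketch in Python. It is not the TRIVR architecture: it assumes a perception stage has already reduced each panel to a single integer attribute, and it searches a small hand-written rule set (`RULES`, a hypothetical name, as are `induce_rule` and `solve`) for a rule that maps the first two panels of a row to the third. Treating a row as "two panels in, one panel out" is only one possible reading of the "2+1" formulation; the abstract does not spell out the details.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical rule set: each rule predicts the third attribute value of a
# row from the first two, or returns None when it cannot apply.
RULES: Dict[str, Callable[[int, int], Optional[int]]] = {
    "constant": lambda a, b: a if a == b else None,
    "progression": lambda a, b: b + (b - a),
    "sum": lambda a, b: a + b,
}


def induce_rule(rows: List[List[int]]) -> Optional[str]:
    """Return the first rule name consistent with every complete row."""
    for name, rule in RULES.items():
        if all(rule(a, b) == c for a, b, c in rows):
            return name
    return None


def solve(context: List[List[int]], candidates: List[int]) -> Optional[int]:
    """Pick the index of the candidate that completes the third row.

    context holds two complete rows [a, b, c] and one incomplete row [a, b].
    Returns None if no rule fits or the prediction is not among candidates.
    """
    complete, (a, b) = context[:2], context[2]
    name = induce_rule(complete)
    if name is None:
        return None
    predicted = RULES[name](a, b)
    return candidates.index(predicted) if predicted in candidates else None


if __name__ == "__main__":
    context = [[1, 2, 3], [2, 3, 4], [3, 4]]
    print(induce_rule(context[:2]))
    print(solve(context, [4, 6, 5, 7]))
```

The point of the sketch is the separation of concerns: the reasoning step operates only on symbolic attributes and returns an explicit, human-readable rule name, which is the kind of interpretability the abstract contrasts with black-box end-to-end networks.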

updated: Wed Jan 05 2022 04:40:43 GMT+0000 (UTC)

published: Wed Nov 24 2021 06:51:38 GMT+0000 (UTC)

arXiv
