ScanQA: 3D Question Answering for Spatial Scene Understanding

Daichi Azuma; Taiki Miyanishi; Shuhei Kurita; Motoaki Kawanabe

We propose a new 3D spatial understanding task, 3D Question Answering (3D-QA). In the 3D-QA task, models receive visual information from an entire 3D scene captured as a rich RGB-D indoor scan and answer textual questions about that scene. Unlike in the 2D question answering of VQA, conventional 2D-QA models applied to 3D-QA struggle with the spatial understanding of object alignment and directions, and fail to identify the objects described in the textual questions. We propose a baseline model for 3D-QA, named the ScanQA model, which learns a fused descriptor from 3D object proposals and encoded sentence embeddings. This learned descriptor correlates language expressions with the underlying geometric features of the 3D scan. It facilitates the regression of 3D bounding boxes that localize the objects described in the questions, and it lets the model output correct answers. We collected human-edited question-answer pairs with free-form answers grounded in 3D objects in each 3D scene. Our new ScanQA dataset contains over 40K question-answer pairs from 800 indoor scenes drawn from the ScanNet dataset. To the best of our knowledge, the proposed 3D-QA task is the first large-scale effort to perform object-grounded question answering in 3D environments.
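The idea of fusing object proposals with a question embedding can be illustrated with a toy sketch. This is not the ScanQA implementation. The element-wise-product fusion, the linear localization and answer heads, and all names (`fuse`, `answer_question`, `loc_weights`, `answer_weights`) are hypothetical assumptions chosen for brevity. The 3D bounding-box regression is replaced by simply picking the highest-scoring proposal. Each proposal feature is fused with the question embedding and scored for localization. The score-weighted pooled descriptor is then classified over a fixed answer vocabulary.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(proposal: Vector, question: Vector) -> Vector:
    """Hypothetical fusion: element-wise product of a proposal feature and the question embedding."""
    return [p * q for p, q in zip(proposal, question)]


def answer_question(
    proposals: List[Vector],
    question: Vector,
    loc_weights: Vector,
    answer_weights: List[Vector],
) -> Tuple[int, int]:
    """Return (index of the grounded proposal, index of the predicted answer).

    Toy stand-in only: box regression is replaced by choosing the top-scoring proposal.
    """
    fused = [fuse(p, question) for p in proposals]
    # Localization head (assumed linear): one scalar score per fused descriptor.
    loc_scores = [sum(w * f for w, f in zip(loc_weights, fv)) for fv in fused]
    attn = softmax(loc_scores)
    # Pool the fused descriptors, weighting each proposal by its localization probability.
    dim = len(question)
    pooled = [sum(a * fv[d] for a, fv in zip(attn, fused)) for d in range(dim)]
    # Answer head (assumed linear): one logit per entry in a fixed answer vocabulary.
    answer_logits = [sum(w * x for w, x in zip(row, pooled)) for row in answer_weights]
    best_object = max(range(len(proposals)), key=lambda i: loc_scores[i])
    best_answer = max(range(len(answer_weights)), key=lambda k: answer_logits[k])
    return best_object, best_answer


if __name__ == "__main__":
    # Two toy proposals; only the second aligns with the question embedding.
    proposals = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    question = [0.0, 2.0, 0.0]
    answers = ["brown", "white"]  # hypothetical answer vocabulary
    obj, ans = answer_question(
        proposals, question, [1.0, 1.0, 1.0], [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    )
    print(obj, answers[ans])  # prints: 1 white
```

In this example only the second proposal aligns with the question embedding, so it receives the highest localization score and dominates the pooled descriptor. Treating free-form answers as classes over a fixed vocabulary is a common simplification in VQA-style models. Here it is an assumption, not a detail taken from the paper.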

updated: Sat May 07 2022 21:55:42 GMT+0000 (UTC)

published: Mon Dec 20 2021 12:30:55 GMT+0000 (UTC)

arXiv
