3D Question Answering

Shuquan Ye; Dongdong Chen; Songfang Han; Jing Liao

Visual Question Answering (VQA) has witnessed tremendous progress in recent years. However, most efforts focus only on 2D image question answering. In this paper, we present the first attempt at extending VQA to the 3D domain, which can facilitate artificial intelligence's perception of real-world 3D scenarios. Unlike image-based VQA, 3D Question Answering (3DQA) takes a colored point cloud as input and requires both appearance and 3D geometry comprehension to answer 3D-related questions. To this end, we propose a novel transformer-based 3DQA framework, "3DQA-TR", which consists of two encoders that exploit appearance and geometry information, respectively. The multi-modal information of appearance, geometry, and the linguistic question can then attend to each other via a 3D-Linguistic Bert to predict the target answers. To verify the effectiveness of the proposed framework, we further develop the first 3DQA dataset, "ScanQA", which builds on the ScanNet dataset and contains ∼6K questions and ∼30K answers for 806 scenes. Extensive experiments on this dataset demonstrate the clear superiority of the proposed 3DQA framework over existing VQA frameworks, as well as the effectiveness of our major design choices. Our code and dataset will be made publicly available to facilitate research in this direction.
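The abstract gives no implementation details, so the following is only a rough illustration of what it means for appearance, geometry, and question tokens to "attend to each other". It is a minimal sketch, assuming the three token sets are concatenated into one sequence and passed through a single self-attention step. The token values, the 4-dimensional embeddings, and the identity projections (no learned weights) are all hypothetical simplifications, not the 3DQA-TR architecture itself.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys, and values are the tokens themselves
    (identity projections), which a real model would replace with
    learned linear layers.
    """
    dim = len(tokens[0])
    fused = []
    for query in tokens:
        # Similarity of this token to every token in the joint sequence.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        weights = softmax(scores)
        # Weighted average of all tokens, regardless of modality.
        fused.append(
            [sum(w * tok[d] for w, tok in zip(weights, tokens)) for d in range(dim)]
        )
    return fused


# Hypothetical toy embeddings standing in for the encoder outputs.
appearance_tokens = [[1.0, 0.0, 0.0, 0.0], [0.9, 0.1, 0.0, 0.0]]
geometry_tokens = [[0.0, 1.0, 0.0, 0.0]]
question_tokens = [[0.0, 0.0, 1.0, 0.0], [0.5, 0.5, 0.0, 0.0]]

# Joint sequence: every token can attend to every other token.
sequence = appearance_tokens + geometry_tokens + question_tokens
fused = self_attention(sequence)
```

After this step, each output token is a mixture of all three modalities, which is the property a joint transformer relies on when a question token needs evidence from both appearance and geometry.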

updated: Tue Nov 29 2022 01:51:28 GMT+0000 (UTC)

published: Wed Dec 15 2021 18:59:59 GMT+0000 (UTC)

arXiv
