OpenMask3D: Open-Vocabulary 3D Instance Segmentation

Ayça Takmaz; Elisabetta Fedele; Robert W. Sumner; Marc Pollefeys; Federico Tombari; Francis Engelmann

We introduce the task of open-vocabulary 3D instance segmentation. Traditional approaches for 3D instance segmentation largely rely on existing 3D annotated datasets, which are restricted to a closed set of object categories. This is an important limitation for real-life applications, where one might need to perform tasks guided by novel, open-vocabulary queries related to a wide variety of objects. Recently, open-vocabulary 3D scene understanding methods have emerged to address this problem by learning queryable features for each point in the scene. While such a representation can be directly employed to perform semantic segmentation, existing methods are limited in their ability to identify object instances. In this work, we address this limitation and propose OpenMask3D, a zero-shot approach for open-vocabulary 3D instance segmentation. Guided by predicted class-agnostic 3D instance masks, our model aggregates per-mask features via multi-view fusion of CLIP-based image embeddings. We conduct experiments and ablation studies on the ScanNet200 dataset to evaluate the performance of OpenMask3D, and provide insights about the open-vocabulary 3D instance segmentation task. We show that our approach outperforms other open-vocabulary counterparts, particularly on the long-tail distribution. Furthermore, OpenMask3D goes beyond the limitations of closed-vocabulary approaches and enables the segmentation of object instances based on free-form queries describing object properties such as semantics, geometry, affordances, and material properties.
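The per-mask aggregation and free-form querying described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract states only that per-mask features come from multi-view fusion of CLIP-based image embeddings, so the mean pooling used here is an assumption, the helper names (`fuse_mask_feature`, `rank_masks`) are hypothetical, and the 3-dimensional toy vectors stand in for real CLIP image and text embeddings.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse_mask_feature(view_embeddings):
    """Fuse the embeddings of one mask seen from several views.

    Assumption: fusion is a plain mean over L2-normalized per-view
    embeddings, followed by renormalization.
    """
    if not view_embeddings:
        raise ValueError("a mask needs at least one view embedding")
    normalized = [l2_normalize(e) for e in view_embeddings]
    dim = len(normalized[0])
    mean = [sum(e[i] for e in normalized) / len(normalized) for i in range(dim)]
    return l2_normalize(mean)


def rank_masks(mask_features, query_embedding):
    """Sort mask ids by cosine similarity to a query embedding, best first."""
    query = l2_normalize(query_embedding)
    scores = {
        mask_id: sum(a * b for a, b in zip(feature, query))
        for mask_id, feature in mask_features.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy stand-ins for CLIP image embeddings of two class-agnostic masks,
# each observed in two views.
views_per_mask = {
    "mask_a": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
    "mask_b": [[0.0, 1.0, 0.0], [0.0, 0.8, 0.2]],
}
mask_features = {m: fuse_mask_feature(v) for m, v in views_per_mask.items()}

# Toy stand-in for the CLIP text embedding of a free-form query.
query_embedding = [1.0, 0.05, 0.0]
ranking = rank_masks(mask_features, query_embedding)
best_mask = ranking[0][0]
```

In this sketch each mask keeps a single fused feature, so a new text query only requires one similarity computation per mask rather than per point; how views are selected and how image crops are embedded is left out because the abstract does not specify it.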

updated: Fri Jun 23 2023 17:36:44 GMT+0000 (UTC)

published: Fri Jun 23 2023 17:36:44 GMT+0000 (UTC)

arXiv
