VL-Fields: Towards Language-Grounded Neural Implicit Spatial Representations

Nikolaos Tsagkas; Oisin Mac Aodha; Chris Xiaoxuan Lu

We present Visual-Language Fields (VL-Fields), a neural implicit spatial representation that enables open-vocabulary semantic queries. Our model encodes and fuses the geometry of a scene with vision-language trained latent features by distilling information from a language-driven segmentation model. VL-Fields is trained without requiring any prior knowledge of the scene object classes, which makes it a promising representation for the field of robotics. Our model outperformed the similar CLIP-Fields model in the task of semantic segmentation by almost 10%.
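As a rough illustration of what an open-vocabulary semantic query over such a representation could look like, the sketch below labels each 3D point by comparing its latent feature with text embeddings using cosine similarity. This is an assumption for illustration only: the abstract does not specify the query mechanism, and the names `cosine`, `label_points`, `point_features`, and `text_embeddings` are hypothetical. The toy 3-dimensional vectors stand in for features that would in practice come from the trained field and a text encoder.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def label_points(point_features, text_embeddings):
    """Assign each point the text query whose embedding is most similar to its feature.

    point_features: list of per-point latent feature vectors (hypothetical field output).
    text_embeddings: dict mapping a free-form query string to its embedding vector.
    """
    labels = []
    for feat in point_features:
        best = max(text_embeddings, key=lambda name: cosine(feat, text_embeddings[name]))
        labels.append(best)
    return labels


# Toy data: two queries and two points in a 3-dimensional feature space.
text_embeddings = {"chair": [1.0, 0.0, 0.0], "table": [0.0, 1.0, 0.0]}
point_features = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.1]]
print(label_points(point_features, text_embeddings))
```

Because the label set is just the keys of `text_embeddings`, new queries can be added at inference time without retraining, which is the property "open-vocabulary" refers to.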

updated: Thu May 25 2023 08:38:52 GMT+0000 (UTC)

published: Sun May 21 2023 10:55:27 GMT+0000 (UTC)

arXiv
