Segment Any Point Cloud Sequences by Distilling Vision Foundation Models

Youquan Liu; Lingdong Kong; Jun Cen; Runnan Chen; Wenwei Zhang; Liang Pan; Kai Chen; Ziwei Liu

Recent advancements in vision foundation models (VFMs) have opened up new possibilities for versatile and efficient visual perception. In this work, we introduce Seal, a novel framework that harnesses VFMs for segmenting diverse automotive point cloud sequences. Seal exhibits three appealing properties: i) Scalability: VFMs are directly distilled into point clouds, eliminating the need for annotations in either 2D or 3D during pretraining. ii) Consistency: Spatial and temporal relationships are enforced at both the camera-to-LiDAR and point-to-segment stages, facilitating cross-modal representation learning. iii) Generalizability: Seal enables knowledge transfer in an off-the-shelf manner to downstream tasks involving diverse point clouds, including those from real/synthetic, low/high-resolution, large/small-scale, and clean/corrupted datasets. Extensive experiments conducted on eleven different point cloud datasets showcase the effectiveness and superiority of Seal. Notably, Seal achieves 45.0% mIoU on nuScenes after linear probing, surpassing random initialization by 36.9% mIoU and outperforming the prior art by 6.1% mIoU. Moreover, Seal demonstrates significant performance gains over existing methods across 20 different few-shot fine-tuning tasks on all eleven tested point cloud datasets.
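For readers unfamiliar with the metric, the following is a minimal Python sketch of how mean intersection-over-union (mIoU) is commonly computed for per-point class labels, followed by the baseline scores implied by the numbers in the abstract. The function name `mean_iou` and the toy labels are illustrative assumptions, not taken from the paper, and the benchmark's official evaluation may treat ignored or absent classes differently. The baseline arithmetic assumes the reported gains are absolute differences in percentage points.

```python
def mean_iou(pred, target, num_classes):
    """Average per-class IoU over classes present in pred or target.

    Hypothetical helper for illustration; classes that appear in neither
    list are skipped, which is one common convention among several.
    """
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy per-point labels for six points and three classes (made-up data).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)

# Baselines implied by the abstract, assuming absolute percentage-point gains.
seal_miou = 45.0
random_init = round(seal_miou - 36.9, 1)  # linear probing from random initialization
prior_art = round(seal_miou - 6.1, 1)     # best previously reported method

print(f"toy mIoU: {score:.3f}")
print(f"implied random-init mIoU: {random_init}%")
print(f"implied prior-art mIoU: {prior_art}%")
```

Under that reading, the linear-probing comparison on nuScenes is roughly 8.1% (random initialization) versus 38.9% (prior art) versus 45.0% (Seal).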

updated: Thu Jun 15 2023 17:59:54 GMT+0000 (UTC)

published: Thu Jun 15 2023 17:59:54 GMT+0000 (UTC)

arXiv
