Self-Supervised Image-to-Point Distillation via Semantically Tolerant Contrastive Loss

Anas Mahmoud; Jordan S. K. Hu; Tianshu Kuai; Ali Harakeh; Liam Paull; Steven L. Waslander

An effective framework for learning 3D representations for perception tasks is distilling rich self-supervised image features via contrastive learning. However, image-to-point representation learning for autonomous driving datasets faces two main challenges: 1) the abundance of self-similarity, which results in the contrastive losses pushing away semantically similar point and image regions and thus disturbing the local semantic structure of the learned representations, and 2) severe class imbalance, as pretraining gets dominated by over-represented classes. We propose to alleviate the self-similarity problem through a novel semantically tolerant image-to-point contrastive loss that takes into consideration the semantic distance between positive and negative image regions to minimize contrasting semantically similar point and image regions. Additionally, we address class imbalance by designing a class-agnostic balanced loss that approximates the degree of class imbalance through an aggregate sample-to-samples semantic similarity measure. We demonstrate that our semantically tolerant contrastive loss with class balancing improves state-of-the-art 2D-to-3D representation learning in all evaluation settings on 3D semantic segmentation. Our method consistently outperforms state-of-the-art 2D-to-3D representation learning frameworks across a wide range of 2D self-supervised pretrained models.
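The idea of tolerating semantically similar negatives can be illustrated with a small NumPy sketch. It is not the authors' formulation: the abstract does not give the loss, so the function name `tolerant_contrastive_loss`, the weighting rule `1 - cosine similarity`, and the temperature value are all assumptions chosen for illustration. Each point feature is paired with the image feature at the same row index, and every other row acts as a negative whose contribution shrinks as its image feature gets closer to the positive's image feature. The class-balancing term is left out.

```python
import numpy as np


def tolerant_contrastive_loss(point_feats, image_feats, temperature=0.07, tolerant=True):
    """Illustrative image-to-point contrastive loss (hypothetical, not the paper's).

    point_feats, image_feats: arrays of shape (N, D); row i of each forms a positive pair.
    tolerant=False falls back to a plain InfoNCE-style loss for comparison.
    """
    # L2-normalise so that dot products are cosine similarities.
    p = point_feats / np.linalg.norm(point_feats, axis=1, keepdims=True)
    im = image_feats / np.linalg.norm(image_feats, axis=1, keepdims=True)

    # Point-to-image similarity logits, shape (N, N).
    logits = p @ im.T / temperature

    if tolerant:
        # Assumed weighting: negatives whose image features resemble the
        # positive's image feature are down-weighted towards zero.
        image_sim = np.clip(im @ im.T, 0.0, 1.0)
        weights = 1.0 - image_sim
    else:
        weights = np.ones_like(logits)
    # The positive pair always keeps full weight.
    np.fill_diagonal(weights, 1.0)

    # Numerically stable weighted softmax cross-entropy over each row.
    shifted = logits - logits.max(axis=1, keepdims=True)
    denom = (weights * np.exp(shifted)).sum(axis=1)
    per_point = np.log(denom) - np.diag(shifted)
    return float(per_point.mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    points = rng.normal(size=(8, 16))
    pixels = rng.normal(size=(8, 16))
    print("tolerant:", tolerant_contrastive_loss(points, pixels))
    print("standard:", tolerant_contrastive_loss(points, pixels, tolerant=False))
```

Because every negative weight lies in [0, 1], this tolerant variant can never exceed the unweighted loss on the same batch, and it approaches zero when all image features in the batch are identical, which is the self-similar case the abstract describes.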

updated: Thu Jan 12 2023 19:58:54 GMT+0000 (UTC)

published: Thu Jan 12 2023 19:58:54 GMT+0000 (UTC)

arXiv
