KECOR: Kernel Coding Rate Maximization for Active 3D Object Detection

Yadan Luo; Zhuoxiao Chen; Zhen Fang; Zheng Zhang; Zi Huang; Mahsa Baktashmotlagh

Reliable LiDAR-based object detection is paramount in autonomous driving, but its success hinges on obtaining large amounts of precise 3D annotations. Active learning (AL) seeks to mitigate the annotation burden through algorithms that use fewer labels yet attain performance comparable to fully supervised learning. Although AL has shown promise, current approaches prioritize the selection of unlabeled point clouds with high uncertainty and/or diversity, which leads to more instances being selected for labeling and reduces computational efficiency. In this paper, we propose a novel kernel coding rate maximization (KECOR) strategy, which aims to identify the most informative point clouds for label acquisition through the lens of information theory. A greedy search is applied to find the point clouds that maximize the minimal number of bits required to encode the latent features. To determine the uniqueness and informativeness of the selected samples from the model's perspective, we construct a proxy network of the 3D detector head and compute the outer product of Jacobians from all proxy layers to form the empirical neural tangent kernel (NTK) matrix. To accommodate both one-stage detectors (i.e., SECOND) and two-stage detectors (i.e., PVRCNN), we further incorporate classification entropy maximization and strike a favorable trade-off between detection performance and the total number of bounding boxes selected for annotation. Extensive experiments on two 3D benchmarks and a 2D detection dataset demonstrate the superiority and versatility of the proposed approach. Our results show that, compared to the state-of-the-art AL method, KECOR reduces box-level annotation costs by approximately 44% and computational time by approximately 26% without compromising detection performance.
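The greedy search over coding rates described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the standard rate-distortion form of the coding rate, 0.5 * logdet(I + d / (n * eps^2) * K), applied to an n x n positive semi-definite kernel matrix K; the abstract does not state the exact objective, so this form, the distortion parameter `eps`, the dimension parameter `dim`, and the function names `coding_rate` and `greedy_select` are all assumptions made for illustration. A plain linear kernel on toy features stands in for the empirical NTK matrix, and the classification entropy term is omitted.

```python
import numpy as np


def coding_rate(K, eps=0.5, dim=None):
    """Coding rate 0.5 * logdet(I + dim / (n * eps^2) * K) of an n x n PSD kernel K.

    `eps` (distortion) and `dim` (feature dimension) are illustrative
    parameters, not values taken from the paper.
    """
    n = K.shape[0]
    if n == 0:
        return 0.0
    d = n if dim is None else dim
    scale = d / (n * eps ** 2)
    # slogdet is numerically safer than log(det(...)) for larger matrices.
    _, logdet = np.linalg.slogdet(np.eye(n) + scale * K)
    return 0.5 * logdet


def greedy_select(K, budget, eps=0.5, dim=None):
    """Greedily pick `budget` indices whose kernel sub-matrix has maximal coding rate."""
    selected = []
    remaining = list(range(K.shape[0]))
    for _ in range(min(budget, len(remaining))):
        best, best_rate = None, -np.inf
        for i in remaining:
            idx = selected + [i]
            rate = coding_rate(K[np.ix_(idx, idx)], eps, dim)
            if rate > best_rate:  # ties keep the earliest candidate
                best, best_rate = i, rate
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy features: samples 0 and 1 are identical, sample 2 is orthogonal to both.
Z = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
K = Z @ Z.T  # linear kernel standing in for an empirical NTK matrix
print(greedy_select(K, budget=2, dim=2))  # -> [0, 2]
```

In this toy case the duplicate sample adds no coding rate beyond what sample 0 already provides, so the greedy step prefers the orthogonal sample. Swapping `K` for a kernel built from Jacobian outer products would follow the same selection loop under the assumptions above.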

updated: Sun Jul 16 2023 04:27:03 GMT+0000 (UTC)

published: Sun Jul 16 2023 04:27:03 GMT+0000 (UTC)

arXiv
