Perspective-aware Convolution for Monocular 3D Object Detection

Jia-Quan Yu; Soo-Chang Pei

Monocular 3D object detection is a crucial task for autonomous vehicles, and a challenging one, since it uses only a single camera image to infer the 3D objects in a scene. To address the difficulty of predicting depth from pictorial cues alone, we propose a novel perspective-aware convolutional layer that captures long-range dependencies in images. By constraining convolutional kernels to extract features along the depth axis of every image pixel, we incorporate perspective information into the network architecture. We integrate our perspective-aware convolutional layer into a 3D object detector and demonstrate improved performance on the KITTI3D dataset, achieving 23.9% average precision on the easy benchmark. These results underscore the importance of modeling scene cues for accurate depth inference and highlight the benefits of incorporating scene structure into network design. Our perspective-aware convolutional layer has the potential to enhance object detection accuracy by providing more precise and context-aware feature extraction.
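
The abstract does not say how the depth axis of a pixel is obtained or how the kernel is parameterised, so the following is only an illustrative sketch of the general idea of placing kernel taps along a per-pixel direction. It assumes (this is not taken from the paper) that the depth axis can be approximated by the line from each pixel toward a single known vanishing point, and it uses nearest-neighbour sampling. The names `sample_along_depth_axis`, `perspective_conv`, `vanishing_point`, `num_taps`, and `step` are hypothetical.

```python
import numpy as np

def sample_along_depth_axis(feat, vanishing_point, num_taps=5, step=0.1):
    """Gather, for every pixel, features at taps on the line toward a vanishing point.

    feat: (H, W, C) feature map.
    vanishing_point: (x, y) in pixel coordinates (assumed known here).
    step: fraction of the pixel-to-vanishing-point offset between consecutive taps.
    Returns an array of shape (num_taps, H, W, C).
    """
    h, w, _ = feat.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    vx, vy = vanishing_point
    taps = []
    for k in range(num_taps):
        t = k * step  # t = 0 is the pixel itself; t = 1 would be the vanishing point
        sx = np.clip(np.rint(xs + t * (vx - xs)), 0, w - 1).astype(int)
        sy = np.clip(np.rint(ys + t * (vy - ys)), 0, h - 1).astype(int)
        taps.append(feat[sy, sx])  # nearest-neighbour lookup, shape (H, W, C)
    return np.stack(taps, axis=0)

def perspective_conv(feat, weights, vanishing_point, step=0.1):
    """Weighted sum over the taps; weights has shape (num_taps, C_in, C_out)."""
    taps = sample_along_depth_axis(feat, vanishing_point, weights.shape[0], step)
    return np.einsum("khwc,kco->hwo", taps, weights)
```

Unlike a regular square kernel, the taps here spread out along one image direction, which is one simple way a layer could reach distant pixels that lie nearer or farther in the scene. A trainable version would typically use bilinear sampling and learned weights in a deep learning framework.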

updated: Thu Aug 24 2023 17:25:36 GMT+0000 (UTC)

published: Thu Aug 24 2023 17:25:36 GMT+0000 (UTC)

arXiv
