Automatic Map Update Using Dashcam Videos

Aziza Zhanabatyrova; Clayton Souza Leite; Yu Xiao

Autonomous driving requires 3D maps that provide accurate and up-to-date information about semantic landmarks. Because cameras are more widely available and cheaper than laser scanners, vision-based mapping solutions, especially those using crowdsourced visual data, have attracted much attention from academia and industry. However, previous work has mainly focused on creating 3D point clouds, leaving automatic change detection as an open issue. In this paper, we propose a pipeline for initiating and updating 3D maps with dashcam videos, with a focus on automatic change detection based on the comparison of metadata (e.g., the types and locations of traffic signs). To improve the performance of metadata generation, which depends on the accuracy of 3D object detection and localization, we introduce a novel deep learning-based pixel-wise 3D localization algorithm. The algorithm, trained directly on SfM point cloud data, locates objects detected in 2D images in 3D space with high accuracy by estimating not only depth from monocular images but also lateral and height distances. We also propose a point clustering and thresholding algorithm to improve the robustness of the system to errors. We performed experiments in two distinct areas, a campus and a residential area, with different types of cameras and under different lighting and weather conditions. Changes were detected with 85% and 100% accuracy in the campus and residential areas, respectively. The errors in the campus area were mainly due to traffic signs that were far from the vehicle and intended for pedestrians and cyclists only. We also conducted a cause analysis of the detection and localization errors to measure the impact of the performance of the underlying technologies in use.
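To make the idea of metadata-based change detection concrete, the following is a minimal sketch and not the authors' implementation. It assumes each landmark is reduced to a sign type plus a 2D position in a shared local map frame, and it matches new observations to existing map entries by type and nearest distance. The names `Landmark` and `detect_changes` and the 2-metre matching threshold are hypothetical, chosen only for illustration; the abstract does not specify how the comparison is performed.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Landmark:
    """Hypothetical metadata record: a sign type and a position in metres."""
    sign_type: str
    x: float
    y: float


def detect_changes(old_map, new_obs, max_dist=2.0):
    """Compare new observations against an existing map.

    Returns (added, removed): observations with no matching map entry,
    and map entries that no observation matched. A match requires the
    same sign type and a distance of at most max_dist (an assumed value).
    """
    unmatched_old = list(old_map)
    added = []
    for obs in new_obs:
        best, best_d = None, max_dist
        for cand in unmatched_old:
            if cand.sign_type != obs.sign_type:
                continue
            d = hypot(cand.x - obs.x, cand.y - obs.y)
            if d <= best_d:
                best, best_d = cand, d
        if best is None:
            added.append(obs)           # nothing nearby of this type: new sign
        else:
            unmatched_old.remove(best)  # matched: the sign is unchanged
    return added, unmatched_old


old_map = [Landmark("stop", 0.0, 0.0), Landmark("yield", 10.0, 0.0)]
new_obs = [Landmark("stop", 0.5, 0.2), Landmark("speed_50", 20.0, 5.0)]
added, removed = detect_changes(old_map, new_obs)
```

In this toy run the stop sign is matched despite a small localization offset, the `speed_50` sign is reported as added, and the yield sign is reported as removed. The matching threshold is the part most sensitive to localization error, which is why the accuracy of 3D localization matters for change detection.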

updated: Tue Jan 18 2022 21:59:21 GMT+0000 (UTC)

published: Fri Sep 24 2021 18:00:57 GMT+0000 (UTC)

arXiv
