3D Random Occlusion and Multi-Layer Projection for Deep Multi-Camera Pedestrian Localization

Rui Qiu; Ming Xu; Yuyao Yan; Jeremy S. Smith; Xi Yang

Although deep-learning-based methods for monocular pedestrian detection have made great progress, they remain vulnerable to heavy occlusion. Multi-view information fusion is a potential solution, but its application is limited by the scarcity of annotated training samples in existing multi-view datasets, which increases the risk of overfitting. To address this problem, a data augmentation method is proposed that randomly generates 3D cylinder occlusions on the ground plane; each cylinder has the average size of a pedestrian and is projected into every view, which reduces overfitting during training. Moreover, the feature map of each view is projected by homographies onto multiple parallel planes at different heights, which allows the CNNs to exploit features across the full height of each pedestrian when inferring pedestrian locations on the ground plane. The proposed 3DROM method achieves greatly improved performance compared with state-of-the-art deep-learning-based methods for multi-view pedestrian detection.
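The two geometric ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each camera is described by a known 3x4 projection matrix P, and every name here (P_TOY, project, plane_homography, cylinder_bbox, random_cylinder_center) is hypothetical, as are the 0.3 m radius and 1.8 m height used as a stand-in for the "average size of pedestrians". For a plane at height z = h, the homography from plane coordinates (x, y, 1) to the image is obtained by folding the third column of P into the fourth.

```python
import math
import random

# Toy 3x4 projection matrix (an assumption for illustration): focal length 500 px,
# principal point (320, 240), camera 1.5 m above the ground at (0, -5), looking along +y.
P_TOY = [
    [500.0, 320.0, 0.0, 1600.0],
    [0.0, 240.0, -500.0, 1950.0],
    [0.0, 1.0, 0.0, 5.0],
]

def project(P, point):
    """Project a 3D world point (x, y, z) to pixel coordinates (u, v)."""
    x, y, z = point
    u, v, w = (r[0] * x + r[1] * y + r[2] * z + r[3] for r in P)
    return (u / w, v / w)

def plane_homography(P, h):
    """3x3 homography taking (x, y, 1) on the plane z = h to the image."""
    # Columns are [p1, p2, h * p3 + p4], where p_i are the columns of P.
    return [[r[0], r[1], r[2] * h + r[3]] for r in P]

def apply_homography(H, x, y):
    """Apply a 3x3 homography to a 2D point and dehomogenize."""
    u, v, w = (r[0] * x + r[1] * y + r[2] for r in H)
    return (u / w, v / w)

def random_cylinder_center(rng, x_range, y_range):
    """Sample a ground-plane position for an occluding cylinder."""
    return (rng.uniform(*x_range), rng.uniform(*y_range))

def cylinder_bbox(P, cx, cy, radius=0.3, height=1.8, n=16):
    """Image bounding box (u_min, v_min, u_max, v_max) of a vertical cylinder
    standing on the ground plane, from n sampled points on each rim."""
    pts = []
    for k in range(n):
        a = 2.0 * math.pi * k / n
        x = cx + radius * math.cos(a)
        y = cy + radius * math.sin(a)
        pts.append(project(P, (x, y, 0.0)))     # bottom rim
        pts.append(project(P, (x, y, height)))  # top rim
    us = [p[0] for p in pts]
    vs = [p[1] for p in pts]
    return (min(us), min(vs), max(us), max(vs))
```

In an augmentation loop under these assumptions, one would sample a centre with random_cylinder_center, call cylinder_bbox once per camera, and mask the resulting region in each view so that the synthetic occluder is consistent across views. For the multi-layer projection, plane_homography(P, h) would be evaluated for several heights h, and each homography (or its inverse, depending on the warping convention) would be used to warp that view's feature map onto the corresponding plane.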

updated: Mon Jul 25 2022 17:27:35 GMT+0000 (UTC)

published: Fri Jul 22 2022 06:15:20 GMT+0000 (UTC)

arXiv
