StereoVoxelNet: Real-Time Obstacle Detection Based on Occupancy Voxels from a Stereo Camera Using Deep Neural Networks

Hongyu Li; Zhengang Li; Neset Unver Akmandor; Huaizu Jiang; Yanzhi Wang; Taskin Padir

Obstacle detection is a safety-critical problem in robot navigation, where stereo matching is a popular vision-based approach. While deep neural networks have shown impressive results in computer vision, most previous obstacle detection work relies solely on traditional stereo matching techniques to meet the computational constraints of real-time feedback. This paper proposes a computationally efficient method that employs a deep neural network to detect occupancy directly from stereo images. Instead of learning the point cloud correspondence from the stereo data, our approach extracts a compact obstacle distribution based on volumetric representations. In addition, we prune the computation of safety-irrelevant spaces in a coarse-to-fine manner based on octrees generated by the decoder. As a result, we achieve real-time performance on an onboard computer (NVIDIA Jetson TX2). Our approach detects obstacles accurately within a range of 32 meters and achieves better IoU (Intersection over Union) and CD (Chamfer Distance) scores with only 2% of the computation cost of the state-of-the-art stereo model. Furthermore, we validate our method's robustness and real-world feasibility through autonomous navigation experiments with a real robot. Hence, our work contributes toward closing the gap between stereo-based systems in robot perception and state-of-the-art stereo models in computer vision. To counter the scarcity of high-quality real-world indoor stereo datasets, we collect a 1.36-hour stereo dataset with a mobile robot and use it to fine-tune our model. The dataset, the code, and further details including additional visualizations are available at https://lhy.xyz/stereovoxelnet
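
To make the two evaluation metrics named in the abstract concrete, the following is a minimal sketch of how IoU and Chamfer Distance could be computed between a predicted and a ground-truth set of occupied voxels. It is not taken from the paper or its code: the function names `voxel_iou` and `chamfer_distance`, the representation of a grid as a set of integer (i, j, k) indices, and the choice of an unsquared, mean-normalized, two-way Chamfer Distance are all assumptions made for illustration. The paper's exact definitions may differ.

```python
import math


def voxel_iou(pred, target):
    """IoU between two collections of occupied voxel indices (i, j, k)."""
    pred, target = set(pred), set(target)
    union = pred | target
    if not union:
        # Assumption: two empty grids count as perfect agreement.
        return 1.0
    return len(pred & target) / len(union)


def chamfer_distance(pred, target):
    """Symmetric Chamfer Distance between two sets of voxel centers.

    Assumption: each direction is the mean Euclidean distance from a
    point to its nearest neighbor in the other set, and the two
    directions are summed. Brute force, O(len(pred) * len(target)).
    """
    pred, target = list(pred), list(target)
    if not pred or not target:
        # Assumption: the distance to an empty set is treated as infinite.
        return math.inf

    def one_way(a, b):
        return sum(min(math.dist(p, q) for q in b) for p in a) / len(a)

    return one_way(pred, target) + one_way(target, pred)


# Hypothetical toy grids, in voxel units.
predicted = {(0, 0, 0), (1, 0, 0), (2, 0, 0)}
ground_truth = {(0, 0, 0), (1, 0, 0), (1, 1, 0)}

iou = voxel_iou(predicted, ground_truth)
cd = chamfer_distance(predicted, ground_truth)
print(f"IoU = {iou:.3f}, CD = {cd:.3f} voxels")
```

In this toy case the two grids share two of four distinct voxels, so the IoU is 0.5. Each grid has one voxel lying exactly one voxel away from the other grid, so each direction contributes 1/3 and the Chamfer Distance is 2/3 of a voxel. Multiplying by the voxel edge length would convert this to meters.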

updated: Sat Mar 04 2023 18:18:54 GMT+0000 (UTC)

published: Sun Sep 18 2022 03:32:38 GMT+0000 (UTC)

arXiv
