A Hybrid Spatial-temporal Sequence-to-one Neural Network Model for Lane Detection

Yongqi Dong; Sandeep Patil; Bart van Arem; Haneen Farah

車線検出のためのハイブリッド時空間シーケンス対1ニューラルネットワークモデル

信頼性が高く正確な車線検出は、車線維持支援システムと車線逸脱警報システムの安全なパフォーマンスにとって非常に重要です。ただし、特定の困難な特殊な状況（たとえば、マーキングの劣化、深刻な車両の閉塞）では、現在の文献でよくあることですが、単一の画像からレーンマーキングを正確に検出するのに十分なパフォーマンスを得るのは困難です。道路標示は道路上の実線であるため、現在の画像フレームで正確に検出することが困難な車線は、前のフレームからの情報が組み込まれている場合、より適切に推測される可能性があります。このために、画像の連続シーケンスの複数のフレームの時空間情報を最大限に活用して、最後の現在の画像フレームのレーンマーキングを検出する、新しいハイブリッド時空間シーケンス対1の深層学習アーキテクチャを提案します。具体的には、ハイブリッドモデルは、単一の画像で空間的特徴と関係を抽出するのに強力な空間畳み込みニューラルネットワーク（SCNN）と、時空間相関をキャプチャできる畳み込み長短期記憶（ConvLSTM）ニューラルネットワークを統合します。画像シーケンス間の時間依存性。提案されたモデルアーキテクチャでは、SCNNとConvLSTMの両方の利点が完全に組み合わされ、時空間情報が十分に活用されます。レーン検出を画像セグメンテーションの問題として扱い、エンコーダーデコーダー構造を適用して、エンドツーエンドで機能するようにしました。 2つの大規模なデータセットでの広範な実験により、提案されたモデルは困難な運転シーンを効果的に処理でき、以前の最先端の方法よりも優れていることが明らかになりました。

Reliable and accurate lane detection is of vital importance for the safe performance of Lane Keeping Assistance and Lane Departure Warning systems. However, under certain challenging peculiar circumstances (e.g., marking degradation, serious vehicle occlusion), it is difficult to get satisfactory performance in accurately detecting the lane markings from one single image which is often the case in current literature. Since road markings are continuous lines on the road, the lanes that are difficult to be accurately detected in the current image frame might potentially be better inferred out if information from previous frames is incorporated. For this, we propose a novel hybrid spatial-temporal sequence-to-one deep learning architecture making full use of the spatial-temporal information in multiple frames of a continuous sequence of images to detect lane markings in the very last current image frame. Specifically, the hybrid model integrates the spatial convolutional neural network (SCNN), which is powerful in extracting spatial features and relationships in one single image, with convolutional long-short term memory (ConvLSTM) neural network, which can capture the spatial-temporal correlations and time dependencies among the image sequences. With the proposed model architecture, the advantages of both SCNN and ConvLSTM are fully combined and the spatial-temporal information is fully exploited. Treating lane detection as the image segmentation problem, we applied encoder-decoder structures to make it work in an end-to-end way. Extensive experiments on two large-scale datasets reveal that our proposed model can effectively handle challenging driving scenes and outperforms previous state-of-the-art methods.

updated: Tue Oct 05 2021 15:47:45 GMT+0000 (UTC)

published: Tue Oct 05 2021 15:47:45 GMT+0000 (UTC)

arXiv

参考文献 (このサイトで利用可能なもの) / References (only if available on this site)

被参照文献 (このサイトで利用可能なものを新しい順に) / Citations (only if available on this site, in order of most recent)

Amazon.co.jpアソシエイト