A Dynamic Spatial-temporal Attention Network for Early Anticipation of Traffic Accidents

Muhammad Monjurul Karim; Yu Li; Ruwen Qin; Zhaozheng Yin

The rapid advancement of sensor technologies and artificial intelligence is creating new opportunities for traffic safety enhancement. Dashboard cameras (dashcams) have been widely deployed on both human-driven and automated vehicles. A computational intelligence model that can accurately and promptly predict accidents from dashcam video will enhance preparedness for accident prevention. However, the spatial-temporal interactions of traffic agents are complex, and the visual cues for predicting a future accident are deeply embedded in dashcam video data, so the early anticipation of traffic accidents remains a challenge. Inspired by the attention behavior of humans in visually perceiving accident risks, this paper proposes a Dynamic Spatial-Temporal Attention (DSTA) network for early accident anticipation from dashcam videos. The DSTA-network learns to select discriminative temporal segments of a video sequence with a Dynamic Temporal Attention (DTA) module. It also learns to focus on the informative spatial regions of frames with a Dynamic Spatial Attention (DSA) module. A Gated Recurrent Unit (GRU) is trained jointly with the attention modules to predict the probability of a future accident. The evaluation of the DSTA-network on two benchmark datasets confirms that it exceeds state-of-the-art performance. A thorough ablation study that assesses the DSTA-network at the component level reveals how the network achieves such performance. Furthermore, this paper proposes a method to fuse the prediction scores from two complementary models and verifies its effectiveness in further boosting the performance of early accident anticipation.
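
The abstract does not give the equations behind the DTA and DSA modules, so the following is only a generic soft-attention sketch of the idea, not the authors' formulation. It assumes dot-product scoring against the current hidden state; the helper `attend`, the toy feature vectors, and all variable names are hypothetical, and the GRU update itself is omitted.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def attend(query, items):
    """Soft attention: weight each item by softmax(query . item).

    Returns the attention weights and the weighted sum of the items.
    Dot-product scoring is an assumption made for this sketch.
    """
    weights = softmax([dot(query, item) for item in items])
    dim = len(items[0])
    pooled = [sum(w * item[d] for w, item in zip(weights, items))
              for d in range(dim)]
    return weights, pooled

# Hypothetical toy data, all 2-dimensional:
hidden = [1.0, 0.0]                                   # current recurrent state
regions = [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]]        # object regions in one frame
past_states = [[0.0, 1.0], [0.5, 0.5], [1.0, 0.0]]    # earlier hidden states

# Spatial attention: which regions of the frame to focus on.
spatial_w, frame_feature = attend(hidden, regions)

# Temporal attention: which earlier time steps to emphasize.
temporal_w, context = attend(hidden, past_states)
```

In a full model, `frame_feature` and `context` would feed the recurrent unit at each time step, and the attention scoring would use learned parameters trained jointly with it, as the abstract describes.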

updated: Tue Dec 21 2021 00:43:09 GMT+0000 (UTC)

published: Fri Jun 18 2021 15:58:53 GMT+0000 (UTC)

arXiv
