Supervised Contrastive Learning for Detecting Anomalous Driving Behaviours from Multimodal Videos

Shehroz S. Khan; Ziting Shen; Haoying Sun; Ax Patel; Ali Abedi

Distracted driving is one of the major causes of vehicle accidents. Detecting distracted driving behaviors is therefore of paramount importance for reducing the millions of deaths and injuries that occur worldwide. Distracted or anomalous driving behaviors are deviations from 'normal' driving that must be identified correctly in order to alert the driver. However, these behaviors do not constitute one specific type of driving style, and their distribution can differ between the training and test phases of a classifier. We formulate this problem as a supervised contrastive learning approach that learns a visual representation for detecting normal driving as well as seen and unseen anomalous driving behaviors. We modify the standard contrastive loss function to adjust the similarity of negative pairs, which aids optimization. Normally, in a (self-)supervised contrastive framework, the projection head layers are omitted during the test phase because the encoding layers are considered to contain general visual representative information. However, we assert that for a video-based supervised contrastive learning task, including a projection head can be beneficial. We report results on a driver anomaly detection dataset that contains 783 minutes of video recordings of normal and anomalous driving behaviors of 31 drivers, captured by top and front cameras (both depth and infrared). Out of 9 combinations of video modalities, our proposed contrastive approach improved the AUC ROC on 6 in comparison to the baseline models, with improvements ranging from 4.23% to 8.91% across modalities. Statistical tests provided evidence that our proposed method performs better than the baseline contrastive learning setup. Finally, the fusion of depth and infrared modalities from the top and front views achieved the best AUC ROC of 0.9738 and AUC PR of 0.9772.
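To make the contrastive formulation more concrete, the following is a minimal NumPy sketch of a loss that pulls embeddings of normal driving clips together and pushes them away from anomalous ones. It is an illustration under stated assumptions, not the authors' loss: the abstract does not give the form of their modification, so the `neg_weight` factor that rescales negative-pair similarities is a hypothetical stand-in, and the function name, the `temperature` default, and the toy embeddings are likewise invented for this example.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def normal_vs_anomalous_contrastive_loss(normal, anomalous,
                                         temperature=0.1, neg_weight=1.0):
    """Contrastive loss with normal-normal positives and normal-anomalous negatives.

    normal:     (K, D) embeddings of normal clips, K >= 2
    anomalous:  (M, D) embeddings of anomalous clips, M >= 1
    neg_weight: HYPOTHETICAL factor rescaling negative-pair similarities;
                an assumption for illustration, not the paper's formula.
    """
    zn = l2_normalize(np.asarray(normal, dtype=float))
    za = l2_normalize(np.asarray(anomalous, dtype=float))
    if zn.shape[0] < 2 or za.shape[0] < 1:
        raise ValueError("need at least two normal and one anomalous embedding")

    pos = (zn @ zn.T) / temperature               # (K, K) positive-pair logits
    neg = neg_weight * (zn @ za.T) / temperature  # (K, M) negative-pair logits

    # log of the summed negative terms for each normal anchor, shape (K, 1)
    log_neg_sum = np.log(np.exp(neg).sum(axis=1, keepdims=True))

    # -log( exp(pos_ij) / (exp(pos_ij) + sum_m exp(neg_im)) ) for every pair (i, j)
    pair_loss = np.logaddexp(pos, log_neg_sum) - pos

    # Average over positive pairs only, excluding each anchor paired with itself
    off_diagonal = ~np.eye(zn.shape[0], dtype=bool)
    return float(pair_loss[off_diagonal].mean())


# Toy 2-D embeddings: well-separated classes versus interleaved classes
separated = normal_vs_anomalous_contrastive_loss(
    [[1.0, 0.0], [1.0, 0.1]], [[-1.0, 0.0], [-1.0, 0.1]])
mixed = normal_vs_anomalous_contrastive_loss(
    [[1.0, 0.0], [-1.0, 0.0]], [[1.0, 0.1], [-1.0, 0.1]])
```

In this sketch the loss is small when normal embeddings cluster together away from the anomalous ones (`separated`) and large when the two classes are interleaved (`mixed`). At test time, a clip could then be scored by its similarity to the normal cluster, although the abstract does not specify the scoring rule used in the paper.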

updated: Fri Apr 29 2022 13:12:26 GMT+0000 (UTC)

published: Thu Sep 09 2021 03:50:19 GMT+0000 (UTC)

arXiv
