Applying Spatiotemporal Attention to Identify Distracted and Drowsy Driving with Vision Transformers

Samay Lakhani

Car crashes rose 20% in 2021 compared to 2020 as a result of increased distraction and drowsiness. Drowsy and distracted driving cause 45% of all car crashes. To reduce drowsy and distracted driving, detection methods based on computer vision can be designed to be low-cost, accurate, and minimally invasive. This work investigated whether vision transformers can exceed the state-of-the-art accuracy of 3D-CNNs. Two separate transformers were trained, one for drowsiness and one for distraction. The drowsiness model was a Video Swin Transformer trained for 10 epochs on the National Tsing-Hua University Drowsy Driving Dataset (NTHU-DDD), which covers two classes (drowsy and non-drowsy) simulated over 10.5 hours. The distraction model was a Video Swin Transformer trained for 50 epochs on the Driver Monitoring Dataset (DMD) over 9 distraction-related classes. The drowsiness model reached 44% accuracy with a high loss value on the test set, indicating overfitting and poor model performance. The overfitting suggests that the training data were limited and that the applied model architecture lacked quantifiable parameters to learn. The distraction model outperformed state-of-the-art models on DMD, reaching 97.5% accuracy, which indicates that with sufficient data and a strong architecture, transformers are suitable for unfit driving detection. Future research should use newer and stronger models such as TokenLearner to achieve higher accuracy and efficiency, merge existing datasets to extend detection to drunk driving and road rage and so create a comprehensive solution for preventing traffic crashes, and deploy a functioning prototype to revolutionize the automotive safety industry.

updated: Fri Jul 22 2022 16:36:48 GMT+0000 (UTC)

published: Fri Jul 22 2022 16:36:48 GMT+0000 (UTC)

arXiv
