A Novel Driver Distraction Behavior Detection Method Based on Self-supervised Learning with Masked Image Modeling

Yingzhi Zhang; Taiguo Li; Chao Li; Xinghong Zhou



Driver distraction causes a significant number of traffic accidents every year, resulting in economic losses and casualties. The level of automation in commercial vehicles is still far from fully driverless operation, and drivers continue to play a central role in operating and controlling the vehicle. Detecting driver distraction behavior is therefore crucial for road safety. Current driver distraction detection relies mainly on conventional convolutional neural networks (CNNs) and supervised learning. These approaches still face several challenges: the high cost of labeled datasets, a limited ability to capture high-level semantic information, and weak generalization. To address these problems, this paper proposes a new self-supervised learning method based on masked image modeling for driver distraction behavior detection. First, a self-supervised learning framework based on masked image modeling (MIM) is introduced to reduce the substantial human and material cost of dataset labeling. Second, the Swin Transformer is employed as the encoder. The Swin Transformer blocks are reconfigured, and the number of window multi-head self-attention (W-MSA) and shifted window multi-head self-attention (SW-MSA) detection heads allocated to each stage is adjusted; this improves performance and also makes the model more lightweight. Finally, various data augmentation strategies are combined with the best-performing random masking strategy to strengthen the model's recognition and generalization ability. Test results on a large-scale driver distraction behavior dataset show that the proposed self-supervised learning method achieves an accuracy of 99.60%, approaching the performance of advanced supervised learning methods. Our code is publicly available at github.com/Rocky1salady-killer/SL-DDBD.
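The abstract does not spell out the masking procedure. As a rough illustration of random masking in MIM, the sketch below masks a random fraction of coarse square regions of the image and upsamples that mask to the encoder's finer patch grid, in the style of SimMIM. The function name `generate_patch_mask`, the image and patch sizes, and the 0.6 mask ratio are hypothetical assumptions for illustration, not the paper's settings.

```python
import math
import random


def generate_patch_mask(input_size=192, mask_patch_size=32, model_patch_size=4,
                        mask_ratio=0.6, seed=None):
    """Hypothetical SimMIM-style random mask (sizes and ratio are assumptions).

    Returns a 2D list over the model's patch grid where 1 marks a masked patch.
    Masking is decided on a coarse grid of mask_patch_size cells, then each
    coarse cell is expanded to the block of model patches it covers.
    """
    if input_size % mask_patch_size or mask_patch_size % model_patch_size:
        raise ValueError("input_size and patch sizes must divide evenly")
    rng = random.Random(seed)
    coarse = input_size // mask_patch_size        # coarse cells per side
    scale = mask_patch_size // model_patch_size   # model patches per coarse cell side
    num_cells = coarse * coarse
    num_masked = math.ceil(num_cells * mask_ratio)
    masked_cells = rng.sample(range(num_cells), num_masked)

    fine = coarse * scale                          # model patches per side
    mask = [[0] * fine for _ in range(fine)]
    for idx in masked_cells:
        r, c = divmod(idx, coarse)
        # Mark every model patch inside the selected coarse cell.
        for i in range(r * scale, (r + 1) * scale):
            for j in range(c * scale, (c + 1) * scale):
                mask[i][j] = 1
    return mask


if __name__ == "__main__":
    m = generate_patch_mask(seed=0)
    print(len(m), sum(map(sum, m)))
```

With these illustrative defaults, 22 of the 36 coarse cells are masked, so 1408 of the 48 × 48 model patches are hidden; the encoder is then trained to reconstruct the pixels under those patches.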
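To illustrate how W-MSA and SW-MSA differ, the sketch below partitions a grid of tokens into non-overlapping windows (within which W-MSA computes attention) and applies the cyclic shift that Swin Transformer performs before SW-MSA, using shift = window_size // 2, so that the next layer's windows straddle the previous window boundaries. The helper names `window_partition` and `cyclic_shift` are hypothetical, the 8 × 8 grid and window size of 4 are illustrative assumptions, and the attention computation and the attention mask Swin applies inside shifted windows are omitted.

```python
def window_partition(grid, window_size):
    """Split a 2D token grid into non-overlapping window_size x window_size windows.

    Windows are returned in row-major order; W-MSA attends only within each window.
    """
    h, w = len(grid), len(grid[0])
    if h % window_size or w % window_size:
        raise ValueError("grid dimensions must be multiples of window_size")
    windows = []
    for wi in range(0, h, window_size):
        for wj in range(0, w, window_size):
            windows.append([row[wj:wj + window_size]
                            for row in grid[wi:wi + window_size]])
    return windows


def cyclic_shift(grid, shift):
    """Roll the grid up and left by `shift` positions, wrapping around the edges.

    This mirrors the cyclic shift applied before SW-MSA (assumed shift = window_size // 2).
    """
    h, w = len(grid), len(grid[0])
    return [[grid[(i + shift) % h][(j + shift) % w] for j in range(w)]
            for i in range(h)]


if __name__ == "__main__":
    window_size = 4
    # Token ids encode their original (row, col) position as row * 8 + col.
    grid = [[i * 8 + j for j in range(8)] for i in range(8)]
    regular = window_partition(grid, window_size)
    shifted = window_partition(cyclic_shift(grid, window_size // 2), window_size)
    print(regular[0][0], shifted[0][0])
```

In this toy run the first shifted window begins at token 18 (original row 2, column 2), so it mixes tokens that belonged to four different regular windows; this is how SW-MSA lets information flow across window boundaries while keeping attention cost local.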

updated: Thu Jul 13 2023 14:47:42 GMT+0000 (UTC)

published: Thu Jun 01 2023 10:53:32 GMT+0000 (UTC)

arXiv
