Robust Multiview Multimodal Driver Monitoring System Using Masked Multi-Head Self-Attention

Yiming Ma; Victor Sanchez; Soodeh Nikan; Devesh Upadhyay; Bhushan Atote; Tanaya Guha

Driver Monitoring Systems (DMSs) are crucial for safe hand-over actions in Level-2+ self-driving vehicles. State-of-the-art DMSs leverage multiple sensors mounted at different locations to monitor the driver and the vehicle's interior scene, and employ decision-level fusion to integrate these heterogeneous data. However, this fusion method may not fully exploit the complementarity of the different data sources and may overlook their relative importance. To address these limitations, we propose a novel multiview multimodal driver monitoring system based on feature-level fusion through multi-head self-attention (MHSA). We demonstrate its effectiveness by comparing it against four alternative fusion strategies (Sum, Conv, SE, and AFF). We also present SuMoCo, a novel GPU-friendly supervised contrastive learning framework for learning better representations. Furthermore, we added fine-grained annotations to the test split of the DAD dataset to enable multi-class recognition of drivers' activities. Experiments on this enhanced dataset demonstrate that 1) the proposed MHSA-based fusion method (AUC-ROC: 97.0%) outperforms all baselines and previous approaches, and 2) training MHSA with patch masking can improve its robustness against modality/view collapses. The code and annotations are publicly available.
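
The abstract describes feature-level fusion in which features from different views and modalities attend to one another through multi-head self-attention, with masking applied during training to improve robustness. The following is a minimal NumPy sketch of that idea under stated assumptions, not the authors' released code. Each sensor stream (for example, the front and top views crossed with the depth and IR modalities of DAD, giving four streams) is assumed to be already encoded into one D-dimensional feature vector. "Patch masking" is approximated here by randomly hiding whole stream tokens from attention. The class `MHSAFusion`, the helper `random_stream_mask`, the dimensions, and the random weights are hypothetical stand-ins for trained parameters.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class MHSAFusion:
    """Hypothetical MHSA fusion of per-stream feature vectors (illustrative only)."""

    def __init__(self, num_streams, dim, num_heads, seed=0):
        assert dim % num_heads == 0, "dim must be divisible by num_heads"
        rng = np.random.default_rng(seed)
        self.h = num_heads
        self.dk = dim // num_heads
        scale = 1.0 / np.sqrt(dim)
        # Randomly initialised projections stand in for learned weights (assumption).
        self.wq, self.wk, self.wv, self.wo = (
            rng.normal(0.0, scale, (dim, dim)) for _ in range(4)
        )
        # Embedding that tells the model which view/modality each token came from.
        self.stream_emb = rng.normal(0.0, scale, (num_streams, dim))

    def __call__(self, feats, keep=None):
        # feats: (B, N, D), one feature vector per view/modality stream.
        # keep: optional (B, N) bool mask; False marks a masked or missing stream.
        # Each row of keep must contain at least one True.
        B, N, D = feats.shape
        if keep is None:
            keep = np.ones((B, N), dtype=bool)
        x = feats + self.stream_emb[None]

        def split(t):  # (B, N, D) -> (B, H, N, dk)
            return t.reshape(B, N, self.h, self.dk).transpose(0, 2, 1, 3)

        q, k, v = split(x @ self.wq), split(x @ self.wk), split(x @ self.wv)
        scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(self.dk)  # (B, H, N, N)
        # Masked streams cannot be attended to as keys.
        scores = np.where(keep[:, None, None, :], scores, -1e9)
        attn = softmax(scores, axis=-1)
        out = (attn @ v).transpose(0, 2, 1, 3).reshape(B, N, D) @ self.wo
        # Average only the surviving tokens into one fused vector per sample.
        w = keep[..., None].astype(out.dtype)
        return (out * w).sum(axis=1) / w.sum(axis=1)


def random_stream_mask(batch, num_streams, p, rng):
    # Drop each stream with probability p, but always keep at least one per sample.
    keep = rng.random((batch, num_streams)) >= p
    empty = ~keep.any(axis=1)
    keep[empty, rng.integers(0, num_streams, empty.sum())] = True
    return keep


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fusion = MHSAFusion(num_streams=4, dim=64, num_heads=8)
    feats = rng.normal(size=(2, 4, 64))  # 2 samples, 4 streams, 64-d features
    keep = random_stream_mask(2, 4, p=0.25, rng=rng)  # training-time masking
    print(fusion(feats, keep).shape)  # (2, 64)
```

Because masked streams are excluded both as attention keys and from the final average, the fused vector for a sample does not depend on a hidden stream's features at all. This is the property one would want when a camera or modality fails at test time. A plain Sum baseline, by contrast, would return something like `feats.sum(axis=1)` and has no mechanism to reweight or ignore a broken stream.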

updated: Thu Apr 13 2023 09:50:32 GMT+0000 (UTC)

published: Thu Apr 13 2023 09:50:32 GMT+0000 (UTC)

arXiv