TransDARC: Transformer-based Driver Activity Recognition with Latent Space Feature Calibration

Kunyu Peng; Alina Roitberg; Kailun Yang; Jiaming Zhang; Rainer Stiefelhagen

Traditional video-based human activity recognition has made remarkable progress with the rise of deep learning, but this progress has been slower for the downstream task of driver behavior understanding. Understanding the situation inside the vehicle cabin is essential for Advanced Driver Assistance Systems (ADAS), as it enables identifying distraction and predicting the driver's intent, and leads to more convenient human-vehicle interaction. At the same time, driver observation systems face substantial obstacles: they need to capture different granularities of driver states, and the complexity of secondary activities grows with rising automation and increased driver freedom. Furthermore, a model is rarely deployed under conditions identical to those of the training set, because sensor placements and types vary from vehicle to vehicle, which is a substantial obstacle for real-life deployment of data-driven models. In this work, we present a novel vision-based framework for recognizing secondary driver behaviors based on visual transformers and an additional augmented feature distribution calibration module. This module operates in the latent feature space, enriching and diversifying the training set at the feature level in order to improve generalization to novel data appearances (e.g., sensor changes) and general feature quality. Our framework consistently leads to better recognition rates, surpassing previous state-of-the-art results on the public Drive&Act benchmark at all granularity levels. Our code will be made publicly available at https://github.com/KPeng9510/TransDARC.
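
The abstract does not spell out how the calibration module works, so the following is only a rough illustration of the general idea of enriching a training set at the feature level, not the authors' method. It assumes each class's latent features can be approximated by an independent (diagonal) Gaussian, which is a simplification made here; the function name `augment_features` and its parameters are hypothetical.

```python
import random
import statistics


def augment_features(features, labels, samples_per_class=10, min_std=1e-3, seed=0):
    """Append synthetic latent features sampled per class.

    Assumption (not from the paper): every class is modelled as a
    diagonal Gaussian fitted to its own feature vectors.
    """
    rng = random.Random(seed)

    # Group the feature vectors by their class label.
    by_class = {}
    for vec, label in zip(features, labels):
        by_class.setdefault(label, []).append(vec)

    # Start from a copy of the original training features.
    aug_features = [list(v) for v in features]
    aug_labels = list(labels)

    for label, vecs in by_class.items():
        dims = list(zip(*vecs))  # one tuple of values per feature dimension
        means = [statistics.fmean(d) for d in dims]
        # Floor the spread so classes with a single sample still vary a little.
        stds = [max(statistics.pstdev(d), min_std) for d in dims]
        for _ in range(samples_per_class):
            aug_features.append([rng.gauss(m, s) for m, s in zip(means, stds)])
            aug_labels.append(label)

    return aug_features, aug_labels
```

A classifier head would then be trained on the returned lists instead of the original features alone; whether this improves robustness to sensor changes depends on how well the Gaussian assumption fits the real feature distribution.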

updated: Wed Mar 02 2022 08:14:06 GMT+0000 (UTC)

published: Wed Mar 02 2022 08:14:06 GMT+0000 (UTC)

arXiv
