An Image-based Approach of Task-driven Driving Scene Categorization

Shaochi Hu; Hanwei Fan; Biao Gao; Xijun Zhao; Huijing Zhao

Categorizing driving scenes via visual perception is a key technology for safe driving and for the downstream tasks of autonomous vehicles. Traditional methods infer the scene category by detecting scene-related objects or by using a classifier trained on large datasets of finely labeled scene images. However, in cluttered dynamic scenes such as a campus or park, human activities are not strongly confined by rules, and the functional attributes of places are not strongly correlated with objects. How to define, model, and infer scene categories is therefore crucial to making the technique genuinely helpful in assisting a robot to pass through the scene. This paper proposes a method of task-driven driving scene categorization using weakly supervised data. Given a front-view video of a driving scene, a set of anchor points is marked by following the decision making of a human driver; an anchor point is not a semantic label but an indicator that the semantic attribute of the scene differs from that of the previous one. A measure that discriminates between scenes of different semantic attributes is learned via contrastive learning, and a driving scene profiling and categorization method is developed based on that measure. Experiments are conducted on a front-view video recorded while a vehicle passed through the cluttered dynamic campus of Peking University. The scenes are categorized into straight road, turn road, and alerting traffic. The results of semantic scene similarity learning and driving scene categorization are extensively studied; the positive result of scene categorization is 97.17% on the learning video and 85.44% on the video of new scenes.
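
The weak supervision described in the abstract can be illustrated with a minimal sketch. The abstract does not state the loss function, the embedding network, or how frame pairs are sampled, so everything below is an assumption made for illustration: the function names are hypothetical, the loss is the classic margin-based pairwise contrastive loss rather than the authors' confirmed formulation, and frame embeddings are plain tuples of floats. Because an anchor point only says that a scene differs from the previous one, the sketch treats frames in the same segment as similar and frames in adjacent segments as dissimilar, and leaves non-adjacent segments unpaired.

```python
import bisect
import math


def segment_ids(num_frames, anchors):
    """Map each frame index to the segment it falls in.

    `anchors` is a sorted list of frame indices at which the driver-derived
    anchor points occur (assumption: the anchor frame starts the new segment).
    """
    return [bisect.bisect_right(anchors, i) for i in range(num_frames)]


def make_pairs(seg):
    """Build (i, j, same) frame pairs from segment ids.

    Only same-segment and adjacent-segment pairs are used, since an anchor
    point says nothing about how non-adjacent segments relate.
    """
    pairs = []
    for i in range(len(seg)):
        for j in range(i + 1, len(seg)):
            if seg[j] - seg[i] <= 1:
                pairs.append((i, j, seg[i] == seg[j]))
    return pairs


def contrastive_loss(x, y, same, margin=1.0):
    """Margin-based pairwise contrastive loss (an assumed choice of loss).

    Similar pairs are pulled together; dissimilar pairs are pushed apart
    until their distance reaches `margin`.
    """
    d = math.dist(x, y)
    if same:
        return d * d
    return max(0.0, margin - d) ** 2


# Six frames with anchor points at frames 2 and 4 give three segments.
seg = segment_ids(6, [2, 4])
pairs = make_pairs(seg)
```

In a real pipeline the embeddings would come from a learned image encoder and the loss would be averaged over sampled pairs; the resulting distance could then serve as the kind of scene similarity measure the abstract refers to.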

updated: Wed Mar 10 2021 08:23:36 GMT+0000 (UTC)

published: Wed Mar 10 2021 08:23:36 GMT+0000 (UTC)

arXiv
