Pedestrian Detection: Domain Generalization, CNNs, Transformers and Beyond

Irtiza Hasan; Shengcai Liao; Jinpeng Li; Saad Ullah Akram; Ling Shao

Pedestrian detection is the cornerstone of many vision-based applications, ranging from object tracking to video surveillance and, more recently, autonomous driving. With the rapid development of deep learning in object detection, pedestrian detection has achieved very good performance in the traditional single-dataset training and evaluation setting. However, in this study on generalizable pedestrian detectors, we show that current pedestrian detectors handle even small domain shifts poorly in cross-dataset evaluation. We attribute this limited generalization to two main factors: the method and the current sources of data. Regarding the method, we illustrate that bias in the design choices of current pedestrian detectors (e.g., anchor settings) is the main contributor to their limited generalization. Most modern pedestrian detectors are tailored to a target dataset. They achieve high performance in the traditional pipeline of training and testing on a single dataset, but their performance degrades under cross-dataset evaluation. Consequently, a general object detector, owing to its generic design, performs better in cross-dataset evaluation than state-of-the-art pedestrian detectors. As for the data, we show that autonomous driving benchmarks are monotonous in nature: they are neither diverse in scenarios nor dense in pedestrians. Therefore, benchmarks curated by crawling the web, which contain diverse and dense scenarios, are an efficient source of pre-training that provides a more robust representation. Accordingly, we propose a progressive fine-tuning strategy that improves generalization. Code and models can be accessed at https://github.com/hasanirtiza/Pedestron.
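The abstract does not spell out the progressive fine-tuning schedule. The following is a minimal Python sketch of one possible reading. It assumes that "progressive" means training first on diverse, web-crawled data and then fine-tuning stage by stage toward the target driving dataset, with each stage starting from the previous stage's weights. The dataset names, the `Stage`, `build_stages` and `progressive_finetune` helpers, and the halving learning-rate schedule are all hypothetical illustrations. They are not the authors' Pedestron code or its actual stage order.

```python
from dataclasses import dataclass
from typing import Callable, List, TypeVar

M = TypeVar("M")  # stand-in for detector weights / model state


@dataclass(frozen=True)
class Stage:
    dataset: str  # hypothetical dataset identifier
    lr: float     # learning rate used for this stage


def build_stages(pretrain: List[str], target: str,
                 base_lr: float = 0.02, decay: float = 0.5) -> List[Stage]:
    """Order stages from diverse (web-crawled) data to the target dataset.

    Hypothetical schedule: the learning rate is multiplied by `decay`
    at each later stage so later stages adjust, rather than overwrite,
    the representation learned earlier.
    """
    names = list(pretrain) + [target]
    return [Stage(name, base_lr * decay ** i) for i, name in enumerate(names)]


def progressive_finetune(model: M, stages: List[Stage],
                         train: Callable[[M, str, float], M]) -> M:
    """Run stages in order; each stage is initialized from the previous one."""
    for stage in stages:
        model = train(model, stage.dataset, stage.lr)
    return model


if __name__ == "__main__":
    # Illustrative ordering only: web-crawled sets first, driving set last.
    stages = build_stages(["CrowdHuman", "WiderPedestrian"], "CityPersons")

    # A fake "training" step that just records which datasets were visited.
    def fake_train(model: List[str], dataset: str, lr: float) -> List[str]:
        return model + [dataset]

    print(progressive_finetune([], stages, fake_train))
    # prints ['CrowdHuman', 'WiderPedestrian', 'CityPersons']
```

In a real pipeline, `train` would wrap an actual detector training loop. Here it is a stand-in, so the stage-ordering logic can be checked on its own.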

updated: Mon Jan 10 2022 06:00:26 GMT+0000 (UTC)

published: Mon Jan 10 2022 06:00:26 GMT+0000 (UTC)

arXiv