AutoScale: Learning to Scale for Crowd Counting and Localization

Chenfeng Xu; Dingkang Liang; Yongchao Xu; Song Bai; Wei Zhan; Xiang Bai; Masayoshi Tomizuka

Recent work on crowd counting mainly uses CNNs to regress density maps and has achieved great progress. In a density map, each person is represented by a Gaussian blob, and the final count is obtained by integrating over the whole map. However, it is difficult to accurately predict the density map in dense regions. A major issue is that the density map in dense regions usually accumulates density values from a number of nearby Gaussian blobs, yielding differing large density values on a small set of pixels. This causes the density map to exhibit varying patterns with significant pattern shifts and produces a long-tailed distribution of pixel-wise density values. We propose a simple and effective Learning to Scale (L2S) module, which automatically scales dense regions to reasonable closeness levels (closeness reflects the image-plane distance between neighboring people). L2S directly normalizes the closeness in different patches so that it dynamically separates the overlapped blobs, decomposes the accumulated values in the ground-truth density map, and thus alleviates the pattern shifts and the long-tailed distribution of density values. This helps the model better learn the density map. We also explore the effectiveness of L2S in localizing people by finding the local minima of the quantized distance (w.r.t. the person location map). To the best of our knowledge, such a localization method is also novel in localization-based crowd counting. We further introduce a customized dynamic cross-entropy loss, which significantly improves localization-based model optimization. Extensive experiments demonstrate that the proposed framework, termed AutoScale, improves upon some state-of-the-art methods on both regression and localization benchmarks on three crowded datasets and achieves very competitive performance on two sparse datasets.
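The density-map representation described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `gaussian_density_map`, the fixed `sigma`, the image size, and the head positions are all assumptions chosen for illustration. It shows that the count is the integral (sum) of the map, and that closely spaced people accumulate into much larger peak values than well-separated people, which is the effect the L2S module is meant to mitigate.

```python
import numpy as np

def gaussian_density_map(points, shape, sigma=2.0):
    """Sum one unit-mass Gaussian blob per annotated head position.

    `points` are (row, col) positions; `sigma` is an assumed fixed blob width.
    """
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    density = np.zeros(shape, dtype=np.float64)
    for py, px in points:
        blob = np.exp(-((ys - py) ** 2 + (xs - px) ** 2) / (2.0 * sigma ** 2))
        density += blob / blob.sum()  # normalize so each person integrates to 1
    return density

# Hypothetical annotations: three people far apart vs. three people 2 px apart.
sparse_points = [(16, 8), (16, 32), (16, 56)]
dense_points = [(16, 30), (16, 32), (16, 34)]

sparse = gaussian_density_map(sparse_points, (32, 64))
dense = gaussian_density_map(dense_points, (32, 64))

# The count is recovered by integrating (summing) the whole map.
print("sparse count:", sparse.sum(), "dense count:", dense.sum())
# Overlapping blobs accumulate, so the dense map has a much higher peak.
print("sparse peak:", sparse.max(), "dense peak:", dense.max())
```

Both maps integrate to the same count, but the overlapping blobs in the dense case stack into a higher peak on a few pixels. Enlarging the dense patch before generating the ground truth would spread the same three people over more pixels and pull the peak back toward the sparse case.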

updated: Tue Oct 19 2021 03:52:18 GMT+0000 (UTC)

published: Fri Dec 20 2019 03:54:17 GMT+0000 (UTC)

arXiv
