Counting People by Estimating People Flows

Weizhe Liu; Mathieu Salzmann; Pascal Fua

Modern methods for counting people in crowded scenes rely on deep networks to estimate people densities in individual images. As a result, very few take advantage of temporal consistency in video sequences, and those that do only impose weak smoothness constraints across consecutive frames. In this paper, we advocate estimating people flows across image locations between consecutive images and inferring the people densities from these flows instead of directly regressing them. This enables us to impose much stronger constraints encoding the conservation of the number of people. As a result, it significantly boosts performance without requiring a more complex architecture. Furthermore, it allows us to exploit the correlation between people flow and optical flow to further improve the results. We also show that leveraging people conservation constraints in both a spatial and temporal manner makes it possible to train a deep crowd counting model in an active learning setting with far fewer annotations. This significantly reduces the annotation cost while still yielding performance similar to that of the fully supervised case.
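
To make the idea of inferring densities from flows concrete, here is a toy sketch in plain Python. It is not the authors' network or code: the 1D grid, the neighbor offsets (-1, 0, +1), the function names, and the hand-written flow values are all assumptions chosen for illustration, and people entering or leaving the image are ignored. Each cell stores how many people move from it to each neighboring cell between two consecutive frames; the density in the earlier frame is the sum of flows leaving a cell, and the density in the later frame is the sum of flows arriving at it, so the total count is conserved by construction.

```python
# Toy illustration (assumed setup, not the paper's implementation):
# flows[j][d] = number of people moving from cell j to cell j + d
# between frame t-1 and frame t, with d in {-1, 0, +1}.

def density_from_outgoing(flows):
    """Density at frame t-1: sum of all flows leaving each cell."""
    return [sum(cell.values()) for cell in flows]

def density_from_incoming(flows):
    """Density at frame t: sum of all flows arriving at each cell."""
    n = len(flows)
    density = [0.0] * n
    for src, cell in enumerate(flows):
        for offset, amount in cell.items():
            dst = src + offset
            if not 0 <= dst < n:
                # This toy grid is closed: nobody enters or leaves the image.
                raise ValueError("flow leaves the grid")
            density[dst] += amount
    return density

# Hypothetical flow values for a three-cell grid.
flows = [
    {0: 1.0, 1: 2.0},            # cell 0: 1 stays, 2 move right
    {-1: 0.5, 0: 1.0, 1: 0.5},   # cell 1: 0.5 left, 1 stays, 0.5 right
    {0: 3.0},                    # cell 2: 3 stay
]

before = density_from_outgoing(flows)
after = density_from_incoming(flows)

# Conservation: the same flows explain both frames, so totals match.
assert sum(before) == sum(after)
```

In this sketch the densities are never predicted directly; they are read off the flows, which is why conservation of the number of people holds automatically rather than being a soft smoothness penalty.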

updated: Tue Aug 03 2021 14:30:28 GMT+0000 (UTC)

published: Tue Dec 01 2020 12:59:24 GMT+0000 (UTC)

arXiv
