Weakly Supervised Realtime Dynamic Background Subtraction

Fateme Bahri; Nilanjan Ray

Background subtraction is a fundamental task in computer vision with numerous real-world applications, ranging from object tracking to video surveillance. Dynamic backgrounds pose a significant challenge for this task. Supervised deep learning-based techniques are currently considered state-of-the-art, but they require pixel-wise ground-truth labels, which are time-consuming and expensive to produce. In this work, we propose a weakly supervised framework that performs background subtraction without per-pixel ground-truth labels. Our framework is trained on a sequence of images containing no moving objects and comprises two networks. The first network is an autoencoder that generates background images and prepares dynamic background images for training the second network; the dynamic background images are obtained by thresholding the background-subtracted images. The second network is a U-Net trained on the same object-free video, using the dynamic background images as pixel-wise ground-truth labels. During the test phase, each input image is processed by the autoencoder and the U-Net, which generate a background image and a dynamic background image, respectively. The dynamic background image is used to remove dynamic motion from the background-subtracted image, yielding a foreground image free of dynamic artifacts. To demonstrate the effectiveness of our method, we conducted experiments on selected categories of the CDnet 2014 dataset and on the I2R dataset. Our method outperformed all top-ranked unsupervised methods. It also achieved better results than one of the two existing weakly supervised methods and performed comparably to the other. Our proposed method is online, real-time, and efficient, and it requires only minimal frame-level annotation, making it suitable for a wide range of real-world applications.
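The label-generation and test-time steps described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the threshold `tau`, the per-pixel absolute difference with a maximum over color channels, and the removal of dynamic motion by masking out pixels the U-Net flags are all hypothetical choices. The autoencoder and U-Net outputs are stood in for by arrays passed in directly.

```python
import numpy as np


def dynamic_background_label(frame, background, tau=0.1):
    """Threshold |frame - background| into a binary mask (hypothetical helper).

    On object-free training frames, pixels that still differ from the
    autoencoder's background are treated as dynamic background; such masks
    would serve as pixel-wise targets for the U-Net. tau is an assumed
    threshold, not a value taken from the paper.
    """
    diff = np.abs(np.asarray(frame, dtype=np.float64)
                  - np.asarray(background, dtype=np.float64))
    if diff.ndim == 3:
        # Assumption: collapse color channels by taking the largest difference.
        diff = diff.max(axis=-1)
    return (diff > tau).astype(np.uint8)


def foreground_mask(frame, background, dynamic_mask, tau=0.1):
    """Test phase: thresholded background subtraction minus dynamic pixels.

    `background` stands in for the autoencoder output and `dynamic_mask` for
    the U-Net prediction (both hypothetical inputs here). Assumption: dynamic
    motion is removed by zeroing pixels the U-Net marks as dynamic.
    """
    changed = dynamic_background_label(frame, background, tau).astype(bool)
    dynamic = np.asarray(dynamic_mask).astype(bool)
    return (changed & ~dynamic).astype(np.uint8)
```

For example, with a zero background, `tau=0.3`, and a dynamic mask flagging only the top-right pixel, the frame `[[0.0, 0.5], [0.9, 0.2]]` yields the label `[[0, 1], [1, 0]]` and the foreground `[[0, 0], [1, 0]]`: the top-right change is attributed to dynamic background and suppressed, while the bottom-left change survives as foreground.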

updated: Mon Mar 06 2023 03:17:48 GMT+0000 (UTC)

published: Mon Mar 06 2023 03:17:48 GMT+0000 (UTC)

arXiv
