CDN-MEDAL: Two-stage Density and Difference Approximation Framework for Motion Analysis

Synh Viet-Uyen Ha; Cuong Tien Nguyen; Hung Ngoc Phan; Nhat Minh Chung; Phuong Hoai Ha

Background modeling and subtraction is a promising research area with a variety of applications in video surveillance. Recent years have seen a proliferation of effective learning-based deep neural networks in this area. However, these techniques provide only limited descriptions of scene properties while requiring heavy computation, because their single-valued mapping functions are learned to approximate the temporal conditional averages of the observed target backgrounds and foregrounds. On the other hand, statistical learning in the image domain has been a prevalent approach that adapts well to dynamic context changes, notably through Gaussian Mixture Models (GMMs) and their generalization capabilities. Leveraging both, we propose a novel method called CDN-MEDAL-net for background modeling and subtraction with two convolutional neural networks. The first architecture, CDN-GM, is grounded in an unsupervised GMM statistical learning strategy and describes the salient features of observed scenes. The second, MEDAL-net, implements a lightweight pipeline for online video background subtraction. Our two-stage architecture is small, yet it is very effective and converges rapidly to representations of intricate motion patterns. Our experiments show that the proposed approach not only extracts regions of moving objects effectively in unseen cases but is also very efficient.
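
To make the two-stage idea concrete, the following is a minimal classical sketch of statistical background modeling followed by difference-based subtraction. It is not the CDN-GM or MEDAL-net networks described in the abstract: it assumes grayscale frames, replaces the Gaussian mixture with a single Gaussian per pixel, and replaces the learned difference model with a fixed threshold. The function names and the parameters `min_std` and `k` are hypothetical choices for illustration only.

```python
import numpy as np


def fit_pixel_gaussians(frames, min_std=2.0):
    """Stage 1 stand-in: per-pixel mean and std over a frame stack (T, H, W).

    Assumption: one Gaussian per pixel instead of a full mixture.
    """
    stack = np.asarray(frames, dtype=np.float64)
    mean = stack.mean(axis=0)
    # Floor the std so nearly static pixels do not trigger on tiny noise.
    std = np.maximum(stack.std(axis=0), min_std)
    return mean, std


def foreground_mask(frame, mean, std, k=2.5):
    """Stage 2 stand-in: flag pixels more than k std from the background mean."""
    diff = np.abs(np.asarray(frame, dtype=np.float64) - mean)
    return diff > k * std


# Synthetic scene: a flat background of intensity 100 with mild sensor noise.
rng = np.random.default_rng(0)
history = 100.0 + rng.normal(0.0, 1.0, size=(50, 8, 8))
mean, std = fit_pixel_gaussians(history)

# A new frame with a bright 2x2 "moving object" pasted in.
frame = history[0].copy()
frame[2:4, 2:4] = 200.0
mask = foreground_mask(frame, mean, std)
print(mask.astype(int))
```

A single Gaussian cannot represent multimodal backgrounds such as swaying foliage or flickering screens, which is the usual motivation for the mixture models the abstract refers to.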

updated: Tue Sep 21 2021 11:49:47 GMT+0000 (UTC)

published: Mon Jun 07 2021 16:39:42 GMT+0000 (UTC)

arXiv
