CDN-MEDAL: Two-stage Density and Difference Approximation Framework for Motion Analysis

Nguyen-Tien Cuong; Hung Ngoc Phan; Nhat Minh Chung; Phuong Hoai Ha; Synh Viet-Uyen Ha

Background modeling is a promising research area in video analysis, with a variety of video-surveillance applications. Recent years have witnessed the proliferation of deep neural networks through effective learning-based approaches to motion analysis. However, these techniques provide only a limited description of the observed scenes' properties, because they learn a single-valued mapping that approximates the temporal conditional average of the target background. On the other hand, statistical learning in the image domain, notably Gaussian Mixture Models combined with a foreground extraction step, has become one of the most prevalent approaches owing to its strong adaptation to dynamic changes in context. In this work, we propose a novel two-stage change detection method built on two convolutional neural networks. The first architecture is grounded in unsupervised Gaussian-mixture statistical learning to describe the scenes' salient features. The second implements a lightweight foreground detection pipeline. Our two-stage framework contains approximately 3.5K parameters in total, yet it still converges rapidly on intricate motion patterns. Our experiments on publicly available datasets show that the proposed networks not only generalize to regions of moving objects in unseen cases with promising results but are also competitive in efficiency and effectiveness for foreground segmentation.
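The abstract contrasts a learned single-valued mapping with Gaussian-mixture statistical learning. As a hedged illustration of the classical statistical baseline it refers to (not the authors' CDN-MEDAL networks, which learn mixture statistics with a convolutional network), the sketch below maintains an adaptive Gaussian mixture for one grayscale pixel in the style of Stauffer and Grimson and flags values that fit no background mode as foreground. The class `PixelGMM`, the helper `simulate_flicker`, and every parameter value (3 components, learning rate 0.01, 2.5-standard-deviation match test, background weight threshold 0.7, variance floor) are hypothetical choices for illustration; the mean/variance update also uses a fixed rate instead of the likelihood-scaled rate of the classical formulation.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class PixelGMM:
    """Adaptive Gaussian mixture for one grayscale pixel (hypothetical sketch)."""
    k: int = 3                 # max number of Gaussian components (assumed)
    alpha: float = 0.01        # learning rate (assumed)
    match_sigma: float = 2.5   # x matches a component if |x - mean| <= 2.5 std
    bg_ratio: float = 0.7      # cumulative weight T treated as background (assumed)
    init_var: float = 225.0    # variance of a newly created component (std 15)
    init_weight: float = 0.05  # weight of a new component before renormalizing
    min_var: float = 16.0      # variance floor so a static pixel never collapses to std 0
    weights: List[float] = field(default_factory=list)
    means: List[float] = field(default_factory=list)
    variances: List[float] = field(default_factory=list)

    def _ranked(self) -> List[int]:
        # Rank components by w / std: frequent, low-variance modes come first.
        return sorted(range(len(self.weights)),
                      key=lambda i: self.weights[i] / math.sqrt(self.variances[i]),
                      reverse=True)

    def _background(self, ranked: List[int]) -> Set[int]:
        # Smallest prefix of the ranking whose weights sum past bg_ratio.
        chosen, total = set(), 0.0
        for i in ranked:
            chosen.add(i)
            total += self.weights[i]
            if total > self.bg_ratio:
                break
        return chosen

    def _match(self, x: float, ranked: List[int]) -> Optional[int]:
        # First component in rank order whose mean lies within match_sigma std of x.
        for i in ranked:
            if abs(x - self.means[i]) <= self.match_sigma * math.sqrt(self.variances[i]):
                return i
        return None

    def is_foreground(self, x: float) -> bool:
        """Classify x against the current model without updating it."""
        ranked = self._ranked()
        matched = self._match(x, ranked)
        return matched is None or matched not in self._background(ranked)

    def update(self, x: float) -> bool:
        """Classify x, then fold it into the mixture; returns the foreground flag."""
        foreground = self.is_foreground(x)
        ranked = self._ranked()
        matched = self._match(x, ranked)
        # w_i <- (1 - alpha) * w_i + alpha * M_i, where M_i = 1 only for the matched component.
        self.weights = [(1 - self.alpha) * w + (self.alpha if i == matched else 0.0)
                        for i, w in enumerate(self.weights)]
        if matched is not None:
            rho = self.alpha  # simplification: the classical update scales alpha by the likelihood
            mu = (1 - rho) * self.means[matched] + rho * x
            var = (1 - rho) * self.variances[matched] + rho * (x - mu) ** 2
            self.means[matched], self.variances[matched] = mu, max(var, self.min_var)
        elif len(self.weights) < self.k:
            self.weights.append(self.init_weight)
            self.means.append(float(x))
            self.variances.append(self.init_var)
        else:
            j = ranked[-1]  # replace the least probable component with one centred on x
            self.weights[j], self.means[j], self.variances[j] = self.init_weight, float(x), self.init_var
        total = sum(self.weights)
        self.weights = [w / total for w in self.weights]
        return foreground


def simulate_flicker(n_frames: int = 500) -> Tuple[PixelGMM, List[int]]:
    """Feed a pixel that alternates between two background intensities (e.g. flickering water)."""
    model, history = PixelGMM(), []
    for t in range(n_frames):
        x = (50 if t % 2 == 0 else 200) + (t % 5) - 2  # two modes with small deterministic jitter
        history.append(x)
        model.update(x)
    return model, history
```

For example, `model, history = simulate_flicker()` trains on a pixel that flickers between intensities 50 and 200. Afterwards both intensities are classified as background, while 125, the temporal mean of the pixel's history and the constant that a single-valued regressor trained with squared error would settle on, matches neither mode and is flagged as foreground. This multimodal case is where a mixture describes the background better than a conditional average.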

updated: Mon Sep 13 2021 06:14:13 GMT+0000 (UTC)

published: Mon Jun 07 2021 16:39:42 GMT+0000 (UTC)

arXiv