Interpretable Deep Feature Propagation for Early Action Recognition

He Zhao; Richard P. Wildes

Early action recognition (action prediction) from limited preliminary observations plays a critical role for streaming vision systems that demand real-time inference, as video actions often possess elongated temporal spans which cause undesired latency. In this study, we address action prediction by investigating how action patterns evolve over time in a spatial feature space. There are three key components to our system. First, we work with intermediate-layer ConvNet features, which allow for abstraction from raw data, while retaining spatial layout. Second, instead of propagating features per se, we propagate their residuals across time, which allows for a compact representation that reduces redundancy. Third, we employ a Kalman filter to combat error build-up and unify across prediction start times. Extensive experimental results on multiple benchmarks show that our approach leads to competitive performance in action prediction. Notably, we investigate the learned components of our system to shed light on their otherwise opaque natures in two ways. First, we document that our learned feature propagation module works as a spatial shifting mechanism under convolution to propagate current observations into the future. Thus, it captures flow-based image motion information. Second, the learned Kalman filter adaptively updates prior estimation to aid the sequence learning process.
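
The three components described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: in the paper, the propagation module and the Kalman gain are learned, whereas here the propagation simply assumes the next residual equals the last one and the gain is a fixed scalar. The function names `propagate_residual`, `kalman_correct`, and `rollout`, and the `(T, C, H, W)` feature layout, are hypothetical choices made for illustration only.

```python
import numpy as np


def propagate_residual(prev_feat, curr_feat):
    """Predict the next feature map by propagating the residual.

    Assumption for this sketch: the next residual equals the last one.
    A learned module would predict the residual instead.
    """
    residual = curr_feat - prev_feat
    return curr_feat + residual  # prior estimate of the next feature map


def kalman_correct(prior, observation, gain):
    """Kalman-style update: move the prior toward the observation.

    `gain` is a fixed scalar here; a learned filter would estimate it.
    """
    return prior + gain * (observation - prior)


def rollout(observed, num_future, gain=0.5):
    """Filter the observed features, then extrapolate `num_future` steps.

    observed: array of shape (T, C, H, W) with T >= 2.
    Returns an array of shape (T + num_future, C, H, W).
    """
    feats = [observed[0]]
    # Observed phase: form a prior, then correct it with the observation.
    for t in range(1, len(observed)):
        if t == 1:
            prior = feats[-1]  # no residual is available yet
        else:
            prior = propagate_residual(feats[-2], feats[-1])
        feats.append(kalman_correct(prior, observed[t], gain))
    # Future phase: no observations remain, so only propagate.
    for _ in range(num_future):
        feats.append(propagate_residual(feats[-2], feats[-1]))
    return np.stack(feats)
```

With `gain=1.0` the filter trusts each observation fully, and with `gain=0.0` it ignores observations and relies on propagation alone. Intermediate values trade off between the two, which is how a correction step of this kind limits error build-up during propagation.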

updated: Sun Jul 11 2021 19:40:19 GMT+0000 (UTC)

published: Sun Jul 11 2021 19:40:19 GMT+0000 (UTC)

arXiv
