Review of Video Predictive Understanding: Early Action Recognition and Future Action Prediction

He Zhao; Richard P. Wildes

ビデオ予測理解のレビュー：早期行動認識と将来行動予測

ビデオ予測の理解には、現在および過去のビデオ観測からの観測されていない未来の予測に関係する幅広い取り組みが含まれます。アクション予測は、ビデオ予測理解の主要なサブエリアであり、このレビューの焦点です。このサブエリアには、早期の行動認識と将来の行動予測という2つの主要な細分化があります。早期の行動認識は、進行中の行動をできるだけ早く認識することに関係しています。将来の行動予測は、以前に観察された行動に続く行動の予測に関係しています。いずれの場合も、過去、現在、および潜在的な将来の情報間の因果関係が主な焦点です。マルコフ連鎖、ガウス過程、自己回帰モデリング、ベイジアン再帰フィルタリングなどのさまざまな数学的ツールが、これら2つのタスクのコンピュータービジョン技術と組み合わせて広く採用されています。ただし、これらのアプローチは、次元の呪い、不十分な一般化、ドメイン固有の知識による制約などの課題に直面しています。最近、深い畳み込みニューラルネットワークとリカレントニューラルネットワークに依存する構造が、一般に既存のビジョンタスク、特にアクション予測タスクのパフォーマンスを改善するために広く提案されています。ただし、それらには独自の欠点があります。たとえば、大量のトレーニングデータへの依存や、強力な理論的基盤の欠如などです。この調査では、ビデオ予測理解の広い領域の主要なサブ領域を紹介することから始めます。これらは最近集中的に注目され、実用的な価値があることが証明されています。次に、さまざまな早期アクション認識および将来のアクション予測アルゴリズムの徹底的なレビューが、適切に編成された部門で提供されます。最後に、今後の研究の方向性について議論を締めくくります。

Video predictive understanding encompasses a wide range of efforts that are concerned with the anticipation of the unobserved future from the current as well as historical video observations. Action prediction is a major sub-area of video predictive understanding and is the focus of this review. This sub-area has two major subdivisions: early action recognition and future action prediction. Early action recognition is concerned with recognizing an ongoing action as soon as possible. Future action prediction is concerned with the anticipation of actions that follow those previously observed. In either case, the causal relationship between the past, current, and potential future information is the main focus. Various mathematical tools such as Markov Chains, Gaussian Processes, Auto-Regressive modeling, and Bayesian recursive filtering are widely adopted jointly with computer vision techniques for these two tasks. However, these approaches face challenges such as the curse of dimensionality, poor generalization, and constraints from domain-specific knowledge. Recently, structures that rely on deep convolutional neural networks and recurrent neural networks have been extensively proposed for improving the performance of existing vision tasks, in general, and action prediction tasks, in particular. However, they have their own shortcomings, e.g. reliance on massive training data and lack of strong theoretical underpinnings. In this survey, we start by introducing the major sub-areas of the broad area of video predictive understanding, which recently have received intensive attention and proven to have practical value. Next, a thorough review of various early action recognition and future action prediction algorithms are provided with suitably organized divisions. Finally, we conclude our discussion with future research directions.

updated: Fri Jul 16 2021 19:34:34 GMT+0000 (UTC)

published: Sun Jul 11 2021 22:46:52 GMT+0000 (UTC)

arXiv

参考文献 (このサイトで利用可能なもの) / References (only if available on this site)

被参照文献 (このサイトで利用可能なものを新しい順に) / Citations (only if available on this site, in order of most recent)

Amazon.co.jpアソシエイト