MotionRNN: A Flexible Model for Video Prediction with Spacetime-Varying Motions

Haixu Wu; Zhiyu Yao; Mingsheng Long; Jianmin Wang

This paper approaches video prediction from a new angle: predicting spacetime-varying motions, which change incessantly across both space and time. Prior methods mainly capture temporal state transitions but overlook the complex spatiotemporal variations of the motion itself, which makes it difficult for them to adapt to ever-changing motions. We observe that physical-world motions can be decomposed into a transient variation and a motion trend, where the latter can be regarded as the accumulation of previous motions. Thus, simultaneously capturing the transient variation and the motion trend is key to making spacetime-varying motions more predictable. Based on these observations, we propose the MotionRNN framework, which captures the complex variations within motions and adapts to spacetime-varying scenarios. MotionRNN makes two main contributions. First, we design the MotionGRU unit, which models the transient variation and the motion trend in a unified way. Second, we apply MotionGRU to RNN-based predictive models and present a new, flexible video prediction architecture with a Motion Highway, which significantly improves the ability to predict changeable motions and avoids motion vanishing in stacked multi-layer predictive models. Owing to its flexibility, the framework can be adapted to a range of models for deterministic spatiotemporal prediction. MotionRNN yields significant improvements on three challenging benchmarks for video prediction with spacetime-varying motions.

updated: Wed Mar 03 2021 08:11:50 GMT+0000 (UTC)

published: Wed Mar 03 2021 08:11:50 GMT+0000 (UTC)

arXiv
