Predictive Coding Based Multiscale Network with Encoder-Decoder LSTM for Video Prediction

Chaofan Ling; Junpei Zhong; Weihua Li

ビデオ予測のためのエンコーダーデコーダー LSTM を使用した予測符号化ベースのマルチスケールネットワーク

ここでは、ビデオ予測用のマルチスケール予測モデルを紹介します。その設計は、「予測コーディング」理論と「粗から微」アプローチに触発されています。予測コーディングモデルとして、従来のボトムアップのトレーニングスタイルとは異なり、ボトムアップとトップダウンの情報フローの組み合わせによって更新されます。その利点は、入力情報への依存を減らし、画像を予測および生成する能力を向上させることです。重要なことは、マルチスケールアプローチで達成することです。高レベルのニューロンはより粗い予測 (低解像度) を生成し、低レベルのニューロンはより細かい予測 (高解像度) を生成します。これは、高レベルが低レベルのニューロンの活動を予測する従来の予測コーディングフレームワークとは異なります。予測能力を向上させるために、エンコーダー/デコーダーネットワークを LSTM アーキテクチャに統合し、最終的にエンコードされた高レベルのセマンティック情報を異なるレベル間で共有します。さらに、各ネットワークレベルの出力は RGB 画像であるため、より小さな LSTM 隠れ状態を使用して、必要な隠れ情報のみを保持および更新し、過度に離散的で複雑な空間へのマッピングを回避できます。このようにして、予測の難しさと計算上のオーバーヘッドを減らすことができます。最後に、敵対的トレーニングの不安定性と、長期予測におけるトレーニングとテストの不一致に対処するために、トレーニング戦略をさらに調査します。コードは https://github.com/Ling-CF/MSPN で入手できます。

We are introducing a multi-scale predictive model for video prediction here, whose design is inspired by the "Predictive Coding" theories and "Coarse to Fine" approach. As a predictive coding model, it is updated by a combination of bottom-up and top-down information flows, which is different from traditional bottom-up training style. Its advantage is to reduce the dependence on input information and improve its ability to predict and generate images. Importantly, we achieve with a multi-scale approach -- higher level neurons generate coarser predictions (lower resolution), while the lower level generate finer predictions (higher resolution). This is different from the traditional predictive coding framework in which higher level predict the activity of neurons in lower level. To improve the predictive ability, we integrate an encoder-decoder network in the LSTM architecture and share the final encoded high-level semantic information between different levels. Additionally, since the output of each network level is an RGB image, a smaller LSTM hidden state can be used to retain and update the only necessary hidden information, avoiding being mapped to an overly discrete and complex space. In this way, we can reduce the difficulty of prediction and the computational overhead. Finally, we further explore the training strategies, to address the instability in adversarial training and mismatch between training and testing in long-term prediction. Code is available at https://github.com/Ling-CF/MSPN.

updated: Thu Dec 22 2022 12:15:37 GMT+0000 (UTC)

published: Thu Dec 22 2022 12:15:37 GMT+0000 (UTC)

arXiv

参考文献 (このサイトで利用可能なもの) / References (only if available on this site)

被参照文献 (このサイトで利用可能なものを新しい順に) / Citations (only if available on this site, in order of most recent)

Amazon.co.jpアソシエイト