PreCNet: Next-Frame Video Prediction Based on Predictive Coding

Zdenek Straka; Tomas Svoboda; Matej Hoffmann

Predictive coding, currently a highly influential theory in neuroscience, has not yet been widely adopted in machine learning. In this work, we transform the seminal model of Rao and Ballard (1999) into a modern deep learning framework while remaining maximally faithful to the original schema. The resulting network (PreCNet) is tested on a widely used next-frame video prediction benchmark, which consists of images of an urban environment recorded from a car-mounted camera, and achieves state-of-the-art performance. Performance on all measures (MSE, PSNR, SSIM) improved further when a larger training set (2M images from BDD100k) was used, pointing to the limitations of the KITTI training set. This work demonstrates that an architecture carefully grounded in a neuroscience model, without being explicitly tailored to the task at hand, can exhibit exceptional performance.
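
The abstract reports results in terms of MSE, PSNR, and SSIM. As a minimal sketch of how the first two are commonly computed for a predicted frame against the ground-truth frame, assuming grayscale frames stored as nested lists with pixel values scaled to [0, 1]: the function names `mse` and `psnr` are illustrative and not taken from the paper, and SSIM is omitted because it needs a windowed implementation.

```python
import math


def mse(pred, target):
    """Mean squared error between two frames given as nested lists of floats."""
    flat_pred = [v for row in pred for v in row]
    flat_target = [v for row in target for v in row]
    if len(flat_pred) != len(flat_target):
        raise ValueError("frames must have the same number of pixels")
    squared = ((p - t) ** 2 for p, t in zip(flat_pred, flat_target))
    return sum(squared) / len(flat_pred)


def psnr(pred, target, max_value=1.0):
    """Peak signal-to-noise ratio in dB; max_value is the assumed pixel range."""
    err = mse(pred, target)
    if err == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_value ** 2 / err)


# Toy 2x2 "frames": the prediction is off by 0.1 in two of four pixels.
target_frame = [[0.0, 0.5], [1.0, 0.5]]
predicted_frame = [[0.1, 0.5], [0.9, 0.5]]
frame_mse = mse(predicted_frame, target_frame)
frame_psnr = psnr(predicted_frame, target_frame)
```

Lower MSE and higher PSNR both indicate a prediction closer to the true next frame; benchmark numbers are typically averages of such per-frame values over a test set.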

updated: Wed Feb 08 2023 11:50:42 GMT+0000 (UTC)

published: Thu Apr 30 2020 15:31:24 GMT+0000 (UTC)

arXiv
