S-HR-VQVAE: Sequential Hierarchical Residual Learning Vector Quantized Variational Autoencoder for Video Prediction

Mohammad Adiban; Kalin Stefanov; Sabato Marco Siniscalchi; Giampiero Salvi

We address the video prediction task with a novel model that combines (i) our recently proposed hierarchical residual vector quantized variational autoencoder (HR-VQVAE) and (ii) a novel spatiotemporal PixelCNN (ST-PixelCNN). We refer to this approach as a sequential hierarchical residual learning vector quantized variational autoencoder (S-HR-VQVAE). By pairing HR-VQVAE's ability to model still images with a parsimonious representation and ST-PixelCNN's ability to handle spatiotemporal information, S-HR-VQVAE better addresses the main challenges in video prediction: learning spatiotemporal information, handling high-dimensional data, combating blurry predictions, and implicitly modeling physical characteristics. Extensive experimental results on the KTH Human Action and Moving-MNIST tasks demonstrate that our model compares favorably with top video prediction techniques in both quantitative and qualitative evaluations, despite a much smaller model size. Finally, we boost S-HR-VQVAE by proposing a novel training method that jointly estimates the HR-VQVAE and ST-PixelCNN parameters.
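As a rough illustration of the residual quantization idea behind the "parsimonious representation" mentioned in the abstract, the following minimal sketch quantizes a vector layer by layer, with each layer encoding the residual left by the previous ones. It is not the authors' implementation: the codebooks are random rather than learned, the layers are independent (the abstract does not specify how HR-VQVAE links codebooks across layers), and all names (`nearest`, `residual_quantize`, `codebooks`) are hypothetical.

```python
import random


def nearest(codebook, vec):
    """Return the codeword closest to vec in squared Euclidean distance."""
    return min(codebook, key=lambda c: sum((a - b) ** 2 for a, b in zip(c, vec)))


def residual_quantize(vec, codebooks):
    """Quantize vec layer by layer; each layer encodes the remaining residual."""
    residual = list(vec)
    recon = [0.0] * len(vec)
    for codebook in codebooks:
        code = nearest(codebook, residual)
        recon = [r + c for r, c in zip(recon, code)]        # accumulate the reconstruction
        residual = [r - c for r, c in zip(residual, code)]  # what is still unexplained
    return recon


def sq_error(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


rng = random.Random(0)
dim = 4
# Assumption: random codebooks with shrinking scale stand in for learned ones.
# Each codebook includes the zero vector, so adding a layer can never
# increase the reconstruction error.
codebooks = [
    [[0.0] * dim] + [[rng.gauss(0, scale) for _ in range(dim)] for _ in range(15)]
    for scale in (1.0, 0.5, 0.25)
]
vec = [rng.gauss(0, 1) for _ in range(dim)]

# Reconstruction error when using the first 1, 2, and 3 layers.
errors = [sq_error(vec, residual_quantize(vec, codebooks[:k])) for k in range(1, 4)]
print(errors)
```

In this toy setting the error is non-increasing as layers are added, which is the property that lets a stack of small codebooks stand in for one very large codebook.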

updated: Thu Jul 13 2023 11:58:27 GMT+0000 (UTC)

published: Thu Jul 13 2023 11:58:27 GMT+0000 (UTC)

arXiv
