Video Reenactment as Inductive Bias for Content-Motion Disentanglement

Juan F. Hernández Albarracín; Adín Ramírez Rivera

Independent components within low-dimensional representations are essential inputs to several downstream tasks and provide explanations of the observed data. Video-based disentangled factors of variation provide low-dimensional representations that can be identified and used to feed task-specific models. We introduce MTC-VAE, a self-supervised motion-transfer VAE model to disentangle motion and content from videos. Unlike previous work on video content-motion disentanglement, we adopt a chunk-wise modeling approach and take advantage of the motion information contained in spatiotemporal neighborhoods. Our model yields independent per-chunk representations that preserve temporal consistency. Hence, we reconstruct whole videos in a single forward pass. We extend the ELBO's log-likelihood term and include a Blind Reenactment Loss as an inductive bias to leverage motion disentanglement, under the assumption that swapping motion features yields reenactment between two videos. We evaluate our model with recently proposed disentanglement metrics and show that it outperforms a variety of methods for video motion-content disentanglement. Experiments on video reenactment show the effectiveness of our disentanglement in the input space, where our model outperforms the baselines in reconstruction quality and motion alignment.
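
The motion-swapping assumption in the abstract can be illustrated with a toy sketch. This is not the MTC-VAE architecture or its Blind Reenactment Loss, neither of which is specified here. The linear encoder and decoder, the dimensions, and the names `encode`, `decode`, and `swap_motion` are all hypothetical stand-ins. The sketch only shows the general idea: each chunk is encoded into separate content and motion codes, and the motion codes are exchanged between two videos before decoding.

```python
import numpy as np

rng = np.random.default_rng(0)

# All sizes below are hypothetical and chosen only for illustration.
CHUNK_DIM = 32    # flattened size of one video chunk
CONTENT_DIM = 4   # size of the per-chunk content code
MOTION_DIM = 3    # size of the per-chunk motion code

# Random linear maps standing in for a learned encoder and decoder.
W_enc = rng.normal(size=(CONTENT_DIM + MOTION_DIM, CHUNK_DIM))
W_dec = rng.normal(size=(CHUNK_DIM, CONTENT_DIM + MOTION_DIM))


def encode(chunks):
    """Map chunks of shape (n_chunks, CHUNK_DIM) to (content, motion) codes."""
    z = chunks @ W_enc.T
    return z[:, :CONTENT_DIM], z[:, CONTENT_DIM:]


def decode(content, motion):
    """Map per-chunk (content, motion) codes back to chunk space."""
    z = np.concatenate([content, motion], axis=1)
    return z @ W_dec.T


def swap_motion(video_a, video_b):
    """Decode each video's content code with the other video's motion code."""
    content_a, motion_a = encode(video_a)
    content_b, motion_b = encode(video_b)
    return decode(content_a, motion_b), decode(content_b, motion_a)


# Two toy "videos" with the same number of chunks.
video_a = rng.normal(size=(5, CHUNK_DIM))
video_b = rng.normal(size=(5, CHUNK_DIM))
a_with_b_motion, b_with_a_motion = swap_motion(video_a, video_b)
```

In this toy setting, swapping a video's motion codes with its own reduces to plain reconstruction, which gives a simple sanity check on the swap logic.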

updated: Fri Feb 18 2022 21:43:49 GMT+0000 (UTC)

published: Sat Jan 30 2021 22:07:43 GMT+0000 (UTC)

arXiv
