Learning Sequential Latent Variable Models from Multimodal Time Series Data

Oliver Limoyo; Trevor Ablett; Jonathan Kelly

マルチモーダル時系列データからの順次潜在変数モデルの学習

高次元データのシーケンシャルモデリングは、モデルベースの強化学習や制御のためのダイナミクス識別など、多くの分野で発生する重要な問題です。シーケンシャルデータに適用される潜在変数モデル（つまり、潜在ダイナミクスモデル）は、特に画像を処理する場合に、この問題を解決するための特に効果的な確率的アプローチであることが示されています。ただし、多くのアプリケーション分野（ロボット工学など）では、複数のセンシングモダリティからの情報が利用可能です。既存の潜在ダイナミクス手法は、このようなマルチモーダルシーケンシャルデータを効果的に利用するためにまだ拡張されていません。マルチモーダルセンサーストリームは、有用な方法で相関させることができ、多くの場合、モダリティ全体で補完的な情報が含まれています。この作業では、マルチモーダルデータとそれぞれのダイナミクスの確率的潜在状態表現を共同で学習するための自己監視生成モデリングフレームワークを提示します。マルチモーダルロボット平面プッシュタスクからの合成および実世界のデータセットを使用して、私たちのアプローチが予測と表現の品質の大幅な改善につながることを示します。さらに、潜在空間で各モダリティを連結する一般的な学習ベースラインと比較し、原理的な確率的定式化のパフォーマンスが優れていることを示します。最後に、完全に自己監視されているにもかかわらず、私たちの方法は、グラウンドトゥルースラベルに依存する既存の監視ありアプローチとほぼ同じくらい効果的であることを示しています。

Sequential modelling of high-dimensional data is an important problem that appears in many domains including model-based reinforcement learning and dynamics identification for control. Latent variable models applied to sequential data (i.e., latent dynamics models) have been shown to be a particularly effective probabilistic approach to solve this problem, especially when dealing with images. However, in many application areas (e.g., robotics), information from multiple sensing modalities is available -- existing latent dynamics methods have not yet been extended to effectively make use of such multimodal sequential data. Multimodal sensor streams can be correlated in a useful manner and often contain complementary information across modalities. In this work, we present a self-supervised generative modelling framework to jointly learn a probabilistic latent state representation of multimodal data and the respective dynamics. Using synthetic and real-world datasets from a multimodal robotic planar pushing task, we demonstrate that our approach leads to significant improvements in prediction and representation quality. Furthermore, we compare to the common learning baseline of concatenating each modality in the latent space and show that our principled probabilistic formulation performs better. Finally, despite being fully self-supervised, we demonstrate that our method is nearly as effective as an existing supervised approach that relies on ground truth labels.

updated: Thu Apr 21 2022 21:59:24 GMT+0000 (UTC)

published: Thu Apr 21 2022 21:59:24 GMT+0000 (UTC)

arXiv

参考文献 (このサイトで利用可能なもの) / References (only if available on this site)

被参照文献 (このサイトで利用可能なものを新しい順に) / Citations (only if available on this site, in order of most recent)

Amazon.co.jpアソシエイト