Contrastive Unsupervised Learning of World Model with Invariant Causal Features

Rudra P. K. Poudel; Harit Pandya; Roberto Cipolla

In this paper, we present a world model that learns causal features using the invariance principle. In particular, we use contrastive unsupervised learning to learn invariant causal features, which enforces invariance across augmentations of irrelevant parts or styles of the observation. World-model-based reinforcement learning methods optimize representation learning and the policy independently. Thus, a naive implementation of the contrastive loss collapses due to a lack of supervisory signals to the representation learning module. We propose an intervention-invariant auxiliary task to mitigate this issue. Specifically, we utilize depth prediction to explicitly enforce the invariance and use data augmentation as a style intervention on the RGB observation space. Our design leverages unsupervised representation learning to learn a world model with invariant causal features. Our proposed method significantly outperforms current state-of-the-art model-based and model-free reinforcement learning methods on out-of-distribution point navigation tasks on the iGibson dataset. Moreover, our proposed model excels at the sim-to-real transfer of our perception learning module. Finally, we evaluate our approach on the DeepMind Control Suite, where invariance is enforced only implicitly because depth is not available. Nevertheless, our proposed model performs on par with the state-of-the-art counterpart.
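The abstract combines two training signals: a contrastive loss between two style-augmented views of the same RGB observation, and an auxiliary depth-prediction loss whose target does not change under the style intervention. The sketch below illustrates that combination under stated assumptions, and is not the authors' implementation: InfoNCE is assumed as the contrastive objective, a brightness/contrast jitter stands in for the style intervention, mean squared error is assumed for depth, and the helper names (style_augment, info_nce, depth_loss, total_loss) and the depth_weight parameter are hypothetical.

```python
import math
import random
from typing import List, Sequence

Vector = Sequence[float]


def style_augment(rgb: Sequence[float], rng: random.Random) -> List[float]:
    # Hypothetical "style intervention": random brightness/contrast jitter that
    # changes appearance but leaves scene geometry (and hence depth) unchanged.
    gain = rng.uniform(0.6, 1.4)
    bias = rng.uniform(-0.2, 0.2)
    return [min(1.0, max(0.0, gain * x + bias)) for x in rgb]


def _normalize(v: Vector) -> List[float]:
    # L2-normalize so the dot product below is a cosine similarity.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce(anchors: Sequence[Vector], positives: Sequence[Vector],
             temperature: float = 0.1) -> float:
    """Assumed InfoNCE loss: row i of `anchors` and row i of `positives` embed
    two augmented views of the same observation; other rows act as negatives."""
    a = [_normalize(v) for v in anchors]
    p = [_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        # Numerically stable negative log-softmax of the matching (diagonal) pair.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_sum_exp - logits[i]
    return total / len(a)


def depth_loss(pred_depth: Vector, true_depth: Vector) -> float:
    # Assumed mean squared error for the auxiliary depth-prediction head.
    return sum((x - y) ** 2 for x, y in zip(pred_depth, true_depth)) / len(true_depth)


def total_loss(z_view1: Sequence[Vector], z_view2: Sequence[Vector],
               pred_depth: Vector, true_depth: Vector,
               depth_weight: float = 1.0) -> float:
    # Hypothetical weighting of the two terms; the abstract does not give one.
    return info_nce(z_view1, z_view2) + depth_weight * depth_loss(pred_depth, true_depth)
```

For example, with orthogonal embeddings, info_nce(z, z) is close to zero, while pairing each anchor with the wrong positive gives a much larger loss. The depth term supplies the extra supervisory signal the abstract says is needed to keep the contrastive representation from collapsing.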

updated: Thu Sep 29 2022 16:49:24 GMT+0000 (UTC)

published: Thu Sep 29 2022 16:49:24 GMT+0000 (UTC)

arXiv
