For Pre-Trained Vision Models in Motor Control, Not All Policy Learning Methods are Created Equal

Yingdong Hu; Renhao Wang; Li Erran Li; Yang Gao

In recent years, increasing attention has been directed to leveraging pre-trained vision models for motor control. While existing works mainly emphasize the importance of this pre-training phase, the arguably equally important role played by downstream policy learning during control-specific fine-tuning is often neglected. It thus remains unclear if pre-trained vision models are consistent in their effectiveness under different control policies. To bridge this gap in understanding, we conduct a comprehensive study on 14 pre-trained vision models using 3 distinct classes of policy learning methods, including reinforcement learning (RL), imitation learning through behavior cloning (BC), and imitation learning with a visual reward function (VRF). Our study yields a series of intriguing results, including the discovery that the effectiveness of pre-training is highly dependent on the choice of the downstream policy learning algorithm. We show that conventionally accepted evaluation based on RL methods is highly variable and therefore unreliable, and further advocate for using more robust methods like VRF and BC. To facilitate more universal evaluations of pre-trained models and their policy learning methods in the future, we also release a benchmark of 21 tasks across 3 different environments alongside our work.

updated: Mon Apr 10 2023 13:52:19 GMT+0000 (UTC)

published: Mon Apr 10 2023 13:52:19 GMT+0000 (UTC)

arXiv

