On Pre-Training for Visuo-Motor Control: Revisiting a Learning-from-Scratch Baseline

Nicklas Hansen; Zhecheng Yuan; Yanjie Ze; Tongzhou Mu; Aravind Rajeswaran; Hao Su; Huazhe Xu; Xiaolong Wang

In this paper, we examine the effectiveness of pre-training for visuo-motor control tasks. We revisit a simple Learning-from-Scratch (LfS) baseline that incorporates data augmentation and a shallow ConvNet, and find that this baseline is surprisingly competitive with recent approaches (PVR, MVP, R3M) that leverage frozen visual representations trained on large-scale vision datasets -- across a variety of algorithms, task domains, and metrics in simulation and on a real robot. Our results demonstrate that these methods are hindered by a significant domain gap between the pre-training datasets and current benchmarks for visuo-motor control, which is alleviated by finetuning. Based on our findings, we provide recommendations for future research in pre-training for control and hope that our simple yet strong baseline will aid in accurately benchmarking progress in this area.

updated: Thu Jun 15 2023 17:57:21 GMT+0000 (UTC)

published: Mon Dec 12 2022 07:59:31 GMT+0000 (UTC)

arXiv
