Graph Neural Networks for Relational Inductive Bias in Vision-based Deep Reinforcement Learning of Robot Control

Marco Oliva; Soubarna Banik; Josip Josifovski; Alois Knoll

State-of-the-art reinforcement learning algorithms predominantly learn a policy from either a numerical state vector or images. Neither approach generally takes structural knowledge of the task into account, even though such knowledge is especially prevalent in robotic applications and can benefit learning if exploited. This work introduces a neural network architecture that combines relational inductive bias and visual feedback to learn an efficient position control policy for robotic manipulation. We derive a graph representation that models the physical structure of the manipulator and combines the robot's internal state with a low-dimensional description of the visual scene generated by an image encoding network. On this basis, a graph neural network trained with reinforcement learning predicts joint velocities to control the robot. We further introduce an asymmetric approach that trains the image encoder separately from the policy using supervised learning. Experimental results demonstrate that, for a 2-DoF planar robot in a geometrically simple 2D environment, a learned representation of the visual scene can replace access to the explicit coordinates of the reaching target without compromising the quality or sample efficiency of the policy. We further show that the model improves sample efficiency for a 6-DoF robot arm in a visually realistic 3D environment.
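The pipeline described in the abstract (a graph over the manipulator's joints, node features that mix joint state with an image encoding, and a graph network that outputs joint velocities) can be illustrated with a minimal NumPy sketch. The abstract does not specify the graph construction, the feature layout, or the network layers, so everything here is an assumption made for illustration: the function names `chain_adjacency`, `node_features`, and `gnn_policy`, the single mean-aggregation message-passing step, the feature dimensions, and the random untrained weights standing in for parameters that reinforcement learning would fit. The scene code is a random vector standing in for the output of an image encoder.

```python
import numpy as np


def chain_adjacency(num_joints):
    """Adjacency matrix (with self-loops) of a serial chain: joint i <-> joint i+1."""
    adj = np.eye(num_joints)
    for i in range(num_joints - 1):
        adj[i, i + 1] = 1.0
        adj[i + 1, i] = 1.0
    return adj


def node_features(joint_pos, joint_vel, scene_code):
    """One row per joint: [position, velocity, scene_code...].

    Assumption: the same low-dimensional scene code is appended to every node.
    """
    num_joints = len(joint_pos)
    scene = np.tile(scene_code, (num_joints, 1))
    return np.column_stack([joint_pos, joint_vel, scene])


def gnn_policy(features, adj, w_msg, w_out, max_speed=1.0):
    """One mean-aggregation message-passing step, then a per-node velocity readout."""
    norm_adj = adj / adj.sum(axis=1, keepdims=True)  # average over neighbours
    hidden = np.tanh(norm_adj @ features @ w_msg)    # shared weights across joints
    return max_speed * np.tanh(hidden @ w_out).ravel()


# Hypothetical 2-DoF example with a 4-dimensional scene code.
rng = np.random.default_rng(0)
num_joints, code_dim, hidden_dim = 2, 4, 8

adj = chain_adjacency(num_joints)
feats = node_features(
    joint_pos=np.array([0.1, -0.3]),
    joint_vel=np.zeros(num_joints),
    scene_code=rng.normal(size=code_dim),  # stand-in for an image encoder output
)
w_msg = rng.normal(scale=0.1, size=(2 + code_dim, hidden_dim))  # untrained
w_out = rng.normal(scale=0.1, size=(hidden_dim, 1))             # untrained

velocities = gnn_policy(feats, adj, w_msg, w_out)
print(velocities.shape)  # one velocity command per joint
```

Because the weights are shared across nodes, the same parameters apply to a chain of any length, which is one way a relational inductive bias can enter such a policy. Training the weights and the image encoder is outside the scope of this sketch.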

updated: Fri Mar 11 2022 15:11:54 GMT+0000 (UTC)

published: Fri Mar 11 2022 15:11:54 GMT+0000 (UTC)

arXiv
