End-to-End Learning of Keypoint Representations for Continuous Control from Images

Rinu Boney; Alexander Ilin; Juho Kannala

In many control problems that involve vision, optimal controls can be inferred from the locations of the objects in the scene. This information can be represented using keypoints, a list of spatial locations in the input image. Previous work shows that keypoint representations learned during unsupervised pre-training with encoder-decoder architectures can provide good features for control tasks. In this paper, we show that it is possible to learn efficient keypoint representations end-to-end, without the need for unsupervised pre-training, decoders, or additional losses. Our proposed architecture consists of a differentiable keypoint extractor that feeds the coordinates of the estimated keypoints directly to a soft actor-critic agent. The proposed algorithm yields performance competitive with the state of the art on DeepMind Control Suite tasks.
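The abstract does not say how the keypoint extractor is made differentiable. One common choice, assumed here purely for illustration, is a spatial soft-argmax: each feature map is turned into a probability distribution over pixels, and the keypoint is the expected pixel coordinate under that distribution. The sketch below shows only the forward computation in plain Python; the function names `soft_argmax_2d` and `extract_keypoints` and the `temperature` parameter are hypothetical and are not taken from the paper. In practice this would be written in an automatic-differentiation framework so that gradients from the actor-critic losses can flow back into the feature maps.

```python
import math


def soft_argmax_2d(heatmap, temperature=1.0):
    """Expected (x, y) location of one feature map, normalised to [-1, 1].

    heatmap: list of H rows, each a list of W raw activations (H, W >= 2).
    temperature: hypothetical softmax temperature; lower values sharpen the peak.
    """
    height, width = len(heatmap), len(heatmap[0])

    # Numerically stable softmax over all H*W spatial positions.
    peak = max(max(row) for row in heatmap)
    weights = [[math.exp((v - peak) / temperature) for v in row] for row in heatmap]
    total = sum(sum(row) for row in weights)

    # Pixel coordinates mapped to [-1, 1] along each axis.
    xs = [2.0 * j / (width - 1) - 1.0 for j in range(width)]
    ys = [2.0 * i / (height - 1) - 1.0 for i in range(height)]

    # Expectation of the coordinates under the softmax distribution.
    x = sum(weights[i][j] * xs[j] for i in range(height) for j in range(width)) / total
    y = sum(weights[i][j] * ys[i] for i in range(height) for j in range(width)) / total
    return x, y


def extract_keypoints(feature_maps, temperature=1.0):
    """Flatten K feature maps into a vector of 2K keypoint coordinates.

    Under the assumption made here, this vector is what the agent would
    receive in place of raw pixels.
    """
    coords = []
    for heatmap in feature_maps:
        coords.extend(soft_argmax_2d(heatmap, temperature))
    return coords
```

For a feature map with a single dominant activation, the returned coordinate sits at that pixel; for a flat map it falls at the image centre. Because the output is a weighted average rather than a hard argmax, it varies smoothly with the activations, which is the property that makes end-to-end training of such an extractor possible.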

updated: Tue Jun 15 2021 09:17:06 GMT+0000 (UTC)

published: Tue Jun 15 2021 09:17:06 GMT+0000 (UTC)

arXiv
