Transform2Act: Learning a Transform-and-Control Policy for Efficient Agent Design

Ye Yuan; Yuda Song; Zhengyi Luo; Wen Sun; Kris Kitani


An agent's functionality is largely determined by its design, i.e., skeletal structure and joint attributes (e.g., length, size, strength). However, finding the optimal agent design for a given function is extremely challenging since the problem is inherently combinatorial and the design space is prohibitively large. Additionally, it can be costly to evaluate each candidate design, which requires solving for its optimal controller. To tackle these problems, our key idea is to incorporate the design procedure of an agent into its decision-making process. Specifically, we learn a conditional policy that, in an episode, first applies a sequence of transform actions to modify an agent's skeletal structure and joint attributes, and then applies control actions under the new design. To handle a variable number of joints across designs, we use a graph-based policy where each graph node represents a joint and uses message passing with its neighbors to output joint-specific actions. Using policy gradient methods, our approach enables joint optimization of agent design and control as well as experience sharing across different designs, which improves sample efficiency substantially. Experiments show that our approach, Transform2Act, outperforms prior methods significantly in terms of convergence speed and final performance. Notably, Transform2Act can automatically discover plausible designs similar to giraffes, squids, and spiders. Code and videos are available at https://sites.google.com/view/transform2act.
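A toy sketch can illustrate the structure of the transform-then-control episode described above. The Python below is not the authors' implementation. Every component is a hypothetical stand-in for the learned graph-based policy:

- the `Skeleton` class,
- the three per-joint skeleton actions (no change, add a child joint, delete a leaf joint),
- the mean-aggregation message passing,
- the fixed hand-picked logits.

The sketch shows only the shape of the idea. A single per-joint policy acts on a tree of variable size. It first edits the skeleton, then adjusts joint attributes, and then emits one control action per joint under the resulting design.

```python
import math
import random
from dataclasses import dataclass, field


@dataclass
class Skeleton:
    # Hypothetical tree skeleton: parent[i] < i, and the root (joint 0) has parent -1.
    parent: list = field(default_factory=lambda: [-1])
    # One scalar attribute per joint, a stand-in for length, size, or strength.
    attr: list = field(default_factory=lambda: [1.0])

    def num_joints(self):
        return len(self.parent)

    def neighbors(self, i):
        # Undirected tree neighbors of joint i: its children plus its parent.
        nbrs = [j for j, p in enumerate(self.parent) if p == i]
        if self.parent[i] >= 0:
            nbrs.append(self.parent[i])
        return nbrs


def message_pass(skel, feats, rounds=2):
    # Each round, every joint replaces its feature with the mean over itself and its neighbors.
    for _ in range(rounds):
        feats = [
            (feats[i] + sum(feats[j] for j in skel.neighbors(i)))
            / (1 + len(skel.neighbors(i)))
            for i in range(skel.num_joints())
        ]
    return feats


def sample_categorical(logits, rng):
    # Softmax sampling; subtracting the max keeps exp() numerically stable.
    m = max(logits)
    weights = [math.exp(x - m) for x in logits]
    return rng.choices(range(len(logits)), weights=weights)[0]


def apply_skeleton_actions(skel, actions):
    # Hypothetical action set: 0 = no change, 1 = add a child, 2 = delete (non-root leaves only).
    children = {i: [] for i in range(skel.num_joints())}
    for j, p in enumerate(skel.parent):
        if p >= 0:
            children[p].append(j)
    keep = [not (a == 2 and i != 0 and not children[i]) for i, a in enumerate(actions)]
    remap, parent, attr = {}, [], []
    for i in range(skel.num_joints()):
        if keep[i]:
            # Only leaves are deleted, so a kept joint's parent is always kept (and already remapped).
            remap[i] = len(parent)
            parent.append(remap[skel.parent[i]] if skel.parent[i] >= 0 else -1)
            attr.append(skel.attr[i])
    for i, a in enumerate(actions):
        if a == 1 and keep[i]:
            # New child joints are appended at the end and inherit the parent's attribute.
            parent.append(remap[i])
            attr.append(skel.attr[i])
    return Skeleton(parent, attr)


def run_episode(n_skel_steps=2, n_ctrl_steps=3, seed=0):
    rng = random.Random(seed)
    skel = Skeleton()
    # Stage 1: skeleton transform, one discrete action per joint from shared embeddings.
    for _ in range(n_skel_steps):
        emb = message_pass(skel, skel.attr)
        acts = [sample_categorical([0.0, e, -e], rng) for e in emb]
        skel = apply_skeleton_actions(skel, acts)
    # Stage 2: attribute transform, one bounded continuous change per joint.
    emb = message_pass(skel, skel.attr)
    skel.attr = [a + 0.1 * math.tanh(e) for a, e in zip(skel.attr, emb)]
    # Stage 3: control under the now-fixed design, one torque in [-1, 1] per joint.
    # A real controller would also read simulator state; with static features here,
    # every control step produces the same torques.
    torques = []
    for _ in range(n_ctrl_steps):
        emb = message_pass(skel, skel.attr)
        torques.append([math.tanh(e) for e in emb])
    return skel, torques
```

Every head reads the same per-joint embeddings, so the same policy applies to skeletons of any size. In the method described above, the transform and control actions are optimized jointly with policy gradient methods; this sketch omits that learning step entirely.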

updated: Sat Apr 09 2022 16:56:11 GMT+0000 (UTC)

published: Thu Oct 07 2021 17:51:05 GMT+0000 (UTC)

arXiv
