Learning Active Camera for Multi-Object Navigation

Peihao Chen; Dongyu Ji; Kunyang Lin; Weiwen Hu; Wenbing Huang; Thomas H. Li; Mingkui Tan; Chuang Gan

マルチオブジェクトナビゲーションのためのアクティブカメラの学習

ロボットが複数のオブジェクトに自律的にナビゲートできるようにすることは、ロボットアプリケーションでは不可欠ですが、困難です。主な課題の 1 つは、カメラセンサーのみを使用して環境を効率的に探索する方法です。既存のナビゲーション方法は主に固定カメラに焦点を当てており、アクティブなカメラでナビゲートする試みはほとんど行われていません。その結果、カメラのスコープが限られているため、エージェントが環境を認識するのに非常に長い時間がかかる場合があります。対照的に、人間は通常、周囲を見回して環境をよりよく認識することにより、より広い視野を獲得します。ロボットが人間と同じくらい効率的に環境を認識できるようにする方法は、ロボット工学の基本的な問題です。このホワイトペーパーでは、アクティブカメラを使用して複数のオブジェクトをより効率的にナビゲートすることを検討します。具体的には、移動カメラをマルコフ決定プロセスにキャストし、アクティブカメラの問題を強化学習の問題として再定式化します。ただし、2 つの新しい課題に対処する必要があります。1) 複雑な環境で適切なカメラポリシーを学習する方法と、2) それをナビゲーションポリシーと調整する方法です。これらに対処するために、カメラを積極的に動かしてエージェントがより多くのエリアを探索するように奨励する報酬関数を慎重に設計します。さらに、人間の経験を活用して、ルールベースのカメラアクションを推測し、学習プロセスをガイドします。最後に、2 種類のポリシーをより適切に調整するために、カメラポリシーは、カメラの移動を決定する際にナビゲーションアクションを考慮に入れます。実験結果は、カメラポリシーが 2 つのデータセットの 4 つのベースラインでマルチオブジェクトナビゲーションのパフォーマンスを一貫して改善することを示しています。

Getting robots to navigate to multiple objects autonomously is essential yet difficult in robot applications. One of the key challenges is how to explore environments efficiently with camera sensors only. Existing navigation methods mainly focus on fixed cameras and few attempts have been made to navigate with active cameras. As a result, the agent may take a very long time to perceive the environment due to limited camera scope. In contrast, humans typically gain a larger field of view by looking around for a better perception of the environment. How to make robots perceive the environment as efficiently as humans is a fundamental problem in robotics. In this paper, we consider navigating to multiple objects more efficiently with active cameras. Specifically, we cast moving camera to a Markov Decision Process and reformulate the active camera problem as a reinforcement learning problem. However, we have to address two new challenges: 1) how to learn a good camera policy in complex environments and 2) how to coordinate it with the navigation policy. To address these, we carefully design a reward function to encourage the agent to explore more areas by moving camera actively. Moreover, we exploit human experience to infer a rule-based camera action to guide the learning process. Last, to better coordinate two kinds of policies, the camera policy takes navigation actions into account when making camera moving decisions. Experimental results show our camera policy consistently improves the performance of multi-object navigation over four baselines on two datasets.

updated: Fri Oct 14 2022 04:17:30 GMT+0000 (UTC)

published: Fri Oct 14 2022 04:17:30 GMT+0000 (UTC)

arXiv

参考文献 (このサイトで利用可能なもの) / References (only if available on this site)

被参照文献 (このサイトで利用可能なものを新しい順に) / Citations (only if available on this site, in order of most recent)

Amazon.co.jpアソシエイト