Omnidirectional Information Gathering for Knowledge Transfer-based Audio-Visual Navigation

Jinyu Chen; Wenguan Wang; Si Liu; Hongsheng Li; Yi Yang

Audio-visual navigation is an audio-targeted wayfinding task in which a robot agent must traverse a never-before-seen 3D environment to reach a sounding source. In this article, we present ORAN, an omnidirectional audio-visual navigator based on cross-task navigation skill transfer. In particular, ORAN sharpens the two basic abilities that such a challenging task requires, namely wayfinding and audio-visual information gathering. First, ORAN is trained with a confidence-aware cross-task policy distillation (CCPD) strategy. CCPD transfers the fundamental point-to-point wayfinding skill, well trained on the large-scale PointGoal task, to ORAN, helping ORAN master audio-visual navigation with far fewer training samples. To improve the efficiency of knowledge transfer and address the domain gap, CCPD adapts to the decision confidence of the teacher policy. Second, ORAN is equipped with an omnidirectional information gathering (OIG) mechanism, i.e., it gleans visual-acoustic observations from different directions before making a decision. As a result, ORAN yields more robust navigation behaviour. With CCPD and OIG combined, ORAN significantly outperforms previous competitors. After model ensembling, we ranked 1st in the SoundSpaces Challenge 2022, with relative improvements of 53% in SPL and 35% in SR.
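
The abstract does not state the CCPD objective, so the following is only a minimal sketch of what a confidence-aware distillation loss could look like. It assumes, purely for illustration, that the teacher's confidence is its maximum action probability and that this confidence scales a KL divergence between the teacher and student action distributions. The function names `log_softmax`, `teacher_confidence`, and `confidence_weighted_distillation_loss` are hypothetical and do not come from the paper.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of action logits."""
    m = max(logits)
    log_total = math.log(sum(math.exp(x - m) for x in logits))
    return [x - m - log_total for x in logits]


def teacher_confidence(teacher_logits):
    """Assumed confidence measure: the teacher's largest action probability."""
    return max(math.exp(lp) for lp in log_softmax(teacher_logits))


def confidence_weighted_distillation_loss(teacher_logits, student_logits):
    """KL(teacher || student), scaled by the assumed teacher confidence.

    When the teacher is unsure (near-uniform action distribution), its
    signal is down-weighted, so the student is pushed less strongly
    toward a possibly unreliable decision.
    """
    log_p = log_softmax(teacher_logits)  # teacher log-probabilities
    log_q = log_softmax(student_logits)  # student log-probabilities
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return teacher_confidence(teacher_logits) * kl


if __name__ == "__main__":
    # Four discrete actions; the logits below are made-up illustration values.
    confident_teacher = [5.0, 0.0, 0.0, 0.0]
    unsure_teacher = [0.1, 0.0, 0.0, 0.0]
    student = [0.0, 0.0, 0.0, 0.0]
    print(confidence_weighted_distillation_loss(confident_teacher, student))
    print(confidence_weighted_distillation_loss(unsure_teacher, student))
```

In a sketch like this, a confident PointGoal teacher dominates the student's update, while an unsure teacher contributes little, which is one plausible way to limit the effect of the domain gap between the two tasks.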

updated: Sun Aug 20 2023 16:03:54 GMT+0000 (UTC)

published: Sun Aug 20 2023 16:03:54 GMT+0000 (UTC)

arXiv
