Yes, this Way! Learning to Ground Referring Expressions into Actions with Intra-episodic Feedback from Supportive Teachers

Philipp Sadler; Sherzod Hakimov; David Schlangen

The ability to pick up on language signals in an ongoing interaction is crucial for future machine learning models to collaborate and interact with humans naturally. In this paper, we present an initial study that evaluates intra-episodic feedback given in a collaborative setting. We use a referential language game as a controllable example of a task-oriented collaborative joint activity. A teacher utters a referring expression generated by a well-known symbolic algorithm (the "Incremental Algorithm") as an initial instruction and then monitors the follower's actions, possibly intervening with intra-episodic feedback (which the follower does not have to request explicitly). We frame this task as a reinforcement learning problem with sparse rewards and learn a follower policy for a heuristic teacher. Our results show that intra-episodic feedback allows the follower to generalize on aspects of scene complexity and yields better performance than providing only the initial statement.
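
The abstract names the Incremental Algorithm as the generator of the teacher's initial referring expression. As an illustration only, here is a minimal sketch of the general idea behind that algorithm: walk through attributes in a fixed preference order and keep each attribute value that rules out at least one remaining distractor. The attribute names (`color`, `shape`, `position`), the dictionary representation of objects, and the function name are assumptions made for this example; they are not taken from the paper and may differ from the authors' implementation.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: each object is a mapping from attribute to value.
Entity = Dict[str, str]


def incremental_algorithm(
    target: Entity,
    distractors: List[Entity],
    preference_order: List[str],
) -> Optional[List[Tuple[str, str]]]:
    """Select (attribute, value) pairs that distinguish target from distractors.

    Simplified sketch: attributes are tried in preference order, and an
    attribute is kept only if it excludes at least one remaining distractor.
    Returns None when no distinguishing description exists.
    """
    description: List[Tuple[str, str]] = []
    remaining = list(distractors)

    for attribute in preference_order:
        if not remaining:
            break  # target is already uniquely identified
        value = target.get(attribute)
        if value is None:
            continue  # target has no value for this attribute
        # Keep only distractors that share the target's value.
        still_confusable = [d for d in remaining if d.get(attribute) == value]
        if len(still_confusable) < len(remaining):
            description.append((attribute, value))
            remaining = still_confusable

    return description if not remaining else None


if __name__ == "__main__":
    # Hypothetical scene with one target and two distractors.
    target = {"color": "red", "shape": "T", "position": "top left"}
    distractors = [
        {"color": "blue", "shape": "T", "position": "top left"},
        {"color": "red", "shape": "L", "position": "bottom right"},
    ]
    print(incremental_algorithm(target, distractors, ["color", "shape", "position"]))
```

In this toy scene the color excludes the first distractor and the shape excludes the second, so the position is never needed. A description built this way is not guaranteed to be the shortest possible one, because attributes are never dropped once selected.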

updated: Mon May 22 2023 10:01:15 GMT+0000 (UTC)

published: Mon May 22 2023 10:01:15 GMT+0000 (UTC)

arXiv

