Learning Transferable Reward for Query Object Localization with Policy Adaptation

Tingfeng Li; Shaobo Han; Martin Renqiang Min; Dimitris N. Metaxas

We propose a reinforcement-learning-based approach to query object localization, in which an agent is trained to localize objects of interest specified by a small exemplary set. We learn a transferable reward signal, formulated from the exemplary set, by ordinal metric learning. Our proposed method enables test-time policy adaptation to new environments where reward signals are not readily available, and it outperforms fine-tuning approaches, which are limited to annotated images. In addition, the transferable reward allows a trained agent to be repurposed from one specific class to another. Experiments on the corrupted MNIST, CU-Birds, and COCO datasets demonstrate the effectiveness of our approach.
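As a rough illustration of how a reward built from an exemplary set could stay usable without box annotations, the sketch below scores a candidate box by the distance between its embedding and a prototype of the exemplar embeddings. It is not the paper's implementation: the mean prototype, the Euclidean distance, the hinge-style ordinal loss, the sign-based step reward, and every function name here are assumptions made for illustration, and the embedding network itself is left out.

```python
import math

def prototype(exemplar_embeddings):
    """Mean of the exemplary-set embeddings (assumed choice of prototype)."""
    n = len(exemplar_embeddings)
    dim = len(exemplar_embeddings[0])
    return [sum(e[d] for e in exemplar_embeddings) / n for d in range(dim)]

def distance(a, b):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def iou(box_a, box_b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    ix0, iy0 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix1, iy1 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def ordinal_margin_loss(d_better, d_worse, margin=0.2):
    """Hinge loss for one ordered pair (assumed form of the ordinal objective).

    d_better: prototype distance of the box with the higher IoU.
    d_worse:  prototype distance of the box with the lower IoU.
    The loss is zero once the better box is closer by at least `margin`.
    """
    return max(0.0, d_better - d_worse + margin)

def step_reward(dist_before, dist_after):
    """+1 if the agent's action moved the box closer to the prototype, else -1.

    Only embeddings are needed here, so no ground-truth box is required
    at test time (assumed reward shape).
    """
    return 1.0 if dist_after < dist_before else -1.0
```

In this reading, IoU with a ground-truth box is used only during training, to decide which box of a pair should be embedded closer to the prototype. At test time the reward depends solely on the exemplary set, which is what would make policy adaptation possible on unannotated images.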

updated: Tue Mar 15 2022 00:49:14 GMT+0000 (UTC)

published: Thu Feb 24 2022 22:52:14 GMT+0000 (UTC)

arXiv
