Referential communication in heterogeneous communities of pre-trained visual deep networks

Matéo Mahaut; Francesca Franzon; Roberto Dessì; Marco Baroni

As large pre-trained image-processing neural networks are being embedded in autonomous agents such as self-driving cars or robots, the question arises of how such systems can communicate with each other about the surrounding world, despite their different architectures and training regimes. As a first step in this direction, we systematically explore the task of referential communication in a community of heterogeneous state-of-the-art pre-trained visual networks, showing that they can develop, in a self-supervised way, a shared protocol to refer to a target object among a set of candidates. This shared protocol can also be used, to some extent, to communicate about previously unseen object categories of different granularity. Moreover, a visual network that was not initially part of an existing community can learn the community's protocol with remarkable ease. Finally, we study, both qualitatively and quantitatively, the properties of the emergent protocol, providing some evidence that it is capturing high-level semantic features of objects.
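The abstract does not spell out how a round of the referential task works, so the following is only a minimal sketch of one plausible setup, not the authors' implementation. It assumes each agent wraps a frozen pre-trained backbone that yields a feature vector per image, that sender and receiver each add a small linear head (the hypothetical `w_send` and `w_recv`), that the message is a continuous vector, and that the receiver picks the candidate whose projection has the highest cosine similarity to the message. All names and these design choices are illustrative assumptions.

```python
import math


def project(features, weights):
    """Map a feature vector through a weight matrix given as a list of rows."""
    return [sum(f * w for f, w in zip(features, column)) for column in zip(*weights)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def referential_round(sender_feats, receiver_feats, w_send, w_recv, target_idx):
    """Play one round and return (receiver's guess, cross-entropy loss).

    sender_feats / receiver_feats: per-candidate feature vectors from two
    possibly different frozen backbones (dimensions may differ).
    w_send / w_recv: hypothetical linear heads mapping into a shared message space.
    """
    # The sender sees only the target and emits a continuous message.
    message = project(sender_feats[target_idx], w_send)
    # The receiver scores every candidate against the message.
    scores = [cosine(message, project(f, w_recv)) for f in receiver_feats]
    # Softmax cross-entropy with the target as the correct class.
    log_z = math.log(sum(math.exp(s) for s in scores))
    loss = log_z - scores[target_idx]
    guess = max(range(len(scores)), key=scores.__getitem__)
    return guess, loss


# Toy usage: three candidates, identical features and identity heads.
feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
guess, loss = referential_round(feats, feats, identity, identity, target_idx=1)
```

In this sketch only the two heads would be trained, by minimizing the loss over many rounds. The target index comes from the game itself rather than from human labels, which is one way to read the abstract's "self-supervised".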

updated: Wed Mar 13 2024 16:04:03 GMT+0000 (UTC)

published: Sat Feb 04 2023 15:55:23 GMT+0000 (UTC)

arXiv
