Attention over learned object embeddings enables complex visual reasoning

David Ding; Felix Hill; Adam Santoro; Malcolm Reynolds; Matt Botvinick

Neural networks have achieved success in a wide array of perceptual tasks but often fail at tasks involving both perception and higher-level reasoning. On these more challenging tasks, bespoke approaches (such as modular symbolic components, independent dynamics models or semantic parsers) targeted towards that specific type of task have typically performed better. The downside to these targeted approaches, however, is that they can be more brittle than general-purpose neural networks, requiring significant modification or even redesign according to the particular task at hand. Here, we propose a more general neural-network-based approach to dynamic visual reasoning problems that obtains state-of-the-art performance on three different domains, in each case outperforming bespoke modular approaches tailored specifically to the task. Our method relies on learned object-centric representations, self-attention and self-supervised dynamics learning, and all three elements together are required for strong performance to emerge. The success of this combination suggests that there may be no need to trade off flexibility for performance on problems involving spatio-temporal or causal-style reasoning. With the right soft biases and learning objectives in a neural network we may be able to attain the best of both worlds.
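As a rough illustration of one ingredient named in the abstract, self-attention applied to a set of per-object embeddings, the following minimal sketch computes single-head scaled dot-product attention in plain Python. It is not the authors' architecture: the function names, the dimensions, and the random (untrained) projection matrices are all assumptions made for illustration, and the object embeddings are random stand-ins for representations that the paper's method would learn.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def self_attention(embeddings, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over object embeddings.

    `embeddings` is a list of per-object vectors (hypothetical stand-ins
    for learned object-centric representations).
    Returns (outputs, attention_weights).
    """
    queries = [matvec(w_q, e) for e in embeddings]
    keys = [matvec(w_k, e) for e in embeddings]
    values = [matvec(w_v, e) for e in embeddings]
    scale = math.sqrt(len(keys[0]))

    outputs, weights = [], []
    for q in queries:
        # Similarity of this object's query to every object's key.
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        attn = softmax(scores)
        # Attention-weighted mixture of all objects' value vectors.
        mixed = [
            sum(a * v[d] for a, v in zip(attn, values))
            for d in range(len(values[0]))
        ]
        outputs.append(mixed)
        weights.append(attn)
    return outputs, weights


def random_matrix(rows, cols, rng):
    """Random projection matrix; a trained model would learn these."""
    return [
        [rng.gauss(0.0, 1.0 / math.sqrt(cols)) for _ in range(cols)]
        for _ in range(rows)
    ]


# Illustrative sizes only: 4 objects, 8-dimensional embeddings.
rng = random.Random(0)
num_objects, dim = 4, 8
objects = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_objects)]
w_q, w_k, w_v = (random_matrix(dim, dim, rng) for _ in range(3))
outputs, weights = self_attention(objects, w_q, w_k, w_v)
```

Each row of `weights` is a probability distribution over the objects, so every output vector is a convex combination of the value vectors. The sketch leaves out the other two ingredients the abstract names: learning the object representations and the self-supervised dynamics objective.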

updated: Tue Jul 06 2021 17:58:42 GMT+0000 (UTC)

published: Tue Dec 15 2020 18:57:40 GMT+0000 (UTC)

arXiv
