End-to-end Recurrent Multi-Object Tracking and Trajectory Prediction with Relational Reasoning

Fabian B. Fuchs; Adam R. Kosiorek; Li Sun; Oiwi Parker Jones; Ingmar Posner

The majority of contemporary object-tracking approaches do not model interactions between objects. This contrasts with the fact that objects' paths are not independent: a cyclist might abruptly deviate from a previously planned trajectory in order to avoid colliding with a car. Building upon HART, a neural class-agnostic single-object tracker, we introduce MOHART, a multi-object tracking method capable of relational reasoning. Importantly, the entire system, including the understanding of interactions and relations between objects, is class-agnostic and learned simultaneously in an end-to-end fashion. We explore a number of relational reasoning architectures and show that permutation-invariant models outperform non-permutation-invariant alternatives. We also find that architectures using a single permutation-invariant operation like DeepSets, despite, in theory, being universal function approximators, are nonetheless outperformed by a more complex architecture based on multi-headed attention. The latter better accounts for complex physical interactions in a challenging toy experiment. Further, we find that modelling interactions leads to consistent performance gains in tracking as well as future trajectory prediction on three real-world datasets (MOTChallenge, UA-DETRAC, and Stanford Drone dataset), particularly in the presence of ego-motion, occlusions, crowded scenes, and faulty sensor inputs.
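The abstract contrasts a single permutation-invariant operation (DeepSets) with multi-headed attention. The following is a minimal NumPy sketch of those two generic building blocks, not the MOHART implementation, which the abstract does not describe. Everything here is an assumption for illustration: the function names `deepsets` and `multi_head_self_attention`, the random untrained weights, and the per-object "state" vectors and their sizes are all hypothetical. The sketch checks the property that matters: reordering the objects leaves the DeepSets summary unchanged (invariance) and reorders the attention outputs identically (equivariance).

```python
import numpy as np

rng = np.random.default_rng(0)


def mlp(x, w1, w2):
    """Two-layer perceptron with ReLU, applied row-wise (hypothetical helper)."""
    return np.maximum(x @ w1, 0.0) @ w2


def deepsets(states, params):
    """DeepSets-style set function rho(sum_i phi(x_i)).

    The single sum over objects makes the output independent of object order.
    """
    phi = mlp(states, params["phi1"], params["phi2"])  # (n_objects, d_hidden)
    pooled = phi.sum(axis=0)                           # permutation-invariant pooling
    return mlp(pooled[None, :], params["rho1"], params["rho2"])[0]


def multi_head_self_attention(states, wq, wk, wv, n_heads):
    """Scaled dot-product self-attention over a set of object states.

    Every object attends to every object; permuting the input rows
    permutes the output rows in the same way (permutation equivariance).
    """
    n, d = states.shape
    dh = d // n_heads
    # Project and split into heads: (n_heads, n, dh)
    q = (states @ wq).reshape(n, n_heads, dh).transpose(1, 0, 2)
    k = (states @ wk).reshape(n, n_heads, dh).transpose(1, 0, 2)
    v = (states @ wv).reshape(n, n_heads, dh).transpose(1, 0, 2)
    logits = q @ k.transpose(0, 2, 1) / np.sqrt(dh)      # (n_heads, n, n)
    logits -= logits.max(axis=-1, keepdims=True)         # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=-1, keepdims=True)       # row-wise softmax
    out = weights @ v                                    # (n_heads, n, dh)
    return out.transpose(1, 0, 2).reshape(n, d)          # merge heads


# Hypothetical sizes and random weights, for illustration only.
d_state, d_hidden, n_objects = 8, 16, 5
params = {
    "phi1": rng.normal(size=(d_state, d_hidden)),
    "phi2": rng.normal(size=(d_hidden, d_hidden)),
    "rho1": rng.normal(size=(d_hidden, d_hidden)),
    "rho2": rng.normal(size=(d_hidden, d_state)),
}
wq, wk, wv = (rng.normal(size=(d_state, d_state)) for _ in range(3))
states = rng.normal(size=(n_objects, d_state))  # one row per tracked object
perm = rng.permutation(n_objects)

# Reordering the objects leaves the DeepSets summary unchanged ...
print(np.allclose(deepsets(states, params), deepsets(states[perm], params)))
# ... and reorders the attention outputs in exactly the same way.
att = multi_head_self_attention(states, wq, wk, wv, n_heads=2)
print(np.allclose(att[perm], multi_head_self_attention(states[perm], wq, wk, wv, n_heads=2)))
```

One plausible intuition for the reported gap: DeepSets compresses all objects into one vector through a single sum, whereas attention lets each object weight every other object separately and returns an updated state per object.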

updated: Mon Sep 28 2020 14:25:23 GMT+0000 (UTC)

published: Fri Jul 12 2019 22:40:13 GMT+0000 (UTC)

arXiv
