Synthetic Data Are as Good as the Real for Association Knowledge Learning in Multi-object Tracking

Yuchi Liu; Zhongdao Wang; Xiangxin Zhou; Liang Zheng

Association, which aims to link bounding boxes of the same identity across a video sequence, is a central component of multi-object tracking (MOT). Association modules, e.g., parametric networks, are usually trained on real video data. However, annotating person tracks in consecutive video frames is expensive, and real data are inflexible, offering limited opportunities to evaluate system performance w.r.t. changing tracking scenarios. In this paper, we study whether 3D synthetic data can replace real-world videos for association training. Specifically, we introduce a large-scale synthetic data engine named MOTX, where the motion characteristics of cameras and objects are manually configured to be similar to those in real-world datasets. We show that, compared with real data, association knowledge obtained from synthetic data can achieve very similar performance on real-world test sets without domain adaptation techniques. We credit this intriguing observation to two factors. First and foremost, 3D engines can simulate motion factors such as camera movement, camera view, and object movement well, so the simulated videos provide association modules with effective motion features. Second, experimental results show that the appearance domain gap hardly harms the learning of association knowledge. In addition, the strong customization ability of MOTX allows us to quantitatively assess the impact of motion factors on MOT, which brings new insights to the community.
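To make the association step concrete, the following is a minimal sketch of linking boxes between two frames using motion (box overlap) cues only. It is not the method from the paper, which trains parametric association modules; it is a simple non-learned baseline, and the names `iou`, `associate`, and `min_iou` as well as the sample boxes are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical box format for this sketch: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def associate(
    tracks: Dict[int, Box], detections: List[Box], min_iou: float = 0.3
) -> Dict[int, int]:
    """Greedily map track IDs to detection indices, highest IoU first.

    min_iou is an assumed threshold; pairs below it are left unmatched.
    """
    pairs = sorted(
        (
            (iou(box, det), tid, j)
            for tid, box in tracks.items()
            for j, det in enumerate(detections)
        ),
        reverse=True,
    )
    matches: Dict[int, int] = {}
    used = set()
    for score, tid, j in pairs:
        if score < min_iou:
            break  # remaining pairs overlap even less
        if tid in matches or j in used:
            continue  # each track and each detection is used at most once
        matches[tid] = j
        used.add(j)
    return matches


# Boxes of two tracked identities in the previous frame.
tracks = {1: (0, 0, 10, 10), 2: (50, 50, 60, 60)}
# Detections in the current frame; the last one belongs to no existing track.
detections = [(51, 50, 61, 60), (1, 0, 11, 10), (200, 200, 210, 210)]
matches = associate(tracks, detections)
print(matches)
```

Because this baseline relies only on box geometry, it illustrates why motion factors such as camera and object movement matter for association independently of appearance.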

updated: Mon Oct 25 2021 04:53:42 GMT+0000 (UTC)

published: Wed Jun 30 2021 14:46:36 GMT+0000 (UTC)

arXiv
