Referring Multi-Object Tracking

Dongming Wu; Wencheng Han; Tiancai Wang; Xingping Dong; Xiangyu Zhang; Jianbing Shen

Existing referring understanding tasks tend to involve the detection of a single text-referred object. In this paper, we propose a new and general referring understanding task, termed referring multi-object tracking (RMOT). Its core idea is to employ a language expression as a semantic cue to guide the prediction of multi-object tracking. To the best of our knowledge, this is the first work to predict an arbitrary number of referent objects in videos. To push RMOT forward, we construct a benchmark with scalable expressions based on KITTI, named Refer-KITTI. Specifically, it provides 18 videos with 818 expressions, and each expression in a video is annotated with an average of 10.7 objects. Further, we develop a transformer-based architecture, TransRMOT, to tackle the new task in an online manner; it achieves impressive detection performance and outperforms its counterparts. The dataset and code will be available at https://github.com/wudongming97/RMOT.

updated: Sat Mar 11 2023 14:17:48 GMT+0000 (UTC)

published: Mon Mar 06 2023 18:50:06 GMT+0000 (UTC)

arXiv
