TransCenter: Transformers with Dense Representations for Multiple-Object Tracking

Yihong Xu; Yutong Ban; Guillaume Delorme; Chuang Gan; Daniela Rus; Xavier Alameda-Pineda

Transformers have demonstrated superior performance on a wide variety of tasks since their introduction. In recent years, they have drawn attention from the vision community for tasks such as image classification and object detection. Despite this wave, an accurate and efficient multiple-object tracking (MOT) method based on transformers is yet to be designed. We argue that directly applying a transformer architecture, with its quadratic complexity and insufficient noise-initialized sparse queries, is not optimal for MOT. We propose TransCenter, a transformer-based MOT architecture with dense representations that accurately tracks all objects while keeping a reasonable runtime. Methodologically, we propose the use of image-related dense detection queries and efficient sparse tracking queries, both produced by our carefully designed query learning networks (QLN). On the one hand, the dense image-related detection queries allow us to infer targets' locations globally and robustly through dense heatmap outputs. On the other hand, the set of sparse tracking queries interacts efficiently with image features in our TransCenter Decoder to associate object positions through time. As a result, TransCenter exhibits remarkable performance improvements and outperforms the current state-of-the-art methods by a large margin on two standard MOT benchmarks under two tracking settings (public/private). TransCenter is also shown to be efficient and accurate through an extensive ablation study and comparisons with more naive alternatives and concurrent works. For scientific interest, the code is made publicly available at https://github.com/yihongxu/transcenter.
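
To illustrate what inferring target locations from a dense heatmap output can look like in general, the following is a minimal sketch of local-maximum extraction on a 2-D center heatmap. It is not taken from the TransCenter code base: the function name `heatmap_peaks`, the 3x3 neighbourhood, and the `threshold` parameter are assumptions chosen for illustration only.

```python
def heatmap_peaks(heatmap, threshold=0.5):
    """Return (row, col, score) for each local maximum of a 2-D heatmap.

    Hypothetical helper for illustration; `heatmap` is a list of rows of
    floats, and `threshold` is an assumed confidence cut-off.
    """
    height = len(heatmap)
    width = len(heatmap[0]) if height else 0
    peaks = []
    for r in range(height):
        for c in range(width):
            score = heatmap[r][c]
            # Discard low-confidence responses before the neighbour check.
            if score < threshold:
                continue
            # Collect the (up to 8) neighbours that lie inside the map.
            neighbours = [
                heatmap[rr][cc]
                for rr in range(max(0, r - 1), min(height, r + 2))
                for cc in range(max(0, c - 1), min(width, c + 2))
                if (rr, cc) != (r, c)
            ]
            # Keep the pixel only if no neighbour has a higher score.
            if all(score >= n for n in neighbours):
                peaks.append((r, c, score))
    return peaks


# Toy 5x5 heatmap with two separated peaks and one suppressed neighbour.
toy = [[0.0] * 5 for _ in range(5)]
toy[1][1] = 0.9
toy[1][2] = 0.6  # adjacent to the stronger 0.9 response, so it is dropped
toy[3][4] = 0.7
print(heatmap_peaks(toy))
```

Each surviving peak would correspond to one candidate object center; a real tracker would additionally regress box sizes and associate centers across frames, which this sketch does not attempt.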

updated: Fri Sep 30 2022 10:00:00 GMT+0000 (UTC)

published: Sun Mar 28 2021 14:49:36 GMT+0000 (UTC)

arXiv
