S^3Track: Self-supervised Tracking with Soft Assignment Flow

Fatemeh Azimi; Fahim Mannan; Felix Heide

In this work, we study self-supervised multiple object tracking without using any video-level association labels. We propose to cast the problem of multiple object tracking as learning the frame-wise associations between detections in consecutive frames. To this end, we propose differentiable soft object assignment for object association, making it possible to learn features tailored to object association with differentiable end-to-end training. With this training approach in hand, we develop an appearance-based model for learning instance-aware object features used to construct a cost matrix based on the pairwise distances between the object features. We train our model using temporal and multi-view data, where we obtain association pseudo-labels using optical flow and disparity information. Unlike most self-supervised tracking methods that rely on pretext tasks for learning the feature correspondences, our method is directly optimized for cross-object association in complex scenarios. As such, the proposed method offers a reidentification-based MOT approach that is robust to training hyperparameters and does not suffer from local minima, which are a challenge in self-supervised methods. We evaluate our proposed model on the KITTI, Waymo, nuScenes, and Argoverse datasets, consistently improving over other unsupervised methods (7.8% improvement in association accuracy on nuScenes).
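The association step described above can be illustrated with a small sketch. The abstract does not give the exact formulation, so everything here is an assumption for illustration: the Euclidean distance for the cost matrix, the row-wise softmax over negative costs as the differentiable relaxation, the cross-entropy loss against pseudo-labels, the `temperature` parameter, and all function names. NumPy is used only to show the forward computation; actual training would need an automatic-differentiation framework.

```python
import numpy as np

def pairwise_cost(feats_a, feats_b):
    """Cost matrix of Euclidean distances between two sets of object features."""
    diff = feats_a[:, None, :] - feats_b[None, :, :]   # shape (Na, Nb, D)
    return np.linalg.norm(diff, axis=-1)                # shape (Na, Nb)

def soft_assignment(cost, temperature=0.1):
    """Assumed relaxation: row-wise softmax over negative costs.

    Each row is a probability distribution over candidate matches in the
    next frame, so the assignment is smooth in the features.
    """
    logits = -cost / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)

def association_loss(soft, pseudo_labels):
    """Cross-entropy between soft assignment rows and pseudo-label indices."""
    rows = np.arange(soft.shape[0])
    return float(-np.log(soft[rows, pseudo_labels] + 1e-12).mean())

# Toy features for three detections in frame t and frame t+1 (made-up values).
feats_t = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
feats_t1 = np.array([[0.1, 0.9], [0.9, 1.1], [0.9, 0.1]])

cost = pairwise_cost(feats_t, feats_t1)
soft = soft_assignment(cost)
matches = soft.argmax(axis=1)  # hard association used at inference time

# Pseudo-labels stand in for associations derived from optical flow or disparity.
pseudo_labels = np.array([2, 0, 1])
loss = association_loss(soft, pseudo_labels)
```

In this toy case each detection in frame t is nearest to exactly one detection in frame t+1, so the hard matches agree with the pseudo-labels and the loss is small. A loss of this kind penalizes the features directly for wrong associations, which is the sense in which the abstract contrasts the method with pretext-task training.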

updated: Wed May 17 2023 06:25:40 GMT+0000 (UTC)

published: Wed May 17 2023 06:25:40 GMT+0000 (UTC)

arXiv
