Transfer of Representations to Video Label Propagation: Implementation Factors Matter

Daniel McKee; Zitong Zhan; Bing Shuai; Davide Modolo; Joseph Tighe; Svetlana Lazebnik

This work studies feature representations for dense label propagation in video, with a focus on recently proposed methods that learn video correspondence using self-supervised signals such as colorization or temporal cycle consistency. In the literature, these methods have been evaluated under inconsistent settings, making it difficult to discern trends or compare performance fairly. Starting with a unified formulation of the label propagation algorithm that encompasses most existing variations, we systematically study the impact of important implementation factors in feature extraction and label propagation. Along the way, we report the accuracies of properly tuned supervised and unsupervised still-image baselines, which are higher than those found in previous works. We also demonstrate that augmenting video-based correspondence cues with still-image-based ones can further improve performance. We then attempt a fair comparison of recent video-based methods on the DAVIS benchmark, showing that the best methods converge to performance levels near our strong ImageNet baseline, despite their use of a variety of specialized video-based losses and training particulars. Additional comparisons on the JHMDB and VIP datasets confirm the similar performance of current methods. We hope that this study will help to improve evaluation practices and better inform future research directions in temporal correspondence.

updated: Thu Mar 10 2022 18:58:22 GMT+0000 (UTC)

published: Thu Mar 10 2022 18:58:22 GMT+0000 (UTC)

arXiv
