Few-Shot Video Object Detection

Qi Fan; Chi-Keung Tang; Yu-Wing Tai

We introduce Few-Shot Video Object Detection (FSVOD) with three contributions to visual learning in our highly diverse and dynamic world: 1) a large-scale video dataset, FSVOD-500, comprising 500 classes with class-balanced videos in each category for few-shot learning; 2) a novel Tube Proposal Network (TPN) that generates high-quality video tube proposals for aggregating feature representations of target video objects, which can be highly dynamic; 3) a strategically improved Temporal Matching Network (TMN+) that matches representative query tube features with better discriminative ability, thus achieving higher diversity. Our TPN and TMN+ are trained jointly and end-to-end. Extensive experiments demonstrate that our method produces significantly better detection results on two few-shot video object detection datasets than image-based methods and other naive video-based extensions. Code and datasets will be released at https://github.com/fanq15/FewX.
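The abstract describes aggregating features along a video tube and matching them against a few labeled support examples per class. The sketch below illustrates only that general idea, under loudly stated assumptions: per-frame features for each tube are already extracted, temporal aggregation is plain mean pooling, and matching is cosine similarity to a per-class prototype (the mean of the K support features). All function names are hypothetical. This is not the paper's TPN or TMN+, which are learned networks trained end-to-end.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def mean_pool(vectors: Sequence[Sequence[float]]) -> Vector:
    """Average a non-empty list of equal-length feature vectors element-wise."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same length")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify_tube(
    tube_frame_features: Sequence[Sequence[float]],
    support_features: Dict[str, Sequence[Sequence[float]]],
) -> Tuple[str, float]:
    """Match one tube proposal against K-shot support sets (hypothetical helper).

    tube_frame_features: one feature vector per frame the tube spans.
    support_features: class name -> K support feature vectors.
    Returns the best-matching class and its cosine similarity.
    """
    # Temporal aggregation (assumed mean pooling): one descriptor per tube
    # instead of one per frame.
    tube_feature = mean_pool(tube_frame_features)
    best_class, best_score = "", -math.inf
    for class_name, shots in support_features.items():
        # Class prototype: average of the K support examples.
        prototype = mean_pool(shots)
        score = cosine_similarity(tube_feature, prototype)
        if score > best_score:
            best_class, best_score = class_name, score
    return best_class, best_score


if __name__ == "__main__":
    # Toy 2-D features: a three-frame tube and two 2-shot support classes.
    tube = [[1.0, 0.1], [0.9, 0.0], [1.1, 0.2]]
    support = {"cat": [[1.0, 0.0], [0.8, 0.1]], "dog": [[0.0, 1.0], [0.1, 0.9]]}
    print(classify_tube(tube, support))
```

Pooling over the whole tube rather than scoring each frame independently is what lets a temporally consistent object description smooth over per-frame appearance changes; a learned matcher would replace the fixed cosine score.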

updated: Mon Nov 22 2021 08:33:25 GMT+0000 (UTC)

published: Fri Apr 30 2021 07:38:04 GMT+0000 (UTC)

arXiv