Few-Shot Video Object Detection

Qi Fan; Chi-Keung Tang; Yu-Wing Tai

We introduce Few-Shot Video Object Detection (FSVOD) with three contributions to the challenge of real-world visual learning in our highly diverse and dynamic world: 1) a large-scale video dataset, FSVOD-500, comprising 500 classes with class-balanced videos in each category for few-shot learning; 2) a novel Tube Proposal Network (TPN) that generates high-quality video tube proposals for aggregating feature representations of the target video object, which can be highly dynamic; 3) a strategically improved Temporal Matching Network (TMN+) for matching representative query tube features with better discriminative ability, thus achieving higher diversity. Our TPN and TMN+ are trained jointly and end-to-end. Extensive experiments demonstrate that our method produces significantly better detection results on two few-shot video object detection datasets compared to image-based methods and other naive video-based extensions. Code and datasets are released at https://github.com/fanq15/FewX.
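The general idea of aggregating features along a tube and matching them against a few support examples can be illustrated with a toy sketch. This is not the TPN or TMN+ described above: the mean pooling, the cosine similarity, and every name here (mean_pool, match_tube, the sample vectors) are assumptions chosen only to make the idea concrete.

```python
import math

def mean_pool(frame_features):
    """Average the per-frame feature vectors of one tube into a single vector."""
    n = len(frame_features)
    dim = len(frame_features[0])
    return [sum(f[d] for f in frame_features) / n for d in range(dim)]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def match_tube(tube_frame_features, support_prototypes):
    """Score a pooled tube feature against each support class prototype.

    Hypothetical helper: a stand-in for a learned matching network.
    """
    tube_vec = mean_pool(tube_frame_features)
    scores = {name: cosine(tube_vec, proto)
              for name, proto in support_prototypes.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Made-up 3-dimensional features for a tube spanning three frames.
tube = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [1.0, 0.0, 0.1]]
# Made-up support prototypes, one per few-shot class.
supports = {"cat": [1.0, 0.1, 0.0], "kite": [0.0, 0.2, 1.0]}

label, scores = match_tube(tube, supports)
print(label, scores)
```

Pooling over the whole tube rather than scoring each frame independently is what lets a single noisy frame be outweighed by the rest of the track; a learned matching network would replace both the pooling and the similarity function used here.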

updated: Sun Aug 07 2022 09:23:11 GMT+0000 (UTC)

published: Fri Apr 30 2021 07:38:04 GMT+0000 (UTC)

arXiv
