Label-Efficient Online Continual Object Detection in Streaming Video

Jay Zhangjie Wu; David Junhao Zhang; Wynne Hsu; Mengmi Zhang; Mike Zheng Shou

To thrive in evolving environments, humans can continually acquire and transfer new knowledge from a continuous video stream with minimal supervision, while retaining previously learnt experiences. In contrast to human learning, most standard continual learning benchmarks focus on learning from static i.i.d. images in fully supervised settings. Here, we examine a more realistic and challenging problem: Label-Efficient Online Continual Object Detection (LEOCOD) in video streams. Addressing this problem would greatly benefit many real-world applications by reducing annotation costs and retraining time. To tackle it, we draw inspiration from complementary learning systems (CLS) in the human brain and propose a computational model, dubbed Efficient-CLS. Functionally correlated with the hippocampus and the neocortex in CLS, Efficient-CLS posits a memory encoding mechanism involving bidirectional interaction between fast and slow learners via synaptic weight transfers and pattern replays. We test Efficient-CLS and competitive baselines on two challenging real-world video stream datasets. Like humans, Efficient-CLS learns to detect new object classes incrementally from a continuous temporal stream of non-repeating video with minimal forgetting. Remarkably, with only 25% of video frames annotated, Efficient-CLS still outperforms all compared models, which are trained with annotations on 100% of video frames. The data and source code will be publicly available at https://github.com/showlab/Efficient-CLS.
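
The abstract names two interactions between the fast and slow learners, synaptic weight transfer and pattern replay, without giving their details. The following is a minimal illustrative sketch, not the authors' implementation: it assumes the weight transfer is an exponential moving average (EMA) from the fast learner into the slow learner, and that replayed patterns come from a fixed-size memory filled by reservoir sampling. Both choices, along with the names `ema_transfer`, `ReplayBuffer`, and `decay`, are assumptions made here for illustration.

```python
import random


def ema_transfer(slow, fast, decay=0.99):
    """Return slow-learner weights moved a small step toward the fast learner.

    Assumption: weights are plain dicts of name -> float, and the transfer
    is an exponential moving average. The abstract does not specify this.
    """
    return {name: decay * slow[name] + (1.0 - decay) * fast[name] for name in slow}


class ReplayBuffer:
    """Fixed-size memory over a non-repeating stream (reservoir sampling).

    Assumption: hypothetical stand-in for the 'pattern replay' memory.
    """

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []
        self.seen = 0  # number of stream items observed so far
        self.rng = random.Random(seed)

    def add(self, item):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
        else:
            # Each observed item ends up stored with probability capacity / seen.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = item

    def sample(self, k):
        """Draw up to k stored items to replay alongside the current frame."""
        return self.rng.sample(self.items, min(k, len(self.items)))


# Toy usage: one fast update followed by a slow consolidation step.
slow_weights = {"w": 0.0}
fast_weights = {"w": 1.0}
slow_weights = ema_transfer(slow_weights, fast_weights, decay=0.9)

memory = ReplayBuffer(capacity=3)
for frame_id in range(100):
    memory.add(frame_id)
replayed = memory.sample(2)
```

In this sketch the fast learner would be trained on each incoming frame plus the replayed items, while the slow learner changes only through `ema_transfer`, which is one simple way to limit forgetting under the stated assumptions.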

updated: Wed Jun 01 2022 08:22:34 GMT+0000 (UTC)

published: Wed Jun 01 2022 08:22:34 GMT+0000 (UTC)

arXiv
