DnS: Distill-and-Select for Efficient and Accurate Video Indexing and Retrieval

Giorgos Kordopatis-Zilos; Christos Tzelepis; Symeon Papadopoulos; Ioannis Kompatsiaris; Ioannis Patras

In this paper, we address the problem of high-performance, computationally efficient content-based video retrieval in large-scale datasets. Current methods typically propose either (i) fine-grained approaches that employ spatio-temporal representations and similarity calculations, achieving high performance at a high computational cost, or (ii) coarse-grained approaches that represent/index videos as global vectors, where the spatio-temporal structure is lost, providing low performance but also low computational cost. In this work, we propose a Knowledge Distillation framework, called Distill-and-Select (DnS), that, starting from a well-performing fine-grained Teacher Network, learns: a) Student Networks at different trade-offs between retrieval performance and computational efficiency, and b) a Selector Network that, at test time, rapidly directs samples to the appropriate student to maintain both high retrieval performance and high computational efficiency. We train several students with different architectures and arrive at different trade-offs between performance and efficiency, i.e., speed and storage requirements, including fine-grained students that store/index videos using binary representations. Importantly, the proposed scheme allows Knowledge Distillation on large, unlabelled datasets, which leads to good students. We evaluate DnS on five public datasets across three different video retrieval tasks and demonstrate a) that our students achieve state-of-the-art performance in several cases and b) that the DnS framework provides an excellent trade-off between retrieval performance, computational speed, and storage space. In specific configurations, the proposed method achieves mAP similar to the teacher's while being 20 times faster and requiring 240 times less storage space. The collected dataset and implementation are publicly available: https://github.com/mever-team/distill-and-select.
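The routing idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the Chamfer-style frame matching used for the fine-grained score, the precomputed `selector_scores` array, and the `rerank_fraction` parameter are all assumptions made for illustration. In DnS the selector and both students are learned networks; here they are replaced by simple stand-ins so that only the control flow is shown: score everything with the cheap coarse-grained representation, then re-score only the pairs the selector flags with the expensive fine-grained one.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Scale vectors to unit length along the given axis."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + 1e-12)

def coarse_similarity(query_vec, db_vecs):
    """Cosine similarity between one global query vector and N global video vectors."""
    return l2_normalize(db_vecs) @ l2_normalize(query_vec)

def fine_similarity(query_frames, video_frames):
    """Stand-in fine-grained score (an assumption, not the paper's student):
    each query frame takes its best-matching video frame, then the maxima are averaged."""
    sim = l2_normalize(query_frames) @ l2_normalize(video_frames).T
    return float(sim.max(axis=1).mean())

def dns_scores(query_vec, query_frames, db_vecs, db_frames,
               selector_scores, rerank_fraction=0.1):
    """Score every database video with the coarse student, then re-score the
    fraction with the highest (hypothetical) selector scores using the fine student."""
    scores = coarse_similarity(query_vec, db_vecs).astype(float)
    k = int(np.ceil(rerank_fraction * len(db_frames)))
    if k > 0:
        chosen = np.argsort(-np.asarray(selector_scores))[:k]
        for i in chosen:
            scores[i] = fine_similarity(query_frames, db_frames[i])
    return scores

# Toy data: four videos with frame-level features and mean-pooled global vectors.
rng = np.random.default_rng(0)
db_frames = [rng.normal(size=(n, 8)) for n in (5, 7, 4, 6)]
db_vecs = np.stack([f.mean(axis=0) for f in db_frames])
q_frames = db_frames[2]                       # query is a copy of video 2
q_vec = q_frames.mean(axis=0)
selector_scores = np.array([0.1, 0.9, 0.2, 0.8])  # made-up selector outputs

coarse_only = dns_scores(q_vec, q_frames, db_vecs, db_frames, selector_scores, 0.0)
half_routed = dns_scores(q_vec, q_frames, db_vecs, db_frames, selector_scores, 0.5)
all_fine = dns_scores(q_vec, q_frames, db_vecs, db_frames, selector_scores, 1.0)
```

Under these assumptions, `rerank_fraction` is the knob that trades accuracy for cost: at 0.0 only global vectors are compared, at 1.0 every pair pays the fine-grained price, and intermediate values spend the fine-grained budget only where the selector asks for it.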

updated: Fri Aug 05 2022 11:53:45 GMT+0000 (UTC)

published: Thu Jun 24 2021 18:34:24 GMT+0000 (UTC)

arXiv
