MOD: A Deep Mixture Model with Online Knowledge Distillation for Large Scale Video Temporal Concept Localization

Rongcheng Lin; Jing Xiao; Jianping Fan

In this paper, we present and discuss a deep mixture model with online knowledge distillation (MOD) for large-scale video temporal concept localization, which ranked 3rd in the 3rd YouTube-8M Video Understanding Challenge. Specifically, we find that by enabling knowledge sharing with online distillation, fine-tuning a mixture model on a smaller dataset can achieve better evaluation performance. Based on this observation, in our final solution, we trained and fine-tuned 12 NeXtVLAD models in parallel with a 2-layer online distillation structure. The experimental results show that the proposed distillation structure effectively avoids overfitting and shows superior generalization performance. The code is publicly available at: https://github.com/linrongc/solution_youtube8m_v3
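To make the idea of online distillation in a mixture model more concrete, the following is a minimal sketch of one common formulation, not the authors' implementation. It assumes that the mixture prediction is a softmax-gated average of per-expert sigmoid outputs, and that each expert is trained against both the ground-truth labels and the mixture prediction (used as a fixed soft target). All function names, the gating scheme, and the `distill_weight` parameter are hypothetical; the sketch covers a single distillation layer only, whereas the abstract describes a 2-layer structure over 12 NeXtVLAD models.

```python
import math


def sigmoid(x):
    """Logistic function, used here for multi-label class probabilities."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs):
    """Numerically stable softmax over a list of gate logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def binary_cross_entropy(target, pred, eps=1e-7):
    """Cross entropy between a (possibly soft) target and a predicted probability."""
    pred = min(max(pred, eps), 1.0 - eps)
    return -(target * math.log(pred) + (1.0 - target) * math.log(1.0 - pred))


def mixture_prediction(expert_probs, gate_logits):
    """Assumed gating scheme: softmax-weighted average of expert probabilities."""
    weights = softmax(gate_logits)
    num_classes = len(expert_probs[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, expert_probs))
        for c in range(num_classes)
    ]


def online_distillation_loss(expert_logits, gate_logits, labels, distill_weight=1.0):
    """Mixture label loss plus, per expert, a label loss and a distillation loss.

    The mixture output acts as the teacher; in a real framework it would be
    treated as a constant (no gradient) inside the distillation term.
    """
    expert_probs = [[sigmoid(z) for z in logits] for logits in expert_logits]
    teacher = mixture_prediction(expert_probs, gate_logits)
    n = len(labels)

    # Loss of the mixture itself against the ground-truth labels.
    loss = sum(binary_cross_entropy(y, t) for y, t in zip(labels, teacher)) / n

    for probs in expert_probs:
        label_loss = sum(binary_cross_entropy(y, p) for y, p in zip(labels, probs)) / n
        distill_loss = sum(binary_cross_entropy(t, p) for t, p in zip(teacher, probs)) / n
        loss += label_loss + distill_weight * distill_loss
    return loss


if __name__ == "__main__":
    # Two hypothetical experts, three classes, equal gate weights.
    logits = [[2.0, -1.0, 0.5], [1.0, -2.0, -0.5]]
    print(online_distillation_loss(logits, [0.0, 0.0], [1.0, 0.0, 1.0]))
```

Setting `distill_weight` to zero reduces the sketch to independent training of the experts plus the mixture, which is a convenient baseline for checking whether the distillation term actually helps on a small fine-tuning set.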

updated: Sun Oct 27 2019 16:24:42 GMT+0000 (UTC)

published: Sun Oct 27 2019 16:24:42 GMT+0000 (UTC)

arXiv
