Towards a Smaller Student: Capacity Dynamic Distillation for Efficient Image Retrieval

Yi Xie; Huaidong Zhang; Xuemiao Xu; Jianqing Zhu; Shengfeng He

Previous knowledge-distillation-based methods for efficient image retrieval employ a lightweight network as the student model for fast inference. However, the lightweight student model lacks adequate representation capacity for effective knowledge imitation during the most critical early training period, causing final performance degeneration. To tackle this issue, we propose a Capacity Dynamic Distillation framework, which constructs a student model with editable representation capacity. Specifically, the student model is initially a heavy model, so that it can learn the distilled knowledge effectively in the early training epochs, and it is gradually compressed during training. To dynamically adjust the model capacity, our framework inserts a learnable convolutional layer within each residual block of the student model as a channel importance indicator. The indicator is optimized simultaneously by the image retrieval loss and the compression loss, and a retrieval-guided gradient resetting mechanism is proposed to resolve the gradient conflict between the two losses. Extensive experiments show that our method has superior inference speed and accuracy. For example, on the VeRi-776 dataset, with ResNet101 as the teacher, our method saves 67.13% of model parameters and 65.67% of FLOPs (around 24.13% and 21.94% higher than the state of the art) without sacrificing accuracy (around 2.11% mAP higher than the state of the art).
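The interaction between the two losses on the channel importance indicator can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not taken from the paper: the indicator is reduced to one scalar scale per channel (rather than a convolutional layer), the compression loss is an L1 penalty on those scales, channels are marked for removal by smallest magnitude, and the resetting rule is a generic mask-based one that zeroes the retrieval gradient on marked channels. The function names `select_mask`, `combined_gradient`, and `sgd_step` are hypothetical.

```python
def select_mask(scales, num_prune):
    """Mark the num_prune channels with the smallest |scale| for removal (0 = remove)."""
    order = sorted(range(len(scales)), key=lambda i: abs(scales[i]))
    drop = set(order[:num_prune])
    return [0 if i in drop else 1 for i in range(len(scales))]


def combined_gradient(scales, retrieval_grad, mask, strength):
    """Combine retrieval and compression gradients for per-channel scales.

    Assumed rule: the compression gradient acts on every channel, while the
    retrieval gradient is reset to zero on channels marked for removal, so the
    two objectives no longer pull those channels in opposite directions.
    """
    combined = []
    for s, g, keep in zip(scales, retrieval_grad, mask):
        sign = (s > 0) - (s < 0)          # derivative of |s| (0 at s == 0)
        compression = strength * sign     # gradient of the assumed L1 penalty
        combined.append(g * keep + compression)
    return combined


def sgd_step(scales, grads, lr):
    """Plain gradient descent update."""
    return [s - lr * g for s, g in zip(scales, grads)]


# Toy values: channels 1 and 3 are weak, but the retrieval loss wants to grow them.
scales = [1.0, 0.05, -0.8, 0.02]
retrieval_grad = [0.1, -0.9, 0.2, -0.7]

mask = select_mask(scales, num_prune=2)
grads = combined_gradient(scales, retrieval_grad, mask, strength=0.5)
new_scales = sgd_step(scales, grads, lr=0.01)

# Without resetting, the summed gradient on channel 1 would be negative,
# so the scale would grow instead of shrinking.
naive_grad_ch1 = retrieval_grad[1] + 0.5
```

In this toy setup the marked channels shrink toward zero because only the compression term acts on them, whereas the naive sum of both gradients would have pushed channel 1 away from zero. The actual indicator design, compression loss, and retrieval-guided selection criterion should be taken from the paper itself.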

updated: Thu Mar 16 2023 11:09:22 GMT+0000 (UTC)

published: Thu Mar 16 2023 11:09:22 GMT+0000 (UTC)

arXiv

