Compressing Models with Few Samples: Mimicking then Replacing

Huanyu Wang; Junjie Liu; Xin Ma; Yang Yong; Zhenhua Chai; Jianxin Wu

Few-sample compression aims to compress a large, redundant model into a small, compact one using only a few samples. If we fine-tune the model directly on these limited samples, it easily overfits and learns almost nothing. Hence, previous methods optimize the compressed model layer by layer, trying to make every layer produce the same outputs as the corresponding layer in the teacher model, which is cumbersome. In this paper, we propose a new framework named Mimicking then Replacing (MiR) for few-sample compression. MiR first urges the pruned model to output the same features as the teacher's in the penultimate layer, and then replaces the teacher's layers before the penultimate layer with the well-tuned compact model. Unlike previous layer-wise reconstruction methods, MiR optimizes the entire network holistically, which is not only simple and effective, but also unsupervised and general. MiR outperforms previous methods by large margins. Code will be available soon.
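The two steps named in the abstract, mimicking and then replacing, can be illustrated with a toy sketch. This is not the authors' implementation (their code is not part of this page). It makes several assumptions for brevity: the teacher is a tiny random two-layer network, the compact backbone is a single linear map rather than a pruned network, the mimicking loss is a plain mean squared error on penultimate features, and all names (`teacher_features`, `student_features`, `compressed_model`) are hypothetical.

```python
import random

random.seed(0)

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def rand_matrix(rows, cols):
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

IN_DIM, HIDDEN, FEAT_DIM, NUM_CLASSES = 4, 8, 3, 2

# Frozen teacher (assumed toy model): a two-layer ReLU backbone plus a linear head.
T_W1 = rand_matrix(HIDDEN, IN_DIM)
T_W2 = rand_matrix(FEAT_DIM, HIDDEN)
T_HEAD = rand_matrix(NUM_CLASSES, FEAT_DIM)

def teacher_features(x):
    hidden = [max(0.0, h) for h in matvec(T_W1, x)]
    return matvec(T_W2, hidden)  # penultimate-layer features

# Compact student backbone: a single linear map (toy stand-in for a pruned network).
S_W = [[0.0] * IN_DIM for _ in range(FEAT_DIM)]

def student_features(x):
    return matvec(S_W, x)

# A handful of unlabeled samples; no class labels are used anywhere below.
samples = [[random.uniform(-1.0, 1.0) for _ in range(IN_DIM)] for _ in range(16)]
targets = [teacher_features(x) for x in samples]  # teacher is frozen, so cache once

def mimic_loss():
    """Mean squared distance between student and teacher penultimate features."""
    total = 0.0
    for x, t in zip(samples, targets):
        total += sum((s - ti) ** 2 for s, ti in zip(student_features(x), t))
    return total / len(samples)

initial_loss = mimic_loss()

# Step 1 (mimicking): gradient descent on the feature-matching loss.
LR = 0.05
for _ in range(300):
    grad = [[0.0] * IN_DIM for _ in range(FEAT_DIM)]
    for x, t in zip(samples, targets):
        s = student_features(x)
        for j in range(FEAT_DIM):
            err = 2.0 * (s[j] - t[j]) / len(samples)
            for i in range(IN_DIM):
                grad[j][i] += err * x[i]
    for j in range(FEAT_DIM):
        for i in range(IN_DIM):
            S_W[j][i] -= LR * grad[j][i]

final_loss = mimic_loss()

# Step 2 (replacing): the student backbone takes the place of the teacher's
# layers before the penultimate features; the teacher's head is reused as-is.
def compressed_model(x):
    return matvec(T_HEAD, student_features(x))

logits = compressed_model(samples[0])
print(f"mimic loss: {initial_loss:.4f} -> {final_loss:.4f}, logits: {logits}")
```

Because only the teacher's penultimate features serve as targets, the sketch never touches labels, which is one way to read the abstract's claim that the approach is unsupervised. A linear student cannot match a ReLU teacher exactly, so the loss drops but does not reach zero here.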

updated: Fri Jan 07 2022 07:03:48 GMT+0000 (UTC)

published: Fri Jan 07 2022 07:03:48 GMT+0000 (UTC)

arXiv
