MOMA: Distill from Self-Supervised Teachers

Yuchong Yao; Nandakishor Desai; Marimuthu Palaniswami

Contrastive Learning and Masked Image Modelling have demonstrated exceptional performance in self-supervised representation learning, where Momentum Contrast (i.e., MoCo) and Masked AutoEncoder (i.e., MAE) are the respective state of the art. In this work, we propose MOMA, which distills from pre-trained MoCo and MAE in a self-supervised manner to combine the knowledge of both paradigms. We introduce three mechanisms of knowledge transfer in the proposed MOMA framework: (1) distilling pre-trained MoCo into MAE; (2) distilling pre-trained MAE into MoCo; and (3) distilling both pre-trained MoCo and MAE into a randomly initialized student. During distillation, the teacher is fed the original inputs and the student is fed masked inputs. Learning is enabled by aligning the normalized representations from the teacher with the projected representations from the student. This simple design enables efficient computation with an extremely high mask ratio and dramatically fewer training epochs, and it requires no extra consideration of the distillation target. Experiments show that MOMA delivers compact student models with performance comparable to existing state-of-the-art methods, combining the power of both self-supervised learning paradigms, and achieves competitive results on various computer vision benchmarks. We hope our method provides insight into transferring and adapting knowledge from large-scale pre-trained models in a computationally efficient way.
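The masking-and-alignment step described above can be illustrated with a minimal plain-Python sketch. It is not the authors' implementation: the abstract does not specify the normalization, the alignment loss, or the encoders, so the layer-norm-style normalization of the teacher feature, the mean-squared-error loss, the summed per-teacher losses for mechanism (3), the mean-pooled toy "encoders", and all helper names (`random_mask`, `layer_norm`, `distill_loss`, `multi_teacher_loss`) are assumptions made for illustration only.

```python
import math
import random


def random_mask(patches, mask_ratio, rng):
    """Keep a random subset of patches (original order preserved); the student sees only these."""
    n_keep = max(1, int(round(len(patches) * (1 - mask_ratio))))
    keep_idx = sorted(rng.sample(range(len(patches)), n_keep))
    return [patches[i] for i in keep_idx]


def layer_norm(v, eps=1e-6):
    """Normalize a vector to zero mean and unit variance (no learnable scale or shift; assumed)."""
    mean = sum(v) / len(v)
    var = sum((x - mean) ** 2 for x in v) / len(v)
    return [(x - mean) / math.sqrt(var + eps) for x in v]


def mean_pool(vectors):
    """Average a list of equal-length vectors (hypothetical toy stand-in for an encoder)."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def linear(weight, v):
    """Multiply vector v by a weight matrix given as a list of rows (the student's projection head)."""
    return [sum(w * x for w, x in zip(row, v)) for row in weight]


def distill_loss(teacher_feat, student_feat, proj_weight):
    """MSE between the normalized teacher feature and the projected student feature (assumed loss)."""
    target = layer_norm(teacher_feat)          # teacher side: normalized, treated as a fixed target
    pred = linear(proj_weight, student_feat)   # student side: projection head output
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def multi_teacher_loss(teacher_feats, student_feat, proj_weights):
    """Mechanism (3), assumed form: one projection head per teacher (e.g. MoCo, MAE), losses summed."""
    return sum(distill_loss(t, student_feat, w)
               for t, w in zip(teacher_feats, proj_weights))


if __name__ == "__main__":
    rng = random.Random(0)
    dim = 4
    patches = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(16)]
    visible = random_mask(patches, mask_ratio=0.75, rng=rng)  # 4 of 16 patches kept
    teacher_feat = mean_pool(patches)   # hypothetical frozen teacher: sees every patch
    student_feat = mean_pool(visible)   # hypothetical student: sees only the visible patches
    identity = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    print(len(visible), distill_loss(teacher_feat, student_feat, identity))
```

With a mask ratio of 0.75 the toy student encodes only 4 of the 16 patches; in an MAE-style encoder that processes only visible patches, this is where a high mask ratio saves computation.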

updated: Sat Feb 04 2023 04:23:52 GMT+0000 (UTC)

published: Sat Feb 04 2023 04:23:52 GMT+0000 (UTC)

arXiv
