Global-Local Self-Distillation for Visual Representation Learning

Tim Lebailly; Tinne Tuytelaars

The downstream accuracy of self-supervised methods is tightly linked to the proxy task solved during training and the quality of the gradients extracted from it. Richer and more meaningful gradient updates are key to allowing self-supervised methods to learn better and more efficiently. In a typical self-distillation framework, the representations of two augmented images are enforced to be coherent at the global level. Nonetheless, incorporating local cues in the proxy task can be beneficial and improve model accuracy on downstream tasks. This leads to a dual objective in which, on the one hand, coherence between global representations is enforced and, on the other, coherence between local representations is enforced. Unfortunately, an exact correspondence mapping between two sets of local representations does not exist, making the task of matching local representations from one augmentation to another non-trivial. We propose to leverage the spatial information in the input images to obtain geometric matchings and compare this geometric approach against previous methods based on similarity matchings. Our study shows not only that 1) geometric matchings perform better than similarity-based matchings in low-data regimes, but also that 2) similarity-based matchings are highly hurtful in low-data regimes compared to the vanilla baseline without local self-distillation. The code is available at https://github.com/tileb1/global-local-self-distillation.
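To make the distinction between the two matching strategies concrete, the following is a minimal sketch and not the authors' implementation. It assumes each augmentation is an axis-aligned crop given as a box (x0, y0, x1, y1) in original-image coordinates, split into a grid of equally sized patches, with no flips or other geometric transforms. The function names `patch_centers`, `geometric_matching` and `similarity_matching` are hypothetical and chosen for illustration only.

```python
import numpy as np


def patch_centers(box, grid):
    """Centers of a grid x grid patch layout inside a crop.

    box is (x0, y0, x1, y1) in original-image coordinates (an assumption
    of this sketch). Patches are indexed row-major: index = row * grid + col.
    """
    x0, y0, x1, y1 = box
    xs = x0 + (np.arange(grid) + 0.5) * (x1 - x0) / grid
    ys = y0 + (np.arange(grid) + 0.5) * (y1 - y0) / grid
    cx, cy = np.meshgrid(xs, ys)
    return np.stack([cx.ravel(), cy.ravel()], axis=1)


def geometric_matching(box_a, box_b, grid):
    """For each patch of crop A, index of the spatially closest patch of crop B."""
    centers_a = patch_centers(box_a, grid)
    centers_b = patch_centers(box_b, grid)
    # Pairwise Euclidean distances between patch centers in image space.
    dist = np.linalg.norm(centers_a[:, None, :] - centers_b[None, :, :], axis=2)
    return dist.argmin(axis=1)


def similarity_matching(feats_a, feats_b):
    """For each local feature of crop A, index of the most cosine-similar feature of crop B."""
    a = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    b = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
    return (a @ b.T).argmax(axis=1)
```

In this sketch, the geometric matching depends only on where the crops were taken, so it is available from the first training step, whereas the similarity matching depends on the current quality of the local features.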

updated: Wed Oct 12 2022 21:12:54 GMT+0000 (UTC)

published: Fri Jul 29 2022 13:50:09 GMT+0000 (UTC)

arXiv

