Rethinking Evaluation Protocols of Visual Representations Learned via Self-supervised Learning

Jae-Hun Lee; Doyoung Yoon; ByeongMoon Ji; Kyungyul Kim; Sangheum Hwang

Linear probing (LP) (and k-NN) on the upstream dataset with labels (e.g., ImageNet) and transfer learning (TL) to various downstream datasets are commonly employed to evaluate the quality of visual representations learned via self-supervised learning (SSL). Although existing SSL methods have shown good performance under those evaluation protocols, we observe that performance is very sensitive to the hyperparameters involved in LP and TL. We argue that this is undesirable behavior, since truly generic representations should be easily adapted to any other visual recognition task, i.e., the learned representations should be robust to the settings of LP and TL hyperparameters. In this work, we investigate the cause of this performance sensitivity by conducting extensive experiments with state-of-the-art SSL methods. First, we find that input normalization for LP is crucial to eliminate performance variations across hyperparameter settings. Specifically, applying batch normalization before feeding inputs to a linear classifier considerably improves the stability of evaluation, and also resolves the inconsistency between k-NN and LP metrics. Second, for TL, we demonstrate that the weight decay parameter in SSL significantly affects the transferability of learned representations, an effect that cannot be identified by LP or k-NN evaluations on the upstream dataset. We believe that the findings of this study will be beneficial for the community by drawing attention to the shortcomings in the current SSL evaluation schemes and underscoring the need to reconsider them.
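
As a rough illustration of the input normalization mentioned in the abstract, the following is a minimal sketch of standardizing frozen features over a batch before a linear classifier. It assumes a non-affine batch normalization that uses batch statistics only; the helper names (batch_standardize, linear_logits) and the eps value are hypothetical and are not taken from the paper, whose exact setup is not given on this page.

```python
import math


def batch_standardize(features, eps=1e-5):
    """Standardize each feature dimension across the batch.

    This mimics batch normalization without learnable scale/shift
    (an assumption made for this sketch).
    """
    n = len(features)
    dim = len(features[0])
    # Per-dimension mean and (biased) variance over the batch.
    means = [sum(row[j] for row in features) / n for j in range(dim)]
    variances = [
        sum((row[j] - means[j]) ** 2 for row in features) / n
        for j in range(dim)
    ]
    return [
        [(row[j] - means[j]) / math.sqrt(variances[j] + eps) for j in range(dim)]
        for row in features
    ]


def linear_logits(features, weights, bias):
    """Apply a linear classifier: one weight row and one bias per class."""
    return [
        [sum(w * x for w, x in zip(w_row, row)) + b
         for w_row, b in zip(weights, bias)]
        for row in features
    ]


# Hypothetical frozen-encoder outputs with very different per-dimension scales.
feats = [[1.0, 200.0], [2.0, 400.0], [3.0, 900.0]]
normed = batch_standardize(feats)
logits = linear_logits(normed, weights=[[1.0, 0.0], [0.0, 1.0]], bias=[0.0, 0.5])
```

After standardization every feature dimension has roughly zero mean and unit variance, so the linear classifier sees inputs on the same scale regardless of how the encoder scaled its outputs. That is one plausible reason a probe becomes less sensitive to hyperparameters such as the learning rate, though the page itself states only the empirical finding.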

updated: Fri Apr 07 2023 03:03:19 GMT+0000 (UTC)

published: Fri Apr 07 2023 03:03:19 GMT+0000 (UTC)

arXiv
