Exploring the Equivalence of Siamese Self-Supervised Learning via A Unified Gradient Framework

Chenxin Tao; Honghui Wang; Xizhou Zhu; Jiahua Dong; Shiji Song; Gao Huang; Jifeng Dai

Self-supervised learning has shown great potential for extracting powerful visual representations without human annotations. Various works have been proposed to address self-supervised learning from different perspectives: (1) contrastive learning methods (e.g., MoCo, SimCLR) use both positive and negative samples to guide the training direction; (2) asymmetric network methods (e.g., BYOL, SimSiam) get rid of negative samples by introducing a predictor network and the stop-gradient operation; (3) feature decorrelation methods (e.g., Barlow Twins, VICReg) instead aim to reduce the redundancy between feature dimensions. Because they are designed from different motivations, these methods appear to use quite different loss functions. Their final accuracy numbers also vary, and different works use different networks and tricks. In this work, we demonstrate that these methods can be unified into the same form. Instead of comparing their loss functions, we derive a unified formula through gradient analysis. Furthermore, we conduct fair and detailed experiments to compare their performance. It turns out that there is little gap between these methods, and the use of a momentum encoder is the key factor in boosting performance. From this unified framework, we propose UniGrad, a simple but effective gradient form for self-supervised learning. It requires neither a memory bank nor a predictor network, yet still achieves state-of-the-art performance and can easily adopt other training strategies. Extensive experiments on linear evaluation and many downstream tasks also show its effectiveness. Code is released at https://github.com/fundamentalvision/UniGrad.
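The abstract's central move is to compare methods through their gradients rather than their losses. The unified formula itself is not stated in the abstract, so the sketch below only illustrates what such a gradient analysis looks like for the contrastive family, using the standard InfoNCE loss for a single anchor. The function names, the temperature `tau=0.2`, and the toy dimensions are illustrative assumptions, not taken from the paper or its code release.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Scale vectors to unit L2 norm along `axis`."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def info_nce_loss(u, v_pos, v_neg, tau=0.2):
    """InfoNCE loss for one anchor `u`, one positive key and K negative keys."""
    keys = np.vstack([v_pos[None, :], v_neg])   # (K+1, D); row 0 is the positive
    logits = keys @ u / tau                     # similarity scores, shape (K+1,)
    m = logits.max()                            # shift for numerical stability
    log_sum_exp = m + np.log(np.exp(logits - m).sum())
    return log_sum_exp - logits[0]

def info_nce_grad(u, v_pos, v_neg, tau=0.2):
    """Analytic d(loss)/du = (-v_pos + sum_j p_j * key_j) / tau, with p = softmax(logits)."""
    keys = np.vstack([v_pos[None, :], v_neg])
    logits = keys @ u / tau
    p = np.exp(logits - logits.max())
    p /= p.sum()                                # softmax weights over all K+1 keys
    pull = -v_pos                               # descent moves u toward its positive
    push = p @ keys                             # descent moves u away from the weighted mean of keys
    return (pull + push) / tau

# Toy check: compare the analytic gradient with central finite differences.
rng = np.random.default_rng(0)
D, K = 8, 16
u = l2_normalize(rng.normal(size=D))
v_pos = l2_normalize(u + 0.1 * rng.normal(size=D))
v_neg = l2_normalize(rng.normal(size=(K, D)))

g = info_nce_grad(u, v_pos, v_neg)
eps = 1e-6
num = np.array([
    (info_nce_loss(u + eps * e, v_pos, v_neg) - info_nce_loss(u - eps * e, v_pos, v_neg)) / (2 * eps)
    for e in np.eye(D)
])
print(np.max(np.abs(g - num)))  # should be close to zero
```

The gradient splits into a term that pulls the anchor toward its positive and a similarity-weighted term over all keys; since the positive also appears in the softmax sum, its net coefficient is `-(1 - p_0) / tau`. Writing each method's gradient in a "positive term plus weighted feature term" shape like this is the kind of comparison the abstract describes, though the paper's exact unified form may differ.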
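The abstract identifies the momentum encoder as the key factor for performance. A momentum (target) encoder is commonly maintained as an exponential moving average (EMA) of the online encoder's weights and receives no gradients of its own. The following is a minimal sketch of that update, assuming parameters are stored as a dict of NumPy arrays; the function name and the momentum value `m=0.99` are hypothetical choices for illustration, and the value used in practice varies by method.

```python
import numpy as np

def momentum_update(online, target, m=0.99):
    """Return EMA-updated target params: target <- m * target + (1 - m) * online.

    `online` and `target` are dicts mapping parameter names to arrays of equal shape.
    The target (momentum) encoder is never trained directly; it only tracks the online one.
    """
    return {name: m * target[name] + (1.0 - m) * online[name] for name in target}

# Toy usage: a fixed online parameter pulls the target toward it geometrically.
online = {"w": np.ones(3)}
target = {"w": np.zeros(3)}
for step in range(100):
    target = momentum_update(online, target, m=0.99)

# With a constant online value of 1 and target starting at 0, target = 1 - m**n after n steps.
print(target["w"])
```

A larger `m` makes the target change more slowly, which gives the online network a more stable regression or contrast target across iterations.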

updated: Tue Jul 05 2022 09:34:26 GMT+0000 (UTC)

published: Thu Dec 09 2021 18:59:57 GMT+0000 (UTC)

arXiv