Improving Self-supervised Learning with Automated Unsupervised Outlier Arbitration

Yu Wang; Jingyang Lin; Jingjing Zou; Yingwei Pan; Ting Yao; Tao Mei

Our work reveals a structural shortcoming of existing mainstream self-supervised learning methods. Whereas self-supervised learning frameworks usually take the prevailing hypothesis of perfect instance-level invariance for granted, we carefully investigate the pitfalls behind it. In particular, we argue that the existing augmentation pipeline for generating multiple positive views naturally introduces out-of-distribution (OOD) samples that undermine learning for downstream tasks. Generating diverse positive augmentations of the input does not always benefit downstream tasks. To overcome this inherent deficiency, we introduce UOTA, a lightweight latent variable model that targets the view sampling issue in self-supervised learning. UOTA adaptively searches for the most important sampling region from which to produce views, and provides a viable choice for outlier-robust self-supervised learning. Our method generalizes directly to many mainstream self-supervised learning approaches, regardless of whether the loss is contrastive. We show empirically that UOTA outperforms state-of-the-art self-supervised paradigms by an evident margin, which supports the existence of the OOD sample issue in existing approaches. In particular, we prove theoretically that the merits of the proposal boil down to guaranteed reductions in estimator variance and bias. Code is available at https://github.com/ssl-codelab/uota.
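The idea that some augmented views behave like outliers and should count less can be illustrated with a small sketch. This is not the UOTA model and is not taken from the linked repository. It is a hypothetical heuristic, assuming each view already has an embedding, that weights views by their cosine similarity to the mean view direction. The names `view_weights`, `weighted_target`, and the `temperature` parameter are assumptions made for illustration only.

```python
import numpy as np


def view_weights(embeddings, temperature=0.1):
    """Softmax weights over views, based on cosine similarity to the mean view.

    Illustrative heuristic only (an assumption, not the paper's method):
    views far from the consensus direction receive small weights.
    """
    # Normalise each view embedding to unit length.
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    # Consensus direction: the normalised mean of the unit embeddings.
    center = z.mean(axis=0)
    center = center / np.linalg.norm(center)
    # Cosine similarity of every view to the consensus direction.
    sims = z @ center
    # Numerically stable softmax over the scaled similarities.
    logits = sims / temperature
    logits = logits - logits.max()
    w = np.exp(logits)
    return w / w.sum()


def weighted_target(embeddings, weights):
    """Weighted average of the view embeddings, usable as a shared target."""
    return weights @ embeddings


# Toy data: four views close to one direction plus one OOD-like view.
rng = np.random.default_rng(0)
anchor = np.array([1.0, 0.0, 0.0, 0.0])
views = anchor + 0.05 * rng.standard_normal((4, 4))
outlier = np.array([[0.0, 1.0, 0.0, 0.0]])
embeddings = np.vstack([views, outlier])

weights = view_weights(embeddings)
target = weighted_target(embeddings, weights)
```

In this toy setup the last view points in a different direction from the other four, so it receives the smallest weight and contributes little to `target`. A uniform average would instead pull the target toward the outlying view.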

updated: Wed Dec 15 2021 14:05:23 GMT+0000 (UTC)

published: Wed Dec 15 2021 14:05:23 GMT+0000 (UTC)

arXiv
