EquiMod: An Equivariance Module to Improve Self-Supervised Learning

Alexandre Devillers; Mathieu Lefort

Self-supervised visual representation methods are closing the gap with supervised learning performance. These methods rely on maximizing the similarity between embeddings of related synthetic inputs created through data augmentations. This can be seen as a task that encourages embeddings to leave out factors modified by these augmentations, i.e. to be invariant to them. However, this only considers one side of the trade-off in the choice of augmentations: they need to strongly modify the images to avoid shortcut learning of simple solutions (e.g. using only color histograms), but on the other hand, augmentation-related information may then be missing from the representations for some downstream tasks (e.g. color is important for bird and flower classification). A few recent works have proposed to mitigate the problem of using only an invariance task by exploring some form of equivariance to augmentations. This has been done by learning additional embedding space(s) in which some augmentation(s) cause embeddings to differ, yet in an uncontrolled way. In this work, we introduce EquiMod, a generic equivariance module that structures the learned latent space, in the sense that our module learns to predict the displacement in the embedding space caused by the augmentations. We show that applying this module to state-of-the-art invariance models, such as SimCLR and BYOL, improves performance on the CIFAR10 and ImageNet datasets. Moreover, while our model could collapse to a trivial equivariance, i.e. invariance, we observe that it instead automatically learns to keep some augmentation-related information that is beneficial to the representations.
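
The contrast between the invariance task and the equivariance task described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the predictor architecture or the loss, so the linear predictor, the cosine-based losses, and all names below (`predict_displaced_embedding`, `equivariance_loss`, `invariance_loss`, `weights`, `aug_params`) are assumptions chosen only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def predict_displaced_embedding(z, aug_params, weights):
    """Hypothetical predictor: shift z by a linear function of the
    augmentation parameters (assumption, not the paper's architecture)."""
    return [
        zi + sum(w * p for w, p in zip(row, aug_params))
        for zi, row in zip(z, weights)
    ]


def invariance_loss(z_view_a, z_view_b):
    """Invariance task: embeddings of two augmented views should match."""
    return 1.0 - cosine_similarity(z_view_a, z_view_b)


def equivariance_loss(z_original, z_augmented, aug_params, weights):
    """Equivariance task: the embedding of the augmented image should match
    the embedding predicted from the original image and the augmentation
    parameters."""
    predicted = predict_displaced_embedding(z_original, aug_params, weights)
    return 1.0 - cosine_similarity(predicted, z_augmented)


# Toy example: the augmentation moves the embedding along the second axis.
z_original = [1.0, 0.0]
z_augmented = [2.0, 2.0]
aug_params = [1.0]

good_weights = [[0.0], [1.0]]  # predictor that models the displacement
zero_weights = [[0.0], [0.0]]  # predictor that ignores the augmentation

loss_modelled = equivariance_loss(z_original, z_augmented, aug_params, good_weights)
loss_ignored = equivariance_loss(z_original, z_augmented, aug_params, zero_weights)
```

In this toy setup, a predictor that ignores the augmentation parameters reduces the equivariance task to an invariance-style comparison between the original and augmented embeddings, which corresponds to the trivial solution mentioned in the abstract.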

updated: Wed Nov 02 2022 16:25:54 GMT+0000 (UTC)

published: Wed Nov 02 2022 16:25:54 GMT+0000 (UTC)

arXiv
