Self-Supervised Representation Learning as Multimodal Variational Inference

Hiroki Nakamura; Masashi Okada; Tadahiro Taniguchi

This paper proposes a probabilistic extension of SimSiam, a recent self-supervised learning (SSL) method. SimSiam trains a model by maximizing the similarity between the representations of different augmented views of the same image. Although uncertainty-aware machine learning, such as deep variational inference, is becoming common, SimSiam and other SSL methods are insufficiently uncertainty-aware, which may limit their potential. The proposed extension makes SimSiam uncertainty-aware by grounding it in variational inference. Our main contributions are twofold. First, we clarify the theoretical relationship between non-contrastive SSL and multimodal variational inference. Second, we introduce a novel SSL method called variational inference SimSiam (VI-SimSiam), which incorporates uncertainty through spherical posterior distributions. Our experiments show that VI-SimSiam outperforms SimSiam on classification tasks on ImageNette and ImageWoof by successfully estimating the representation uncertainty.
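
For orientation, the similarity objective of the baseline SimSiam that the abstract refers to can be sketched as below. This is a minimal pure-Python illustration of a symmetrized negative cosine similarity, not the VI-SimSiam objective, which the abstract does not specify. The names `p1`, `p2` (predictor outputs) and `z1`, `z2` (projector outputs) are assumptions chosen for this sketch, and the stop-gradient that a real implementation applies to `z1` and `z2` is only noted in a comment, since plain Python has no automatic differentiation.

```python
import math


def negative_cosine_similarity(p, z):
    """Return -cos(p, z) for two equal-length vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(p, z))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_z = math.sqrt(sum(b * b for b in z))
    return -dot / (norm_p * norm_z)


def simsiam_loss(p1, z1, p2, z2):
    """Symmetrized SimSiam-style loss for one pair of augmented views.

    p1, p2: predictor outputs for view 1 and view 2 (hypothetical names).
    z1, z2: projector outputs; in a deep learning framework these would be
            treated as constants (stop-gradient) when computing this loss.
    """
    return 0.5 * negative_cosine_similarity(p1, z2) + 0.5 * negative_cosine_similarity(p2, z1)


if __name__ == "__main__":
    # Two views whose representations point in the same direction.
    print(simsiam_loss([1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [1.0, 0.0]))
```

The loss is lowest (-1) when the representations of the two views are perfectly aligned and is 0 when they are orthogonal. Because it compares point estimates only, it carries no notion of how uncertain each representation is, which is the gap the abstract describes.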

updated: Tue Mar 22 2022 03:17:15 GMT+0000 (UTC)

published: Tue Mar 22 2022 03:17:15 GMT+0000 (UTC)

arXiv
