An Embedding-Dynamic Approach to Self-supervised Learning

Suhong Moon; Domas Buracas; Seunghyun Park; Jinkyu Kim; John Canny

A number of recent self-supervised learning methods have shown impressive performance on image classification and other tasks. A somewhat bewildering variety of techniques has been used, not always with a clear understanding of the reasons for their benefits, especially when used in combination. Here we treat the embeddings of images as point particles and consider model optimization as a dynamic process on this system of particles. Our dynamic model combines an attractive force for similar images, a locally dispersive force to avoid local collapse, and a globally dispersive force to achieve a globally homogeneous distribution of particles. The dynamic perspective highlights the advantage of using a delayed-parameter image embedding (à la BYOL) together with multiple views of the same image. Our model also uses a purely dynamic local dispersive force (Brownian motion) that shows improved performance over other methods and does not require knowledge of other particle coordinates. The method is called MSBReg, which stands for (i) a Multiview centroid loss, which applies an attractive force to pull different image view embeddings toward their centroid; (ii) a Singular value loss, which pushes the particle system toward spatially homogeneous density; and (iii) a Brownian diffusive loss, which supplies the local dispersive force. We evaluate the downstream classification performance of MSBReg on ImageNet, as well as on transfer learning tasks including fine-grained classification, multi-class object classification, object detection, and instance segmentation. In addition, we show that applying our regularization term to other methods further improves their performance and stabilizes training by preventing mode collapse.
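The three MSBReg terms can be illustrated with a minimal, hypothetical sketch in plain Python. The abstract does not give exact formulas, so every loss form below is an assumption rather than the paper's definition. The centroid loss is taken as the mean squared distance from each view embedding to the centroid of the views. The singular value loss is replaced by a stand-in that penalizes the squared Frobenius distance between the covariance of L2-normalized, centered embeddings and I/d; this is zero exactly when the centered batch has d equal singular values. The Brownian diffusive loss is assumed to be -<z, noise> with fresh Gaussian noise, so each gradient step displaces a particle randomly without using any other particle's coordinates. The delayed-parameter (BYOL-style) target network is not modeled, and all function names and parameters (`lr`, `sigma`) are illustrative.

```python
import math
import random
from typing import List, Sequence

# Hypothetical sketch: the loss forms are assumptions, not the paper's definitions.
Vector = List[float]


def centroid(vectors: Sequence[Vector]) -> Vector:
    """Component-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def multiview_centroid_loss(views: Sequence[Vector]) -> float:
    """Attractive force (assumed form): mean squared distance from each view's
    embedding to the centroid of all views of the same image."""
    c = centroid(views)
    return sum(sum((a - b) ** 2 for a, b in zip(v, c)) for v in views) / len(views)


def l2_normalize(v: Vector) -> Vector:
    """Project an embedding onto the unit sphere (guarding against zero vectors)."""
    norm = max(math.sqrt(sum(a * a for a in v)), 1e-12)
    return [a / norm for a in v]


def singular_value_loss(batch: Sequence[Vector]) -> float:
    """Global dispersive force (stand-in): squared Frobenius distance between the
    covariance C of the normalized, centered batch and I / d. C = X^T X / N, so the
    loss is zero exactly when C = I / d, i.e. all d singular values of the centered
    batch X equal sqrt(N / d). A fully collapsed batch (C = 0) is penalized."""
    x = [l2_normalize(v) for v in batch]
    n, d = len(x), len(x[0])
    mu = centroid(x)
    xc = [[row[j] - mu[j] for j in range(d)] for row in x]
    cov = [[sum(r[i] * r[j] for r in xc) / n for j in range(d)] for i in range(d)]
    return sum(
        (cov[i][j] - (1.0 / d if i == j else 0.0)) ** 2
        for i in range(d)
        for j in range(d)
    )


def brownian_diffusive_loss(z: Vector, noise: Vector) -> float:
    """Local dispersive force (assumed form): -<z, noise> with noise drawn fresh
    each step. Its gradient w.r.t. z is -noise, which depends on no other particle."""
    return -sum(a * b for a, b in zip(z, noise))


def brownian_step(z: Vector, lr: float, sigma: float, rng: random.Random) -> Vector:
    """One gradient-descent step on brownian_diffusive_loss: z moves by lr * noise."""
    noise = [rng.gauss(0.0, sigma) for _ in z]
    grad = [-e for e in noise]  # gradient of -<z, noise> with respect to z
    return [a - lr * g for a, g in zip(z, grad)]


if __name__ == "__main__":
    rng = random.Random(0)
    views = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three views of one image
    spread = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
    collapsed = [[1.0, 0.0]] * 4  # every embedding identical
    print(multiview_centroid_loss(views))  # 4/9, about 0.444
    print(singular_value_loss(spread), singular_value_loss(collapsed))  # 0.0 0.5
    print(brownian_step([0.0, 0.0], lr=0.1, sigma=1.0, rng=rng))
```

The evenly spread batch scores zero on the singular value stand-in while the collapsed batch does not, which mirrors the role the abstract assigns to that term. In an actual training loop these terms would be computed on tensors with automatic differentiation and combined with weights that the abstract does not specify.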

updated: Thu Jul 07 2022 19:56:20 GMT+0000 (UTC)

published: Thu Jul 07 2022 19:56:20 GMT+0000 (UTC)

arXiv
