A critical analysis of self-supervision, or what we can learn from a single image

Yuki M. Asano; Christian Rupprecht; Andrea Vedaldi

We look critically at popular self-supervision techniques for learning deep convolutional neural networks without manual labels. We show that three different and representative methods, BiGAN, RotNet and DeepCluster, can learn the first few layers of a convolutional network from a single image as well as using millions of images and manual labels, provided that strong data augmentation is used. However, for deeper layers the gap with manual supervision cannot be closed even if millions of unlabelled images are used for training. We conclude that (1) the weights of the early layers of deep networks contain limited information about the statistics of natural images, (2) such low-level statistics can be learned through self-supervision just as well as through strong supervision, and (3) the low-level statistics can be captured via synthetic transformations instead of using a large image dataset.
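A minimal sketch in Python/NumPy of what "learning from a single image with strong data augmentation" can look like in practice: one image is expanded into many augmented patches, each paired with a RotNet-style pretext label (which of four 90-degree rotations was applied). The crop scales, jitter strength, patch size and the helper names (`random_resized_crop`, `color_jitter`, `make_single_image_dataset`) are hypothetical choices for illustration, not the authors' exact augmentation pipeline.

```python
import numpy as np

def random_resized_crop(img, rng, out_size=32, scale=(0.05, 1.0)):
    """Cut a random square patch covering a random fraction of the image area,
    then resize it to out_size x out_size by nearest-neighbour sampling."""
    h, w, _ = img.shape
    area = rng.uniform(*scale) * h * w
    side = int(min(h, w, max(1, round(np.sqrt(area)))))
    top = int(rng.integers(0, h - side + 1))
    left = int(rng.integers(0, w - side + 1))
    patch = img[top:top + side, left:left + side]
    idx = (np.arange(out_size) * side / out_size).astype(int)  # indices in [0, side-1]
    return patch[idx][:, idx]

def color_jitter(patch, rng, strength=0.4):
    """Apply random per-channel gains (assumed jitter model) and clip to [0, 1]."""
    gains = 1.0 + rng.uniform(-strength, strength, size=(1, 1, patch.shape[2]))
    return np.clip(patch * gains, 0.0, 1.0)

def make_single_image_dataset(img, n, rng, out_size=32):
    """Build n augmented patches from ONE image, each labelled with the
    number k of 90-degree rotations applied (the RotNet pretext task)."""
    xs = np.empty((n, out_size, out_size, img.shape[2]), dtype=np.float32)
    ys = np.empty(n, dtype=np.int64)
    for i in range(n):
        p = random_resized_crop(img, rng, out_size)
        if rng.random() < 0.5:          # random horizontal flip
            p = p[:, ::-1]
        p = color_jitter(p, rng)
        k = int(rng.integers(0, 4))     # rotation class: 0, 90, 180 or 270 degrees
        xs[i] = np.rot90(p, k, axes=(0, 1))
        ys[i] = k
    return xs, ys

# Usage: a random array stands in for the single training image (values in [0, 1]).
rng = np.random.default_rng(0)
image = rng.random((256, 384, 3))
xs, ys = make_single_image_dataset(image, n=1000, rng=rng)
```

A network would then be trained to predict `ys` from `xs`; because every patch comes from the same image, whatever the early layers learn must come from low-level statistics plus the synthetic transformations rather than from dataset diversity.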

updated: Wed Feb 19 2020 17:56:41 GMT+0000 (UTC)

published: Tue Apr 30 2019 10:10:38 GMT+0000 (UTC)

arXiv
