Blockwise Self-Supervised Learning at Scale

Shoaib Ahmed Siddiqui; David Krueger; Yann LeCun; Stéphane Deny

Current state-of-the-art deep networks are all powered by backpropagation. In this paper, we explore alternatives to full backpropagation in the form of blockwise learning rules, leveraging the latest developments in self-supervised learning. We show that a blockwise pretraining procedure, which independently trains the four main blocks of layers of a ResNet-50 with the Barlow Twins loss function at each block, performs almost as well as end-to-end backpropagation on ImageNet: a linear probe trained on top of our blockwise pretrained model obtains a top-1 classification accuracy of 70.48%, only 1.1 percentage points below that of an end-to-end pretrained network (71.57%). We perform extensive experiments to understand the impact of the different components of our method and explore a variety of adaptations of self-supervised learning to the blockwise paradigm, building a thorough understanding of the critical avenues for scaling local learning rules to large networks, with implications ranging from hardware design to neuroscience.
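As a rough illustration of the per-block objective named in the abstract, the following is a minimal sketch of the Barlow Twins loss in plain Python. It is not the authors' implementation. The helper names (`standardize`, `barlow_twins_loss`, `blockwise_losses`), the list-of-lists embedding format, and the default off-diagonal weight `lambd` are assumptions made for illustration. The sketch also assumes that each block already yields one embedding per image for each of two augmented views; how those embeddings are produced is not specified in the abstract.

```python
import math


def standardize(z):
    """Standardize each feature (column) over the batch: zero mean, unit variance."""
    n, d = len(z), len(z[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        col = [row[j] for row in z]
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / n
        std = math.sqrt(var) + 1e-12  # small epsilon avoids division by zero
        for i in range(n):
            out[i][j] = (z[i][j] - mean) / std
    return out


def barlow_twins_loss(z_a, z_b, lambd=5e-3):
    """Barlow Twins loss between two views' embeddings (each n x d).

    Pushes the cross-correlation matrix C of the two views toward the
    identity: sum_i (1 - C_ii)^2 + lambd * sum_{i != j} C_ij^2.
    `lambd` is a placeholder value (assumption), not taken from the abstract.
    """
    n, d = len(z_a), len(z_a[0])
    a, b = standardize(z_a), standardize(z_b)
    loss = 0.0
    for i in range(d):
        for j in range(d):
            c_ij = sum(a[k][i] * b[k][j] for k in range(n)) / n
            if i == j:
                loss += (1.0 - c_ij) ** 2      # invariance term
            else:
                loss += lambd * c_ij ** 2      # redundancy-reduction term
    return loss


def blockwise_losses(views_a, views_b, lambd=5e-3):
    """One independent loss per block (hypothetical helper).

    views_a[k] and views_b[k] hold the embeddings of block k for the two
    views. In blockwise training each loss would update only its own block.
    """
    return [barlow_twins_loss(za, zb, lambd) for za, zb in zip(views_a, views_b)]


# Two identical views with uncorrelated features: C is the identity.
decorrelated = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
# Two identical views with fully redundant features: C is all ones.
redundant = [[1.0, 1.0], [-1.0, -1.0]]
losses = blockwise_losses([decorrelated, redundant], [decorrelated, redundant])
```

In this toy usage the first block's loss is essentially zero, while the second is penalized only through the off-diagonal term. In an actual blockwise setup, gradients from each block's loss would presumably not flow into earlier blocks; that part requires an autodiff framework and is not shown here.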

updated: Sun Aug 11 2024 15:59:30 GMT+0000 (UTC)

published: Fri Feb 03 2023 10:48:24 GMT+0000 (UTC)

arXiv

