LeRaC: Learning Rate Curriculum

Florinel-Alin Croitoru; Nicolae-Catalin Ristea; Radu Tudor Ionescu; Nicu Sebe

Most curriculum learning methods require an approach to sort the data samples by difficulty, which is often cumbersome to perform. In this work, we propose a novel curriculum learning approach termed Learning Rate Curriculum (LeRaC), which leverages the use of a different learning rate for each layer of a neural network to create a data-free curriculum during the initial training epochs. More specifically, LeRaC assigns higher learning rates to neural layers closer to the input, gradually decreasing the learning rates as the layers are placed farther away from the input. The learning rates increase at various paces during the first training iterations, until they all reach the same value. From this point on, the neural model is trained as usual. This creates a model-level curriculum learning strategy that does not require sorting the examples by difficulty and is compatible with any neural network, generating higher performance levels regardless of the architecture. We conduct comprehensive experiments on eight datasets from the computer vision (CIFAR-10, CIFAR-100, Tiny ImageNet), language (BoolQ, QNLI, RTE) and audio (ESC-50, CREMA-D) domains, considering various convolutional (ResNet-18, Wide-ResNet-50, DenseNet-121), recurrent (LSTM) and transformer (CvT, BERT, SepTr) architectures, comparing our approach with the conventional training regime. Moreover, we also compare with Curriculum by Smoothing (CBS), a state-of-the-art data-free curriculum learning approach. Unlike CBS, our performance improvements over the standard training regime are consistent across all datasets and models. Furthermore, we significantly surpass CBS in terms of training time (there is no additional cost over the standard training regime for LeRaC).
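
The schedule described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the abstract says that layers closer to the input start with higher learning rates and that all rates rise at different paces until they reach the same value, but it does not give the exact update rule. The geometric spacing of the initial rates, the log-linear increase, and the names `lerac_schedule`, `target_lr`, `min_lr` and `warmup_steps` are all hypothetical choices made for this sketch.

```python
def lerac_schedule(num_layers, target_lr, min_lr, warmup_steps):
    """Return schedule[t][j]: learning rate of layer j at warm-up step t.

    Assumptions (not taken from the abstract): initial rates are spaced
    geometrically from target_lr (layer closest to the input) down to
    min_lr (layer farthest from the input), and each rate grows
    log-linearly until it reaches target_lr at step warmup_steps.
    """
    if num_layers < 1 or warmup_steps < 1:
        raise ValueError("num_layers and warmup_steps must be at least 1")
    if not 0 < min_lr <= target_lr:
        raise ValueError("expected 0 < min_lr <= target_lr")

    # Initial per-layer rates: highest near the input, lowest near the output.
    if num_layers == 1:
        initial = [target_lr]
    else:
        ratio = min_lr / target_lr
        initial = [target_lr * ratio ** (j / (num_layers - 1))
                   for j in range(num_layers)]

    schedule = []
    for t in range(warmup_steps + 1):
        frac = t / warmup_steps  # 0.0 at the start, 1.0 when warm-up ends
        # Layers that start lower must climb faster to meet at target_lr.
        schedule.append([lr0 * (target_lr / lr0) ** frac for lr0 in initial])
    return schedule


if __name__ == "__main__":
    # Hypothetical values chosen only for illustration.
    for step, rates in enumerate(lerac_schedule(4, 0.1, 1e-4, 5)):
        print(step, ["%.5f" % lr for lr in rates])
```

After the last warm-up step every layer uses `target_lr`, which corresponds to the point where the abstract says the model "is trained as usual". In a framework such as PyTorch, per-layer rates like these would typically be applied by giving each layer its own optimizer parameter group and updating the group's `lr` at each step.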

updated: Sat Nov 19 2022 10:41:32 GMT+0000 (UTC)

published: Wed May 18 2022 18:57:36 GMT+0000 (UTC)

arXiv
