Exemplar-free Continual Learning of Vision Transformers via Gated Class-Attention and Cascaded Feature Drift Compensation

Marco Cotogni; Fei Yang; Claudio Cusano; Andrew D. Bagdanov; Joost van de Weijer

We propose a new method for exemplar-free class-incremental training of ViTs. The main challenge of exemplar-free continual learning is maintaining the plasticity of the learner without causing catastrophic forgetting of previously learned tasks. This is often achieved via exemplar replay, which can help recalibrate previous task classifiers to the feature drift that occurs when learning new tasks. Exemplar replay, however, comes at the cost of retaining samples from previous tasks, which may not be possible in many applications. To address the problem of continual ViT training, we first propose gated class-attention to minimize the drift in the final ViT transformer block. This mask-based gating is applied to the class-attention mechanism of the last transformer block and strongly regulates the weights crucial for previous tasks. Importantly, gated class-attention does not require the task-ID during inference, which distinguishes it from other parameter isolation methods. Second, we propose a new method of feature drift compensation that accommodates feature drift in the backbone when learning new tasks. The combination of gated class-attention and cascaded feature drift compensation allows for plasticity towards new tasks while limiting forgetting of previous ones. Extensive experiments performed on CIFAR-100, Tiny-ImageNet and ImageNet100 demonstrate that our exemplar-free method obtains competitive results when compared to rehearsal-based ViT methods.
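
The abstract does not spell out how the mask-based gating regulates weights, so the following is only a minimal sketch of one common way such gating can work, not the authors' implementation. It assumes a sigmoid gate over a learnable per-task embedding, an element-wise maximum to accumulate masks across tasks, and gradient scaling to protect units that earlier tasks rely on. All function names, the `scale` parameter, and the toy values are hypothetical.

```python
import math

def sigmoid_gate(embedding, scale=10.0):
    """Map a per-task embedding to a soft mask with values in (0, 1).

    Assumption: a large `scale` pushes the mask towards near-binary values.
    """
    return [1.0 / (1.0 + math.exp(-scale * e)) for e in embedding]

def accumulate_masks(previous, current):
    """Element-wise maximum: a unit stays protected once any earlier task used it."""
    return [max(p, c) for p, c in zip(previous, current)]

def protect_gradient(grad_rows, cumulative_mask):
    """Shrink each output unit's update in proportion to its cumulative mask value."""
    return [[g * (1.0 - m) for g in row]
            for row, m in zip(grad_rows, cumulative_mask)]

# Toy usage: three units, where task 1 relies heavily on unit 0,
# not at all on unit 1, and partially on unit 2.
mask_task1 = sigmoid_gate([3.0, -3.0, 0.0])
cumulative = accumulate_masks([0.0, 0.0, 0.0], mask_task1)
grad = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
protected = protect_gradient(grad, cumulative)
```

In this toy setting the update for unit 0 is almost entirely suppressed, unit 1 remains free to change, and unit 2 receives half of its original gradient, which illustrates how gating can trade plasticity against forgetting.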

updated: Tue Mar 14 2023 22:52:23 GMT+0000 (UTC)

published: Tue Nov 22 2022 14:13:15 GMT+0000 (UTC)

arXiv
