TokenMixup: Efficient Attention-guided Token-level Data Augmentation for Transformers

Hyeong Kyu Choi; Joonmyung Choi; Hyunwoo J. Kim

Mixup is a commonly adopted data augmentation technique for image classification. Recent advances in mixup methods primarily focus on mixing based on saliency. However, many saliency detectors require intensive computation and are especially burdensome for parameter-heavy transformer models. To address this, we propose TokenMixup, an efficient attention-guided token-level data augmentation method that aims to maximize the saliency of a mixed set of tokens. TokenMixup provides saliency-aware data augmentation that is 15x faster than gradient-based methods. Moreover, we introduce a variant of TokenMixup that mixes tokens within a single instance, thereby enabling multi-scale feature augmentation. Experiments show that our methods significantly improve the baseline models' performance on CIFAR and ImageNet-1K while being more efficient than previous methods. We also reach state-of-the-art performance on CIFAR-100 among transformer models trained from scratch. Code is available at https://github.com/mlvlab/TokenMixup.
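To make the idea of saliency-guided token mixing concrete, here is a minimal sketch in plain Python. It is not the authors' implementation (see the repository linked above for that). The function name `token_mix`, the rule "take the token from sample B wherever B's saliency is strictly higher", and the label weighting by the share of tokens kept are all assumptions chosen for illustration. The saliency scores are passed in directly and stand in for attention-derived scores.

```python
from typing import List, Sequence, Tuple

Token = List[float]


def token_mix(
    tokens_a: Sequence[Token],
    tokens_b: Sequence[Token],
    saliency_a: Sequence[float],
    saliency_b: Sequence[float],
    label_a: Sequence[float],
    label_b: Sequence[float],
) -> Tuple[List[Token], List[float]]:
    """Mix two token sequences position by position (hypothetical helper).

    Assumption: a token of A is replaced by the token of B at the same
    position whenever B's saliency there is strictly higher. The paper's
    actual scoring and selection rule is not reproduced here.
    """
    n = len(tokens_a)
    if n == 0:
        raise ValueError("need at least one token")
    if not (len(tokens_b) == len(saliency_a) == len(saliency_b) == n):
        raise ValueError("token and saliency sequences must share one length")

    # True where the token from B is the more salient one.
    take_b = [sb > sa for sa, sb in zip(saliency_a, saliency_b)]

    mixed = [
        list(tb) if use_b else list(ta)
        for ta, tb, use_b in zip(tokens_a, tokens_b, take_b)
    ]

    # Weight the labels by the share of tokens that came from each sample.
    lam = 1.0 - sum(take_b) / n
    mixed_label = [lam * ya + (1.0 - lam) * yb for ya, yb in zip(label_a, label_b)]
    return mixed, mixed_label


# Toy example: four 2-dimensional tokens per sample, one-hot labels.
tokens_a = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
tokens_b = [[-1.0, -1.0], [-2.0, -2.0], [-3.0, -3.0], [-4.0, -4.0]]
mixed, label = token_mix(
    tokens_a, tokens_b,
    saliency_a=[0.4, 0.1, 0.3, 0.2],
    saliency_b=[0.1, 0.5, 0.2, 0.2],
    label_a=[1.0, 0.0],
    label_b=[0.0, 1.0],
)
```

In the toy example only the second position is more salient in B, so one of the four tokens is swapped and the label becomes a 0.75 / 0.25 blend. Because the scores are supplied rather than computed by backpropagation, a scheme of this kind needs no extra gradient pass, which is the efficiency argument the abstract makes against gradient-based saliency.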

updated: Fri Oct 14 2022 06:36:31 GMT+0000 (UTC)

published: Fri Oct 14 2022 06:36:31 GMT+0000 (UTC)

arXiv
