DivAug: Plug-in Automated Data Augmentation with Explicit Diversity Maximization

Zirui Liu; Haifeng Jin; Ting-Hsiang Wang; Kaixiong Zhou; Xia Hu

Over the past two years, human-designed data augmentation strategies have been replaced by automatically learned augmentation policies. Specifically, recent work (AutoAugment, RandAugment) has empirically shown that the superior performance of automated data augmentation methods stems from increasing the diversity of the augmented data. However, two pieces are still missing regarding the diversity of augmented data: 1) an explicit definition (and thus a measurement) of diversity, and 2) a quantifiable relationship between diversity and its regularization effect. To bridge this gap, we propose a diversity measure called Variance Diversity and theoretically show that the regularization effect of data augmentation is guaranteed by Variance Diversity. We validate in experiments that the relative gain in test accuracy from automated data augmentation is highly correlated with Variance Diversity. An unsupervised sampling-based framework, DivAug, is designed to directly maximize Variance Diversity and hence strengthen the regularization effect. Without requiring a separate search process, DivAug achieves a performance gain comparable to that of the state-of-the-art method, with better efficiency. Moreover, in the semi-supervised setting, our framework further improves the performance of semi-supervised learning algorithms compared to RandAugment, making it highly applicable to real-world problems where labeled data is scarce. The code is available at https://github.com/warai-0toko/DivAug.

updated: Wed Aug 11 2021 19:32:42 GMT+0000 (UTC)

published: Fri Mar 26 2021 16:00:01 GMT+0000 (UTC)

arXiv

