DivAug: Plug-in Automated Data Augmentation with Explicit Diversity Maximization

Zirui Liu; Haifeng Jin; Ting-Hsiang Wang; Kaixiong Zhou; Xia Hu

Over the past two years, human-designed data augmentation strategies have been replaced by automatically learned augmentation policies. Recent work has empirically shown that the superior performance of automated data augmentation methods stems from increasing the diversity of the augmented data. However, two things regarding the diversity of augmented data are still missing: 1) an explicit definition (and thus measurement) of diversity and 2) a quantifiable relationship between diversity and its regularization effect. To bridge this gap, we propose a diversity measure called Variance Diversity and theoretically show that the regularization effect of data augmentation is promised by Variance Diversity. We validate in experiments that the relative gain in test accuracy from automated data augmentation is highly correlated with Variance Diversity. An unsupervised sampling-based framework, DivAug, is designed to directly maximize Variance Diversity and hence strengthen the regularization effect. Without requiring a separate search process, DivAug achieves a performance gain comparable to that of the state-of-the-art method while being more efficient. Moreover, under the semi-supervised setting, our framework further improves the performance of semi-supervised learning algorithms compared to RandAugment, making it highly applicable to real-world problems where labeled data is scarce.
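The abstract does not give the formula for Variance Diversity or the sampling procedure, so the following is only a rough sketch under stated assumptions, not the authors' implementation. It assumes diversity is measured as the total variance of a model's predicted probability vectors across augmented copies of one input, and that "sampling-based" selection can be approximated by greedily keeping the candidates that maximize this variance. The function names `variance_diversity` and `select_diverse` are hypothetical.

```python
import numpy as np


def variance_diversity(probs):
    """Total variance of prediction vectors across augmented copies.

    probs: array of shape (k, c), one predicted probability vector per
    augmented copy of the same input. ASSUMPTION: this stands in for the
    paper's Variance Diversity, whose exact definition is not in the abstract.
    """
    probs = np.asarray(probs, dtype=float)
    mean = probs.mean(axis=0)
    # Mean squared distance of each prediction from the mean prediction.
    return float(((probs - mean) ** 2).sum(axis=1).mean())


def select_diverse(probs, m):
    """Greedily pick m of the k candidates so the chosen set has high variance.

    ASSUMPTION: a simple greedy heuristic used for illustration only.
    Returns the indices of the selected candidates.
    """
    probs = np.asarray(probs, dtype=float)
    k = probs.shape[0]
    # Start from the candidate farthest from the mean prediction.
    dist_to_mean = ((probs - probs.mean(axis=0)) ** 2).sum(axis=1)
    chosen = [int(np.argmax(dist_to_mean))]
    while len(chosen) < min(m, k):
        best, best_score = None, -1.0
        for i in range(k):
            if i in chosen:
                continue
            score = variance_diversity(probs[chosen + [i]])
            if score > best_score:
                best, best_score = i, score
        chosen.append(best)
    return chosen


# Four augmented copies of one image, two classes (made-up numbers).
candidates = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [1.0, 0.0]]
picked = select_diverse(candidates, m=2)
```

In this toy input, the selection keeps one copy predicted as the first class and one predicted as the second, since that pair spreads the predictions the most. No separate policy search is involved; the selection uses only the model's own predictions on the candidates, which is consistent with the unsupervised, search-free framing in the abstract.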

updated: Fri Mar 26 2021 16:00:01 GMT+0000 (UTC)

published: Fri Mar 26 2021 16:00:01 GMT+0000 (UTC)

arXiv
