FMix: Enhancing Mixed Sample Data Augmentation

Ethan Harris; Antonia Marcu; Matthew Painter; Mahesan Niranjan; Adam Prügel-Bennett; Jonathon Hare

Mixed Sample Data Augmentation (MSDA) has received increasing attention in recent years, with many successful variants such as MixUp and CutMix. By studying the mutual information between the function learned by a VAE on the original data and on the augmented data, we show that MixUp distorts learned functions in a way that CutMix does not. We further demonstrate this by showing that MixUp acts as a form of adversarial training, increasing robustness to attacks such as DeepFool and Uniform Noise, which produce examples similar to those generated by MixUp. We argue that this distortion prevents models from learning about sample-specific features in the data, aiding generalisation performance. In contrast, we suggest that CutMix works more like a traditional augmentation, improving performance by preventing memorisation without distorting the data distribution. However, we argue that an MSDA which builds on CutMix to include masks of arbitrary shape, rather than just square, could further prevent memorisation whilst preserving the data distribution in the same way. To this end, we propose FMix, an MSDA that uses random binary masks obtained by applying a threshold to low-frequency images sampled from Fourier space. These random masks can take on a wide range of shapes and can be generated for use with one-, two-, and three-dimensional data. FMix improves performance over MixUp and CutMix, without an increase in training time, for a number of models across a range of data sets and problem settings, obtaining a new single-model state-of-the-art result on CIFAR-10 without external data. Finally, we show that a consequence of the difference between interpolating MSDA such as MixUp and masking MSDA such as FMix is that the two can be combined to improve performance even further. Code for all experiments is provided at https://github.com/ecs-vlc/FMix.
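
The masking step described in the abstract (thresholding a low-frequency image sampled from Fourier space) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation, which lives in the linked repository. The function names `sample_low_freq_mask` and `mask_mix`, the `decay` exponent, and the use of `lam` as the fraction of pixels taken from the first image are all assumptions made for illustration.

```python
import numpy as np


def sample_low_freq_mask(shape, lam, decay=3.0, rng=None):
    """Sketch: binary 2D mask with a `lam` fraction of ones.

    `decay` (hypothetical parameter) controls how strongly high
    frequencies are attenuated; larger values give smoother shapes.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = shape

    # Frequency magnitude of every bin in a real-input 2D FFT layout.
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.rfftfreq(w)[None, :]
    freq = np.sqrt(fx ** 2 + fy ** 2)
    freq[0, 0] = 1.0 / max(h, w)  # avoid dividing by zero at the DC bin

    # Random complex spectrum, scaled down as frequency grows.
    noise = rng.standard_normal(freq.shape) + 1j * rng.standard_normal(freq.shape)
    spectrum = noise / freq ** decay

    # Back to image space: a smooth, low-frequency grey-scale image.
    grey = np.fft.irfft2(spectrum, s=(h, w))

    # Threshold: the k largest pixels become 1, everything else 0.
    k = int(round(lam * h * w))
    mask = np.zeros(h * w, dtype=np.float32)
    if k > 0:
        top = np.argsort(grey.ravel())[::-1][:k]
        mask[top] = 1.0
    return mask.reshape(h, w)


def mask_mix(x1, x2, mask):
    """Combine two (H, W, C) images using an (H, W) binary mask."""
    m = mask[..., None]  # broadcast the mask over the channel axis
    return m * x1 + (1.0 - m) * x2


rng = np.random.default_rng(0)
mask = sample_low_freq_mask((32, 32), lam=0.3, rng=rng)
mixed = mask_mix(np.ones((32, 32, 3)), np.zeros((32, 32, 3)), mask)
```

Because each pixel is taken wholly from one image or the other, this is a masking MSDA in the sense of the abstract, whereas MixUp would instead interpolate every pixel. How `lam` is drawn and how labels are combined are left out here, since the abstract does not specify them.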

updated: Sun Feb 28 2021 14:47:36 GMT+0000 (UTC)

published: Thu Feb 27 2020 11:46:33 GMT+0000 (UTC)

arXiv
