Learn by Challenging Yourself: Contrastive Visual Representation Learning with Hard Sample Generation

Yawen Wu; Zhepeng Wang; Dewen Zeng; Yiyu Shi; Jingtong Hu

Contrastive learning (CL), a self-supervised learning approach, can effectively learn visual representations from unlabeled data. However, CL needs vast quantities of diverse training data to perform well; without them, its performance degrades sharply. To tackle this problem, we propose a framework with two approaches that improve the data efficiency of CL training by generating beneficial samples and by joint learning. The first approach generates hard samples for the main model. The generator is learned jointly with the main model so that it dynamically customizes hard samples to the main model's training state. As the main model's knowledge grows, the generated samples also become harder, constantly encouraging the main model to learn better representations. The second approach uses a pair of data generators to produce similar but distinct samples as positive pairs. During joint learning, the hardness of a positive pair is progressively increased by decreasing the similarity between its two samples. In this way, the main model learns to cluster hard positives by pulling the representations of similar yet distinct samples together, so that the representations of similar samples are well clustered and better representations are learned. Comprehensive experiments on multiple datasets show that the proposed methods achieve superior accuracy and data efficiency over the state of the art. For example, linear classification accuracy improves by about 5% on ImageNet-100 and CIFAR-10 and by more than 6% on CIFAR-100. In addition, the methods achieve up to 2x data efficiency for linear classification and up to 5x data efficiency for transfer learning.
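The idea that a positive pair becomes harder as its similarity decreases can be illustrated with a small sketch. The abstract does not state which contrastive loss or generator architecture the authors use, so the code below assumes an InfoNCE-style loss with cosine similarity, which is common in contrastive learning; the function names, the temperature value, and the toy 2-D embeddings are all hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.5):
    """InfoNCE-style loss for one anchor (an assumed loss, not taken from the paper).

    The loss is low when the anchor is close to its positive and far from
    the negatives, and grows as the positive drifts away from the anchor.
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Toy embeddings (hypothetical values).
anchor = [1.0, 0.0]
negatives = [[-1.0, 0.0], [0.0, -1.0]]

# An "easy" positive is almost identical to the anchor;
# a "hard" positive is still related but noticeably less similar.
easy_positive = [0.99, 0.1]
hard_positive = [0.6, 0.8]

easy_loss = info_nce(anchor, easy_positive, negatives)
hard_loss = info_nce(anchor, hard_positive, negatives)

print(f"easy positive loss: {easy_loss:.4f}")
print(f"hard positive loss: {hard_loss:.4f}")
```

Under these assumptions the less similar positive yields the larger loss, so it gives the main model a stronger training signal. A generator trained jointly with the main model could, in principle, lower the pair similarity step by step to raise this difficulty over time, which is the behavior the abstract describes.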

updated: Mon Feb 14 2022 02:41:43 GMT+0000 (UTC)

published: Mon Feb 14 2022 02:41:43 GMT+0000 (UTC)

arXiv
