Deep Active Learning with Contrastive Learning Under Realistic Data Pool Assumptions

Jihyo Kim; Jeonghyeon Kim; Sangheum Hwang

Active learning aims to identify the most informative data in an unlabeled data pool so that a model can reach the desired accuracy rapidly. This is especially beneficial for deep neural networks, which generally require a huge number of labeled samples to achieve high performance. Most existing active learning methods have been evaluated in an ideal setting where the unlabeled data pool contains only samples relevant to the target task, i.e., in-distribution samples. A data pool gathered from the wild, however, is likely to include samples that are entirely irrelevant to the target task and/or too ambiguous for even the oracle to assign a single class label. We argue that it is more realistic to assume an unlabeled data pool consisting of samples from various distributions. In this work, we introduce new active learning benchmarks that include ambiguous and task-irrelevant out-of-distribution samples as well as in-distribution samples. We also propose an active learning method designed to preferentially acquire informative in-distribution samples. The proposed method leverages both labeled and unlabeled data pools and selects samples from clusters in the feature space constructed via contrastive learning. Experimental results demonstrate that the proposed method requires a lower annotation budget than existing active learning methods to reach the same level of accuracy.
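
The abstract does not spell out the selection procedure, so the following is only a generic sketch of cluster-based query selection, not the authors' algorithm. It assumes feature vectors have already been produced by some contrastively trained encoder (not shown here), and it uses plain k-means with a deterministic farthest-first initialization as a stand-in for whatever clustering the paper uses. The names `squared_distance`, `farthest_first_init`, `kmeans`, and `select_queries` are hypothetical, and the sketch does not model how the labeled pool is used or how in-distribution samples are prioritized.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(points, k):
    """Pick k initial centroids: start at points[0], then repeatedly take
    the point farthest from all centroids chosen so far (assumption: a
    simple deterministic substitute for random initialization)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        next_point = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(list(next_point))
    return centroids


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
        for p in points
    ]


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; returns (centroids, assignment)."""
    centroids = farthest_first_init(points, k)
    for _ in range(iterations):
        assignment = assign(points, centroids)
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign(points, centroids)


def select_queries(features, budget):
    """Cluster the feature vectors into `budget` groups and return, for each
    non-empty cluster, the index of the sample closest to its centroid."""
    centroids, assignment = kmeans(features, budget)
    selected = []
    for c, centroid in enumerate(centroids):
        members = [i for i, a in enumerate(assignment) if a == c]
        if members:
            selected.append(
                min(members, key=lambda i: squared_distance(features[i], centroid))
            )
    return sorted(selected)
```

Calling `select_queries(features, budget)` on a list of feature vectors returns at most `budget` indices to send to the oracle. Picking one representative per cluster is a diversity heuristic chosen for illustration; a method along the lines of the abstract would additionally need some criterion for preferring clusters that are likely to be in-distribution.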

updated: Sat Mar 25 2023 10:46:10 GMT+0000 (UTC)

published: Sat Mar 25 2023 10:46:10 GMT+0000 (UTC)

arXiv
