A Simple Baseline for Low-Budget Active Learning

Kossar Pourahmadi; Parsa Nooralinejad; Hamed Pirsiavash

Active learning focuses on choosing a subset of unlabeled data to be labeled. However, most such methods assume that a large subset of the data can be annotated. We are interested in low-budget active learning, where only a small subset (e.g., 0.2% of ImageNet) can be annotated. Instead of proposing a new query strategy that iteratively samples batches of unlabeled data given an initial pool, we learn rich features with an off-the-shelf self-supervised learning method only once and then study the effectiveness of different sampling strategies under a low budget on a variety of datasets, including ImageNet. We show that although state-of-the-art active learning methods work well given a large labeling budget, a simple k-means clustering algorithm can outperform them at low budgets. We believe this method can be used as a simple baseline for low-budget active learning on image classification. Code is available at: https://github.com/UCDvision/low-budget-al
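To make the k-means idea concrete, here is a minimal sketch, not the authors' implementation (see the repository linked above for that). It assumes features have already been extracted once by some self-supervised encoder and are given as a NumPy array. The function name `kmeans_select`, its parameters, and the rule of picking the sample nearest each centroid are all assumptions made for illustration; the abstract does not say how samples are chosen within a cluster.

```python
import numpy as np


def kmeans_select(features, budget, n_iter=50, seed=0):
    """Pick `budget` sample indices by k-means on precomputed features.

    Assumption (not from the abstract): one sample is chosen per
    cluster, namely the one nearest to that cluster's centroid.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(features, dtype=np.float64)
    n = x.shape[0]
    if not 0 < budget <= n:
        raise ValueError("budget must be between 1 and the number of samples")

    # Initialise centroids from distinct random samples.
    centroids = x[rng.choice(n, size=budget, replace=False)].copy()

    for _ in range(n_iter):
        # Squared Euclidean distance from every sample to every centroid.
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        new_centroids = centroids.copy()
        for k in range(budget):
            members = x[assign == k]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                new_centroids[k] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids

    # Distances to the final centroids, used to pick one sample per cluster.
    dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    selected = []
    for k in range(budget):
        col = dists[:, k].copy()
        if selected:
            col[selected] = np.inf  # never pick the same sample twice
        selected.append(int(col.argmin()))
    return selected
```

The returned indices are the samples one would send for annotation. The dense distance matrix keeps the sketch short but costs memory proportional to samples × budget × feature dimension, so at ImageNet scale a library k-means implementation would be the practical choice.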

updated: Fri Oct 22 2021 19:36:56 GMT+0000 (UTC)

published: Fri Oct 22 2021 19:36:56 GMT+0000 (UTC)

arXiv
