MCAL: Minimum Cost Human-Machine Active Labeling

Hang Qiu; Krishna Chintalapudi; Ramesh Govindan

Today, ground-truth generation relies on datasets annotated by cloud-based annotation services. These services rely on human annotation, which can be prohibitively expensive. In this paper, we consider the problem of hybrid human-machine labeling, which trains a classifier to accurately auto-label part of the dataset. However, training the classifier can be expensive too. We propose an iterative approach that minimizes total cost by jointly determining, at each step, which samples to label using humans and which to label using the trained classifier. We validate our approach on well-known public datasets such as Fashion-MNIST, CIFAR-10, CIFAR-100, and ImageNet. In some cases, our approach has 6x lower overall cost than having humans label the entire dataset, and it is always cheaper than the cheapest competing strategy.
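The cost trade-off described in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: it only shows the kind of accounting involved, where total cost is the human labeling cost plus the classifier training cost. The names `Plan`, `total_cost`, and `cheapest_plan`, the per-label price, and the explicit error bound `max_error` are all hypothetical and are introduced here purely for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Plan:
    """One candidate human/machine split (hypothetical, for illustration)."""
    human_labeled: int    # samples labeled by humans
    machine_labeled: int  # samples auto-labeled by the trained classifier
    training_cost: float  # assumed cost of training the classifier
    est_error: float      # assumed estimate of the overall label error rate


def total_cost(plan: Plan, price_per_label: float) -> float:
    """Human labeling cost plus classifier training cost."""
    return plan.human_labeled * price_per_label + plan.training_cost


def cheapest_plan(plans: Iterable[Plan], price_per_label: float,
                  max_error: float) -> Plan:
    """Return the lowest-cost plan whose estimated error is within the bound."""
    feasible = [p for p in plans if p.est_error <= max_error]
    if not feasible:
        raise ValueError("no plan satisfies the error bound")
    return min(feasible, key=lambda p: total_cost(p, price_per_label))
```

Under these assumptions, labeling everything by hand is just one more candidate plan (zero machine-labeled samples and zero training cost), so a hybrid plan is chosen only when its training cost is outweighed by the human labels it saves.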

updated: Tue Feb 14 2023 20:25:32 GMT+0000 (UTC)

published: Wed Jun 24 2020 19:01:05 GMT+0000 (UTC)

arXiv
