Is one annotation enough? A data-centric image classification benchmark for noisy and ambiguous label estimation

Lars Schmarje; Vasco Grossmann; Claudius Zelenka; Sabine Dippel; Rainer Kiko; Mariusz Oszust; Matti Pastell; Jenny Stracke; Anna Valros; Nina Volkmann; Reinhard Koch

Modern machine learning requires high-quality data. However, acquiring such data is difficult because human annotations are noisy and ambiguous. Aggregating such annotations to determine the label of an image lowers data quality. We propose a data-centric image classification benchmark with ten real-world datasets and multiple annotations per image to allow researchers to investigate and quantify the impact of such data quality issues. With the benchmark, we can study the impact of annotation costs and (semi-)supervised methods on data quality for image classification by applying a novel methodology to a range of different algorithms and diverse datasets. Our benchmark uses a two-phase approach: a data label improvement method in the first phase and a fixed evaluation model in the second phase. This gives a measure of the relation between the input labeling effort and the performance of (semi-)supervised algorithms, enabling deeper insight into how labels should be created for effective model training. Across thousands of experiments, we show that one annotation is not enough and that including multiple annotations allows for a better approximation of the real underlying class distribution. We find that hard labels cannot capture the ambiguity of the data, which might lead to the common issue of overconfident models. Based on the presented datasets, benchmarked methods, and analysis, we create multiple research opportunities for the future directed at improving label noise estimation approaches, data annotation schemes, realistic (semi-)supervised learning, and more reliable image collection.
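
The contrast between hard labels and multiple annotations can be illustrated with a minimal sketch. It is not taken from the benchmark's code: the function names `soft_label`, `hard_label`, and `one_hot`, the majority-vote aggregation, and the toy annotations are all assumptions chosen for illustration.

```python
from collections import Counter


def soft_label(annotations, classes):
    """Empirical class distribution over the annotations of one image."""
    counts = Counter(annotations)
    total = len(annotations)
    return [counts[c] / total for c in classes]


def hard_label(annotations, classes):
    """Majority vote (an assumed aggregation rule).

    Ties are resolved in favor of the class listed first in `classes`.
    """
    counts = Counter(annotations)
    return max(classes, key=lambda c: counts[c])


def one_hot(label, classes):
    """Distribution implied by a hard label: all mass on one class."""
    return [1.0 if c == label else 0.0 for c in classes]


# Hypothetical image annotated by five people (toy data, not from the paper).
classes = ["cat", "dog"]
annotations = ["cat", "cat", "dog", "cat", "dog"]

soft = soft_label(annotations, classes)
hard = one_hot(hard_label(annotations, classes), classes)

print("soft label:", soft)
print("hard label:", hard)
```

In this toy case, the hard label puts all probability mass on "cat", while the soft label keeps the 3-to-2 disagreement between annotators. With a single annotation per image, the two are identical, so the disagreement is not observable at all.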

updated: Thu Oct 13 2022 07:48:45 GMT+0000 (UTC)

published: Wed Jul 13 2022 14:17:21 GMT+0000 (UTC)

arXiv
