Uncovering bias in the PlantVillage dataset

Mehmet Alican Noyan

We report an investigation into the use of the popular PlantVillage dataset for training deep-learning-based plant disease detection models. We trained a machine learning model using only 8 pixels from the PlantVillage image backgrounds. The model achieved 49.0% accuracy on the held-out test set, well above the random-guessing accuracy of 2.6%. This result indicates that the PlantVillage dataset contains noise correlated with the labels, and that deep learning models can easily exploit this bias to make predictions. We discuss possible approaches to alleviating this problem.
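To make the setup concrete, the following is a minimal sketch of how 8 background pixels could be turned into a feature vector. The abstract does not say which 8 pixels were used or which model was trained, so the choice here (four corners plus four edge midpoints) and the helper names `background_pixel_coords`, `background_pixel_features`, and `random_guess_accuracy` are assumptions for illustration only. The baseline calculation assumes a 38-class version of the dataset, which is consistent with the quoted 2.6% (1/38) but is not stated in the abstract.

```python
def background_pixel_coords(height, width):
    """Return 8 (row, col) positions on the image border.

    Assumption: four corners plus four edge midpoints. The abstract
    only says "8 pixels from the image backgrounds".
    """
    top, bottom = 0, height - 1
    left, right = 0, width - 1
    mid_r, mid_c = height // 2, width // 2
    return [
        (top, left), (top, mid_c), (top, right),
        (mid_r, left), (mid_r, right),
        (bottom, left), (bottom, mid_c), (bottom, right),
    ]


def background_pixel_features(image):
    """Flatten the channel values of the 8 border pixels into one vector.

    `image` is any H x W x C structure indexable as image[row][col],
    for example nested lists or a NumPy array.
    """
    height, width = len(image), len(image[0])
    features = []
    for r, c in background_pixel_coords(height, width):
        features.extend(int(v) for v in image[r][c])
    return features


def random_guess_accuracy(n_classes):
    """Expected accuracy of uniform random guessing over n_classes labels."""
    return 1.0 / n_classes


if __name__ == "__main__":
    # Tiny synthetic 4 x 6 RGB image; pixel (r, c) holds [r, c, 0].
    image = [[[r, c, 0] for c in range(6)] for r in range(4)]
    print(len(background_pixel_features(image)))  # 8 pixels x 3 channels
    # Assumed class count of 38; compare with the 2.6% quoted above.
    print(f"{100 * random_guess_accuracy(38):.1f}%")
```

Feature vectors built this way could be passed to any off-the-shelf classifier. If such a classifier scores far above the random-guess baseline using nothing but background pixels, that suggests the backgrounds carry label information.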

updated: Thu Jun 09 2022 09:32:35 GMT+0000 (UTC)

published: Thu Jun 09 2022 09:32:35 GMT+0000 (UTC)

arXiv
