A study on the distribution of social biases in self-supervised learning visual models

Kirill Sirotkin; Pablo Carballeira; Marcos Escudero-Viñolo

Deep neural networks are efficient at learning the data distribution if it is sufficiently sampled. However, they can be strongly biased by non-relevant factors implicitly incorporated in the training data. These include operational biases, such as ineffective or uneven data sampling, but also ethical concerns, as social biases may be implicitly present, even inadvertently, in the training data or explicitly defined in unfair training schedules. In tasks that affect human processes, the learning of social biases may produce discriminatory, unethical and untrustworthy consequences. It is often assumed that social biases stem from supervised learning on labelled data, and thus Self-Supervised Learning (SSL) wrongly appears to be an efficient and bias-free solution, as it does not require labelled data. However, it was recently proven that a popular SSL method also incorporates biases. In this paper, we study the biases of a varied set of SSL visual models trained on ImageNet data, using a method and dataset designed by psychological experts to measure social biases. We show that there is a correlation between the type of SSL model and the number of biases that it incorporates. Furthermore, the results suggest that this number does not strictly depend on the model's accuracy and that it changes throughout the network. Finally, we conclude that a careful SSL model selection process can reduce the number of social biases in the deployed model while maintaining high performance.

updated: Thu Mar 03 2022 17:03:21 GMT+0000 (UTC)

published: Thu Mar 03 2022 17:03:21 GMT+0000 (UTC)

arXiv
