Discover the Unknown Biased Attribute of an Image Classifier

Zhiheng Li; Chenliang Xu

Recent work has found that AI algorithms learn biases from data, so identifying those biases is urgent and vital. However, existing bias identification pipelines rely heavily on human experts to conjecture potential biases (e.g., gender), and may therefore miss underlying biases that humans have not anticipated. To help human experts find the biases of AI algorithms, we study a new problem in this work: for a classifier that predicts a target attribute of the input image, discover its unknown biased attribute. To solve this challenging problem, we represent an image attribute as a hyperplane in a generative model's latent space, which transforms the original problem into optimizing the hyperplane's normal vector and offset. Within this framework, we propose a novel total-variation loss as the objective function and a new orthogonalization penalty as a constraint. The penalty prevents trivial solutions in which the discovered biased attribute is identical to the target attribute or to one of the known biased attributes. Extensive experiments on both disentanglement datasets and real-world datasets show that our method can discover biased attributes and achieve better disentanglement with respect to target attributes. Furthermore, qualitative results show that our method can discover unnoticeable biased attributes for various object and scene classifiers, demonstrating that it generalizes to detecting biased attributes in diverse image domains. The code is available at https://git.io/J3kMh.
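
The abstract names three ingredients: an attribute hyperplane in latent space, a total-variation loss, and an orthogonalization penalty. The sketch below is one plausible reading of those ideas in NumPy, not the authors' implementation (see the linked repository for that). The exact form of each function is an assumption, and `toy_classifier` is a hypothetical stand-in for a real classifier applied to generated images.

```python
import numpy as np

def signed_distance(z, w, b):
    """Signed distance from latent code z to the hyperplane w.z + b = 0."""
    return (z @ w + b) / np.linalg.norm(w)

def traverse(z, w, b, steps):
    """Place z at the given signed distances from the hyperplane, moving along its unit normal."""
    n = w / np.linalg.norm(w)
    z_on_plane = z - signed_distance(z, w, b) * n  # orthogonal projection onto the hyperplane
    return np.stack([z_on_plane + s * n for s in steps])

def total_variation(probs):
    """Sum of absolute differences between consecutive predictions (assumed form)."""
    return float(np.abs(np.diff(probs)).sum())

def orthogonalization_penalty(w, known_normals):
    """Sum of squared cosine similarities between w and known normals (assumed form)."""
    n = w / np.linalg.norm(w)
    return float(sum((n @ (k / np.linalg.norm(k))) ** 2 for k in known_normals))

def toy_classifier(z):
    """Hypothetical stand-in for classifier(generator(z)); depends on coordinates 0 and 1 only."""
    return 1.0 / (1.0 + np.exp(-(2.0 * z[..., 0] + 1.0 * z[..., 1])))

# Toy 4-D latent space: the target attribute is assumed to lie along axis 0.
steps = np.linspace(-3.0, 3.0, 7)
z = np.zeros(4)
target_normal = np.array([1.0, 0.0, 0.0, 0.0])
candidate_biased = np.array([0.0, 1.0, 0.0, 0.0])    # the classifier reacts to this direction
candidate_unbiased = np.array([0.0, 0.0, 1.0, 0.0])  # the classifier ignores this direction

for name, w in [("biased", candidate_biased),
                ("unbiased", candidate_unbiased),
                ("trivial (target itself)", target_normal)]:
    tv = total_variation(toy_classifier(traverse(z, w, 0.0, steps)))
    pen = orthogonalization_penalty(w, [target_normal])
    print(f"{name}: total variation = {tv:.3f}, penalty = {pen:.3f}")
```

In this toy setup, a direction that changes the classifier's prediction yields a large total variation, while the penalty is largest for the trivial candidate that coincides with the target attribute. An optimizer over the normal vector and offset would therefore favor high variation and low penalty; how the two terms are weighted is left open here.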

updated: Tue Jun 08 2021 17:59:55 GMT+0000 (UTC)

published: Thu Apr 29 2021 17:59:30 GMT+0000 (UTC)

arXiv
