Revisiting the Importance of Amplifying Bias for Debiasing

Jungsoo Lee; Jeonghoon Park; Daeyoung Kim; Juyoung Lee; Edward Choi; Jaegul Choo

In image classification, "debiasing" aims to train a classifier to be less susceptible to dataset bias, i.e., a strong correlation between peripheral attributes of data samples and a target class. For example, even if the frog class in the dataset mainly consists of frog images with a swamp background (i.e., bias-aligned samples), a debiased classifier should be able to correctly classify a frog at a beach (i.e., a bias-conflicting sample). Recent debiasing approaches commonly use two components: a biased model f_B and a debiased model f_D. f_B is trained to focus on bias-aligned samples (i.e., to overfit to the bias), while f_D is trained mainly on bias-conflicting samples by concentrating on the samples that f_B fails to learn, which makes f_D less susceptible to the dataset bias. While state-of-the-art debiasing techniques have aimed to train f_D better, we focus on training f_B, a component overlooked until now. Our empirical analysis reveals that removing the bias-conflicting samples from the training set of f_B is important for improving the debiasing performance of f_D. This is because the bias-conflicting samples, which do not contain the bias attribute, act as noisy samples when amplifying the bias of f_B. To this end, we propose a simple yet effective data sample selection method that removes the bias-conflicting samples to construct a bias-amplified dataset for training f_B. Our data sample selection method can be directly applied to existing reweighting-based debiasing approaches, yielding consistent performance boosts and achieving state-of-the-art performance on both synthetic and real-world datasets.
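The abstract does not spell out the selection criterion or the reweighting rule, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes (hypothetically) that a sample counts as bias-aligned when an early-trained biased model assigns its label a probability of at least `threshold`, and keeps only those samples for training f_B. For the debiased model it borrows the relative-difficulty weight of Learning from Failure (LfF), one existing reweighting-based approach, as a stand-in: W(x) = CE_B(x) / (CE_B(x) + CE_D(x)). The function names, the 0.9 threshold, and the toy probabilities are all hypothetical.

```python
import math
from typing import List, Sequence

EPS = 1e-12  # guards against log(0) and division by zero


def cross_entropy(probs: Sequence[float], label: int) -> float:
    """Per-sample cross-entropy from a predicted probability vector."""
    return -math.log(max(probs[label], EPS))


def select_bias_aligned(
    probs_early: Sequence[Sequence[float]],
    labels: Sequence[int],
    threshold: float = 0.9,  # hypothetical confidence cutoff
) -> List[int]:
    """Return the indices kept in the bias-amplified set used to train f_B.

    Assumed criterion (not taken from the paper): a sample is treated as
    bias-aligned if an early-trained biased model already assigns its
    label a probability >= threshold; all other samples are dropped as
    likely bias-conflicting.
    """
    return [i for i, (p, y) in enumerate(zip(probs_early, labels)) if p[y] >= threshold]


def relative_difficulty_weights(
    probs_b: Sequence[Sequence[float]],
    probs_d: Sequence[Sequence[float]],
    labels: Sequence[int],
) -> List[float]:
    """LfF-style per-sample weights for f_D's loss: CE_B / (CE_B + CE_D).

    Samples that f_B fails on (high CE_B) receive larger weights,
    so f_D concentrates on them.
    """
    weights = []
    for pb, pd, y in zip(probs_b, probs_d, labels):
        ce_b = cross_entropy(pb, y)
        ce_d = cross_entropy(pd, y)
        weights.append(ce_b / (ce_b + ce_d + EPS))
    return weights


if __name__ == "__main__":
    labels = [0, 0, 1]
    # Toy probabilities; sample 1 plays the role of a bias-conflicting
    # sample that the biased model gets wrong.
    probs_b = [[0.95, 0.05], [0.2, 0.8], [0.1, 0.9]]
    probs_d = [[0.6, 0.4], [0.5, 0.5], [0.7, 0.3]]
    print(select_bias_aligned(probs_b, labels))  # [0, 2]
    print(relative_difficulty_weights(probs_b, probs_d, labels))
```

For brevity the toy example reuses one probability table both for selection and for weighting; in a real pipeline the selection model and the f_B trained on the selected subset would be separate stages, and the weights would multiply f_D's per-sample loss in each training step.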

updated: Sun Dec 04 2022 02:23:38 GMT+0000 (UTC)

published: Sun May 29 2022 07:55:06 GMT+0000 (UTC)

arXiv