arXiv reaDer
Leveraging vision-language models for fair facial attribute classification
Performance disparities of image recognition across different demographic populations are known to exist in deep learning-based models, but previous work has largely addressed such fairness problems assuming knowledge of sensitive attribute labels. To overcome this reliance, previous strategies have involved separate learning structures to expose and adjust for disparities. In this work, we explore a new paradigm that does not require sensitive attribute labels, and evades the need for extra training by leveraging general-purpose vision-language model (VLM), as a rich knowledge source for common sensitive attributes. We analyze the correspondence between VLM predicted and human defined sensitive attribute distribution. We find that VLMs can recognize samples with clear attribute information encoded in image representations, thus capture under-performed samples conflicting with attribute-related bias. We train downstream target classifiers by re-sampling and augmenting under-performed attribute groups. Extensive experiments on multiple benchmark facial attribute classification datasets show fairness gains of the model over existing unsupervised baselines that tackle with arbitrary bias. The work indicates that vision-language models can extract discriminative sensitive information prompted by language, and be used to promote model fairness.
updated: Tue Sep 17 2024 02:02:16 GMT+0000 (UTC)
published: Fri Mar 15 2024 18:37:15 GMT+0000 (UTC)
参考文献 (このサイトで利用可能なもの) / References (only if available on this site)
被参照文献 (このサイトで利用可能なものを新しい順に) / Citations (only if available on this site, in order of most recent)
Amazon.co.jpアソシエイト