Generative models produce astonishingly high-resolution and realistic facial images. However, reliably evaluating the quality of these images remains challenging, let alone systematically investigating the potential biases of generative adversarial networks (GANs). In this paper, we argue that crowdsourcing can be used to quantitatively measure biases in GANs. We showcase an investigation that examines whether GAN-generated facial images with darker skin tones are of lower quality. We ask crowd workers to guess whether an image is real or fake, and use their responses as a proxy metric for the quality of facial images generated by state-of-the-art GANs. The results show that GANs generate lower-quality images for darker skin tones than for lighter skin tones.
Hangzhi Guo (The Pennsylvania State University)
Lizhen Zhu (The Pennsylvania State University)
Ting-Hao Huang (The Pennsylvania State University)