[Re] Explaining in Style: Training a GAN to explain a classifier in StyleSpace

Noah van der Vleuten · Tadija Radusinović · Rick Akkerman · Meilina Reksoprodjo

Hall J #1005

Keywords: [ ReScience - MLRC 2021 ] [ Journal Track ]

[ Abstract ]
[ Slides [ Poster
[Paper PDF]
Tue 29 Nov 2 p.m. PST — 4 p.m. PST


StylEx is a novel approach for classifier-conditioned training of StyleGan2, intending to capture classifier-specific attributes in its disentangled StyleSpace. Using the StylEx method, the behavior of a classifier can be explained and visualized by producing counterfactual images. The original authors, Lang et al., claim that its explanations are human-interpretable, distinct, coherent and sufficient to flip classifier predictions. Our replication efforts are five-fold: 1) As the training procedure and code were missing, we reimplemented the StylEx method in PyTorch to enable from the ground up reproducibility efforts of the original results. 2) We trained custom models on three datasets with a reduced image dimensionality to verify the original author’s claims. 3) We evaluate the Fréchet Inception Distance (FID) scores of generated images and show that the FID scores increase with the number of attributes used to generate a counterfactual explanation. 4) We conduct a user study (n=54) to evaluate the distinctiveness and coherence of the images, additionally we evaluate the ‘sufficiency’ scores of our models. 5) We release additional details on the training procedure of StylEx. Our experimental results support the claims posed in the original paper - the attributes detected by StylEx are identifiable by humans to a certain degree, distinct and sufficient. However, due to the significantly lower resolution and poorer image quality of the models, these results are not directly comparable to the ones posed in the original paper.

Chat is not available.