

Reproducibility Study of "Label-Free Explainability for Unsupervised Models"

Valentinos Pariza · Avik Pal · Madhura Pawar · Quim Serra Faber


Scope of Reproducibility
In this work, we evaluate the reproducibility of the paper "Label-Free Explainability for Unsupervised Models" by Crabbe and van der Schaar. Our goal is to reproduce the paper's four main claims in a label-free setting:
(1) feature importance scores identify the salient features of a model's input;
(2) example importance scores identify the salient training examples that explain a test example;
(3) saliency maps are hard to interpret for disentangled VAEs;
(4) representations learned with distinct pretext tasks are not interchangeable.

Methodology
The authors of the paper provide a PyTorch implementation of their proposed techniques and experiments. We reuse and extend their code for our additional experiments. Our reproducibility study has a total computational cost of 110 GPU hours on an NVIDIA Titan RTX.

Results
We reproduced the original paper's experiments and find that its main claims largely hold. Through additional experiments, we also assess the robustness and generalizability of some of the claims. Here, we find that one claim is not generalizable and that another is not reproducible for the graph dataset.

What was easy
The original paper is well structured. The code is well organized and comes with clear instructions on how to get started. This helped us understand the paper's work and begin experimenting with the proposed methods.

What was difficult
We found it difficult to extend some of the authors' proposed techniques to datasets other than the ones they used. We were also unable to reproduce the results of one of the experiments, and, due to time and resource constraints, we could not pinpoint the exact cause through exploratory experiments.

Communication with original authors
We contacted the authors once with questions about one experimental setup and about the assumptions and context of some sub-claims in the paper. We received a prompt response that answered most of our questions.
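To make claim (1) concrete, here is a minimal NumPy sketch of the label-free feature importance idea as we understand it from the original paper. Given an encoder f, a label-based attribution method is applied to the scalar surrogate g_x(x̃) = ⟨f(x), f(x̃)⟩, which needs no labels. The tanh encoder, the weight matrix `W`, the helper names, and the choice of Gradient × Input as the base attribution method are all hypothetical choices made for illustration. This is not the authors' PyTorch implementation.

```python
import numpy as np


def encoder(W: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Hypothetical toy encoder f(x) = tanh(W x) standing in for a trained model."""
    return np.tanh(W @ x)


def surrogate(W: np.ndarray, x: np.ndarray, x_tilde: np.ndarray) -> float:
    """Label-free scalar surrogate g_x(x_tilde) = <f(x), f(x_tilde)>."""
    return float(encoder(W, x) @ encoder(W, x_tilde))


def surrogate_gradient(W: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Analytic gradient of g_x with respect to x_tilde, evaluated at x_tilde = x."""
    h = encoder(W, x)  # fixed representation f(x)
    # d/dx_tilde sum_j h_j * tanh(w_j . x_tilde) = W^T (h * (1 - tanh^2(W x_tilde)))
    # At x_tilde = x, tanh(W x_tilde) is h itself.
    return W.T @ (h * (1.0 - h ** 2))


def label_free_feature_importance(W: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Gradient x Input attribution applied to the label-free surrogate g_x."""
    return x * surrogate_gradient(W, x)


rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5))  # 5 input features -> 3 latent dimensions
W[:, 4] = 0.0                # feature 4 cannot influence the representation
x = rng.normal(size=5)
scores = label_free_feature_importance(W, x)
print(scores)
```

Because feature 4 has zero weight in `W`, its score is exactly zero, while the other features receive scores that reflect how strongly they move the representation. In practice, the gradient would be obtained with automatic differentiation (for example, in PyTorch) rather than derived by hand as in this toy encoder.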
