
Use perturbations when learning from explanations
Juyeon Heo · Vihari Piratla · Matthew Wicker · Adrian Weller

Thu Dec 14 08:45 AM -- 10:45 AM (PST) @ Great Hall & Hall B1+B2 #1618

Machine learning from explanations (MLX) is an approach to learning that uses human-provided explanations of relevant or irrelevant features for each input to ensure that model predictions are right for the right reasons. Existing MLX approaches rely on local model interpretation methods and require strong model smoothing to align model and human explanations, leading to sub-optimal performance. We recast MLX as a robustness problem, where human explanations specify a lower dimensional manifold from which perturbations can be drawn, and show both theoretically and empirically how this approach alleviates the need for strong model smoothing. We consider various approaches to achieving robustness, leading to improved performance over prior MLX methods. Finally, we show how to combine robustness with an earlier MLX method, yielding state-of-the-art results on both synthetic and real-world benchmarks.
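The core idea in the abstract — treating human explanations as defining a lower-dimensional manifold of irrelevant features, then requiring robustness to perturbations on that manifold — can be sketched as a simple consistency penalty. This is an illustrative reconstruction, not the authors' implementation: `perturbation_loss`, its arguments, and the Gaussian noise model are all assumptions for exposition.

```python
import numpy as np

rng = np.random.default_rng(0)

def perturbation_loss(model, x, irrelevant_mask, n_samples=8, scale=1.0):
    """Hypothetical sketch of an MLX-as-robustness objective.

    `model` is any callable mapping an input vector to a score.
    `irrelevant_mask` is 1.0 on features a human marked irrelevant,
    0.0 elsewhere; perturbations are confined to that subspace
    (the lower-dimensional manifold from the abstract).
    The penalty is the mean squared change in the model's output
    under random perturbations of the irrelevant features only.
    """
    base = model(x)
    total = 0.0
    for _ in range(n_samples):
        # Sample noise, then zero it out on the relevant features.
        noise = rng.normal(scale=scale, size=x.shape) * irrelevant_mask
        total += np.mean((model(x + noise) - base) ** 2)
    return total / n_samples
```

In training, a penalty of this form would be added to the usual task loss, so that a model relying on irrelevant features is penalized while one that ignores them incurs no extra cost — without the strong model smoothing that gradient-matching MLX approaches require.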

Author Information

Juyeon Heo (University of Cambridge)
Vihari Piratla (University of Cambridge)
Matthew Wicker (Department of Computing, Imperial College London)
Adrian Weller (University of Cambridge, The Alan Turing Institute)

Adrian Weller MBE is a Director of Research in Machine Learning at the University of Cambridge, and at the Leverhulme Centre for the Future of Intelligence where he is Programme Director for Trust and Society. He is a Turing AI Fellow in Trustworthy Machine Learning, and heads Safe and Ethical AI at The Alan Turing Institute, the UK national institute for data science and AI. His interests span AI, its commercial applications and helping to ensure beneficial outcomes for society. He serves on several boards and previously held senior roles in finance.
