Timezone: »
Concept Bottleneck Models (CBMs) tackle the opacity of neural architectures by constructing and explaining their predictions using a set of high-level concepts. A special property of these models is that they permit concept interventions, wherein users can correct mispredicted concepts and thus improve the model's performance. Recent work, however, has shown that intervention efficacy can be highly dependent on the order in which concepts are intervened on and on the model's architecture and training hyperparameters. We argue that this is rooted in a CBM's lack of train-time incentives for the model to be appropriately receptive to concept interventions. To address this, we propose Intervention-aware Concept Embedding models (IntCEMs), a novel CBM-based architecture and training paradigm that improves a model's receptiveness to test-time interventions. Our model learns a concept intervention policy in an end-to-end fashion from where it can sample meaningful intervention trajectories at train-time. This conditions IntCEMs to effectively select and receive concept interventions when deployed at test-time. Our experiments show that IntCEMs significantly outperform state-of-the-art concept-interpretable models when provided with test-time concept interventions, demonstrating the effectiveness of our approach.
Author Information
Mateo Espinosa Zarlenga (University of Cambridge)
Katie Collins (University of Cambridge)
Krishnamurthy Dvijotham (DeepMind)
Krishnamurthy Dvijotham is a research scientist at Google Deepmind. Until recently, he was a research engineer at Pacific Northwest National Laboratory (PNNL) in the optimization and control group. He was previously a postdoctoral fellow at the Center for Mathematics of Information at Caltech. He received his PhD in computer science and engineering from the University of Washington, Seattle in 2014 and a bachelors from IIT Bombay in 2008. His research interests span stochastic control theory, artificial intelligence, machine learning and markets/economics, and his work is motivated primarily by problems arising in large-scale infrastructure systems like the power grid. His research has won awards at several conferences in optimization, AI and machine learning.
Adrian Weller (Cambridge, Alan Turing Institute)

Adrian Weller MBE is a Director of Research in Machine Learning at the University of Cambridge, and at the Leverhulme Centre for the Future of Intelligence where he is Programme Director for Trust and Society. He is a Turing AI Fellow in Trustworthy Machine Learning, and heads Safe and Ethical AI at The Alan Turing Institute, the UK national institute for data science and AI. His interests span AI, its commercial applications and helping to ensure beneficial outcomes for society. He serves on several boards and previously held senior roles in finance.
Zohreh Shams (Babylon Health, University of Cambridge)
Mateja Jamnik (University of Cambridge)
More from the Same Authors
-
2021 Spotlight: Make Sure You're Unsure: A Framework for Verifying Probabilistic Specifications »
Leonard Berrada · Sumanth Dathathri · Krishnamurthy Dvijotham · Robert Stanforth · Rudy Bunel · Jonathan Uesato · Sven Gowal · M. Pawan Kumar -
2021 : Efficient Decompositional Rule Extraction for Deep Neural Networks »
Mateo Espinosa Zarlenga · Mateja Jamnik -
2021 : Interpretable Data Analysis for Bench-to-Bedside Research »
Zohreh Shams · Botty Dimanov · Nikola Simidjievski · Helena Andres-Terre · Paul Scherer · Urška Matjašec · Mateja Jamnik · Pietro Lió -
2021 : A fine-grained analysis of robustness to distribution shifts »
Olivia Wiles · Sven Gowal · Florian Stimberg · Sylvestre-Alvise Rebuffi · Ira Ktena · Krishnamurthy Dvijotham · Taylan Cemgil -
2022 Poster: Scalable Infomin Learning »
Yanzhi Chen · weihao sun · Yingzhen Li · Adrian Weller -
2022 : Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs »
Albert Jiang · Sean Welleck · Jin Zhou · Timothee Lacroix · Jiacheng Liu · Wenda Li · Mateja Jamnik · Guillaume Lample · Yuhuai Wu -
2022 : Human Interventions in Concept Graph Networks »
Lucie Charlotte Magister · Pietro Barbiero · Dmitry Kazhdan · Federico Siciliano · Gabriele Ciravegna · Fabrizio Silvestri · Mateja Jamnik · Pietro Lió -
2022 : Conformal Prediction for Resource Prioritisation in Predicting Rare and Dangerous Outcomes »
Varun Babbar · Umang Bhatt · Miri Zilka · Adrian Weller -
2023 : AI for Mathematics: A Cognitive Science Perspective »
Cedegao (Ced) Zhang · Katie Collins · Adrian Weller · Josh Tenenbaum -
2023 : Do Concept Bottleneck Models Obey Locality? »
Naveen Raman · Mateo Espinosa Zarlenga · Juyeon Heo · Mateja Jamnik -
2023 : Estimation of Concept Explanations Should be Uncertainty Aware »
Vihari Piratla · Juyeon Heo · Sukriti Singh · Adrian Weller -
2023 : Use Perturbations when Learning from Explanations »
Juyeon Heo · Vihari Piratla · Matthew Wicker · Adrian Weller -
2023 : Correlated Noise Provably Beats Independent Noise for Differentially Private Learning »
Christopher A. Choquette-Choo · Krishnamurthy Dvijotham · Krishna Pillutla · Arun Ganesh · Thomas Steinke · Abhradeep Guha Thakurta -
2023 : Hybrid Early Fusion for Multi-Modal Biomedical Representations »
Konstantin Hemker · Nikola Simidjievski · Mateja Jamnik -
2023 : Reward Model Underspecification in Language Model Alignment »
Jacob Eisenstein · Jonathan Berant · Chirag Nagpal · Alekh Agarwal · Ahmad Beirami · Alexander D'Amour · Krishnamurthy Dvijotham · Katherine Heller · Stephen Pfohl · Deepak Ramachandran -
2023 : GCondNet: A Novel Method for Improving Neural Networks on Small High-Dimensional Tabular Data »
Andrei Margeloiu · Nikola Simidjievski · Pietro Lió · Mateja Jamnik -
2023 : GCondNet: A Novel Method for Improving Neural Networks on Small High-Dimensional Tabular Data »
Andrei Margeloiu · Nikola Simidjievski · Pietro Lió · Mateja Jamnik -
2023 : HEALNet – Improving Medical Image Analysis using Multi-Omic Context via Hybrid Early Fusion »
Konstantin Hemker · Nikola Simidjievski · Mateja Jamnik -
2023 : Hybrid Early Fusion for Multi-Modal Biomedical Representations »
Konstantin Hemker · Nikola Simidjievski · Mateja Jamnik -
2023 Workshop: MATH-AI: The 3rd Workshop on Mathematical Reasoning and AI »
Zhenwen Liang · Albert Q. Jiang · Katie Collins · Pan Lu · Kaiyu Yang · Sean Welleck · James McClelland -
2023 Poster: Quasi-Monte Carlo Graph Random Features »
Isaac Reid · Adrian Weller · Krzysztof M Choromanski -
2023 Poster: Use perturbations when learning from explanations »
Juyeon Heo · Vihari Piratla · Matthew Wicker · Adrian Weller -
2023 Poster: Dense-Exponential Random Features: Sharp Positive Estimators of the Gaussian Kernel »
Valerii Likhosherstov · Krzysztof M Choromanski · Kumar Avinava Dubey · Frederick Liu · Tamas Sarlos · Adrian Weller -
2023 Poster: Diffused Redundancy in Pre-trained Representations »
Vedant Nanda · Till Speicher · John Dickerson · Krishna Gummadi · Soheil Feizi · Adrian Weller -
2023 Poster: Controlling Text-to-Image Diffusion by Orthogonal Finetuning »
Zeju Qiu · Weiyang Liu · Haiwen Feng · Yuxuan Xue · Yao Feng · Zhen Liu · Dan Zhang · Adrian Weller · Bernhard Schölkopf -
2023 Poster: Provably Bounding Neural Network Preimages »
Suhas Kotha · Christopher Brix · J. Zico Kolter · Krishnamurthy Dvijotham · Huan Zhang -
2023 Poster: Training Private Models That Know What They Don’t Know »
Stephan Rabanser · Anvith Thudi · Abhradeep Guha Thakurta · Krishnamurthy Dvijotham · Nicolas Papernot -
2023 Poster: Certification of Distributional Individual Fairness »
Matthew Wicker · Vihari Piratla · Adrian Weller -
2022 Poster: Autoformalization with Large Language Models »
Yuhuai Wu · Albert Q. Jiang · Wenda Li · Markus Rabe · Charles Staats · Mateja Jamnik · Christian Szegedy -
2022 Poster: Concept Embedding Models: Beyond the Accuracy-Explainability Trade-Off »
Mateo Espinosa Zarlenga · Pietro Barbiero · Gabriele Ciravegna · Giuseppe Marra · Francesco Giannini · Michelangelo Diligenti · Zohreh Shams · Frederic Precioso · Stefano Melacci · Adrian Weller · Pietro Lió · Mateja Jamnik -
2022 Poster: Thor: Wielding Hammers to Integrate Language Models and Automated Theorem Provers »
Albert Q. Jiang · Wenda Li · Szymon Tworkowski · Konrad Czechowski · Tomasz Odrzygóźdź · Piotr Miłoś · Yuhuai Wu · Mateja Jamnik -
2022 Poster: Chefs' Random Tables: Non-Trigonometric Random Features »
Valerii Likhosherstov · Krzysztof M Choromanski · Kumar Avinava Dubey · Frederick Liu · Tamas Sarlos · Adrian Weller -
2022 Poster: A Survey and Datasheet Repository of Publicly Available US Criminal Justice Datasets »
Miri Zilka · Bradley Butcher · Adrian Weller -
2021 : [S12] Efficient Decompositional Rule Extraction for Deep Neural Networks »
Mateo Espinosa Zarlenga · Mateja Jamnik -
2021 Workshop: Privacy in Machine Learning (PriML) 2021 »
Yu-Xiang Wang · Borja Balle · Giovanni Cherubin · Kamalika Chaudhuri · Antti Honkela · Jonathan Lebensold · Casey Meehan · Mi Jung Park · Adrian Weller · Yuqing Zhu -
2021 Workshop: Human Centered AI »
Michael Muller · Plamen P Angelov · Shion Guha · Marina Kogan · Gina Neff · Nuria Oliver · Manuel Rodriguez · Adrian Weller -
2021 Workshop: AI for Science: Mind the Gaps »
Payal Chandak · Yuanqi Du · Tianfan Fu · Wenhao Gao · Kexin Huang · Shengchao Liu · Ziming Liu · Gabriel Spadon · Max Tegmark · Hanchen Wang · Adrian Weller · Max Welling · Marinka Zitnik -
2021 Poster: Learning Signal-Agnostic Manifolds of Neural Fields »
Yilun Du · Katie Collins · Josh Tenenbaum · Vincent Sitzmann -
2021 Poster: Make Sure You're Unsure: A Framework for Verifying Probabilistic Specifications »
Leonard Berrada · Sumanth Dathathri · Krishnamurthy Dvijotham · Robert Stanforth · Rudy Bunel · Jonathan Uesato · Sven Gowal · M. Pawan Kumar -
2021 Poster: Overcoming the Convex Barrier for Simplex Inputs »
Harkirat Singh Behl · M. Pawan Kumar · Philip Torr · Krishnamurthy Dvijotham -
2020 Workshop: Privacy Preserving Machine Learning - PriML and PPML Joint Edition »
Borja Balle · James Bell · Aurélien Bellet · Kamalika Chaudhuri · Adria Gascon · Antti Honkela · Antti Koskela · Casey Meehan · Olga Ohrimenko · Mi Jung Park · Mariana Raykova · Mary Anne Smart · Yu-Xiang Wang · Adrian Weller -
2020 Poster: Ode to an ODE »
Krzysztof Choromanski · Jared Quincy Davis · Valerii Likhosherstov · Xingyou Song · Jean-Jacques Slotine · Jacob Varley · Honglak Lee · Adrian Weller · Vikas Sindhwani -
2019 Workshop: Privacy in Machine Learning (PriML) »
Borja Balle · Kamalika Chaudhuri · Antti Honkela · Antti Koskela · Casey Meehan · Mi Jung Park · Mary Anne Smart · Mary Anne Smart · Adrian Weller -
2019 : Poster Session »
Jonathan Scarlett · Piotr Indyk · Ali Vakilian · Adrian Weller · Partha P Mitra · Benjamin Aubin · Bruno Loureiro · Florent Krzakala · Lenka Zdeborová · Kristina Monakhova · Joshua Yurtsever · Laura Waller · Hendrik Sommerhoff · Michael Moeller · Rushil Anirudh · Shuang Qiu · Xiaohan Wei · Zhuoran Yang · Jayaraman Thiagarajan · Salman Asif · Michael Gillhofer · Johannes Brandstetter · Sepp Hochreiter · Felix Petersen · Dhruv Patel · Assad Oberai · Akshay Kamath · Sushrut Karmalkar · Eric Price · Ali Ahmed · Zahra Kadkhodaie · Sreyas Mohan · Eero Simoncelli · Carlos Fernandez-Granda · Oscar Leong · Wesam Sakla · Rebecca Willett · Stephan Hoyer · Jascha Sohl-Dickstein · Sam Greydanus · Gauri Jagatap · Chinmay Hegde · Michael Kellman · Jonathan Tamir · Nouamane Laanait · Ousmane Dia · Mirco Ravanelli · Jonathan Binas · Negar Rostamzadeh · Shirin Jalali · Tiantian Fang · Alex Schwing · Sébastien Lachapelle · Philippe Brouillard · Tristan Deleu · Simon Lacoste-Julien · Stella Yu · Arya Mazumdar · Ankit Singh Rawat · Yue Zhao · Jianshu Chen · Xiaoyang Li · Hubert Ramsauer · Gabrio Rizzuti · Nikolaos Mitsakos · Dingzhou Cao · Thomas Strohmer · Yang Li · Pei Peng · Gregory Ongie -
2019 Workshop: Workshop on Human-Centric Machine Learning »
Plamen P Angelov · Nuria Oliver · Adrian Weller · Manuel Rodriguez · Isabel Valera · Silvia Chiappa · Hoda Heidari · Niki Kilbertus -
2019 Poster: Leader Stochastic Gradient Descent for Distributed Training of Deep Learning Models »
Yunfei Teng · Wenbo Gao · François Chalus · Anna Choromanska · Donald Goldfarb · Adrian Weller -
2018 Workshop: Privacy Preserving Machine Learning »
Adria Gascon · Aurélien Bellet · Niki Kilbertus · Olga Ohrimenko · Mariana Raykova · Adrian Weller -
2018 Poster: Geometrically Coupled Monte Carlo Sampling »
Mark Rowland · Krzysztof Choromanski · François Chalus · Aldo Pacchiano · Tamas Sarlos · Richard Turner · Adrian Weller -
2018 Spotlight: Geometrically Coupled Monte Carlo Sampling »
Mark Rowland · Krzysztof Choromanski · François Chalus · Aldo Pacchiano · Tamas Sarlos · Richard Turner · Adrian Weller -
2017 : Invited talk: Challenges for Transparency »
Adrian Weller -
2017 : Closing remarks »
Adrian Weller -
2017 Symposium: Kinds of intelligence: types, tests and meeting the needs of society »
José Hernández-Orallo · Zoubin Ghahramani · Tomaso Poggio · Adrian Weller · Matthew Crosby -
2017 Poster: From Parity to Preference-based Notions of Fairness in Classification »
Muhammad Bilal Zafar · Isabel Valera · Manuel Rodriguez · Krishna Gummadi · Adrian Weller -
2017 Poster: The Unreasonable Effectiveness of Structured Random Orthogonal Embeddings »
Krzysztof Choromanski · Mark Rowland · Adrian Weller -
2017 Poster: Uprooting and Rerooting Higher-Order Graphical Models »
Mark Rowland · Adrian Weller -
2016 Workshop: Reliable Machine Learning in the Wild »
Dylan Hadfield-Menell · Adrian Weller · David Duvenaud · Jacob Steinhardt · Percy Liang -
2016 Symposium: Machine Learning and the Law »
Adrian Weller · Thomas D. Grant · Conrad McDonnell · Jatinder Singh -
2015 Symposium: Algorithms Among Us: the Societal Impacts of Machine Learning »
Michael A Osborne · Adrian Weller · Murray Shanahan -
2014 Poster: Clamping Variables and Approximate Inference »
Adrian Weller · Tony Jebara -
2014 Oral: Clamping Variables and Approximate Inference »
Adrian Weller · Tony Jebara