Machine learning has been extremely successful in many critical areas, including computer vision, natural language processing, and game-playing. Still, a growing segment of the machine learning community recognizes that fundamental pieces are missing from the AI puzzle, among them causal inference.
This recognition comes from the observation that even though causality is a central component found throughout the sciences, engineering, and many other aspects of human cognition, explicit reference to causal relationships is largely missing in current learning systems. This entails a new goal of integrating causal inference and machine learning capabilities into the next generation of intelligent systems, thus paving the way towards higher levels of intelligence and human-centric AI. The synergy goes in both directions: causal inference benefits from machine learning, and machine learning benefits from causal inference. Current machine learning systems lack the ability to leverage the invariances imprinted by the underlying causal mechanisms towards reasoning about generalizability, explainability, interpretability, and robustness. Current causal inference methods, on the other hand, lack the ability to scale up to high-dimensional settings, where current machine learning systems excel.
The goal of this workshop is to bring together researchers from both camps to initiate principled discussions about the integration of causal reasoning and machine learning perspectives to help tackle the challenging AI tasks of the coming decades. We welcome researchers from all relevant disciplines, including but not limited to computer science, cognitive science, robotics, mathematics, statistics, physics, and philosophy.
Mon 7:00 a.m. - 7:10 a.m.

Intro
Mon 7:10 a.m. - 7:30 a.m.

Uri Shalit - Calibration, out-of-distribution generalization and a path towards causal representations
(Invited Talk)
Uri Shalit
Mon 7:30 a.m. - 7:50 a.m.

Julius von Kügelgen - Independent mechanism analysis, a new concept?
(Invited Talk)
Independent component analysis provides a principled framework for unsupervised representation learning, with solid theory on the identifiability of the latent code that generated the data, given only observations of mixtures thereof. Unfortunately, when the mixing is nonlinear, the model is provably non-identifiable, since statistical independence alone does not sufficiently constrain the problem. Identifiability can be recovered in settings where additional, typically observed variables are included in the generative process. We investigate an alternative path and consider instead including assumptions reflecting the principle of independent causal mechanisms exploited in the field of causality. Specifically, our approach is motivated by thinking of each source as independently influencing the mixing process. This gives rise to a framework which we term independent mechanism analysis. We provide theoretical and empirical evidence that our approach circumvents a number of non-identifiability issues arising in nonlinear blind source separation. Reference: https://arxiv.org/abs/2106.05200 (accepted at: NeurIPS 2021)
Julius von Kügelgen
Mon 7:50 a.m. - 8:10 a.m.

David Blei - On the Assumptions of Synthetic Control Methods
(Invited Talk)
David Blei
Mon 8:10 a.m. - 8:25 a.m.

Session 1: Q&A
(Q&A)
Mon 8:30 a.m. - 8:50 a.m.

Ricardo Silva - The Road to Causal Programming
(Invited Talk)
Ricardo Silva
Mon 8:50 a.m. - 9:10 a.m.

Aapo Hyvarinen - Causal discovery by generative modelling
(Invited Talk)
There is a deep connection between causal discovery and generative models, such as factor analysis, independent component analysis, and various unsupervised deep learning models. Two key concepts that emerge are identifiability and nonstationarity. In this talk, I will review this research, providing some historical perspectives as well as open questions for future research.
Aapo Hyvarinen
Mon 9:10 a.m. - 9:35 a.m.

Tobias Gerstenberg - Going beyond the here and now: Counterfactual simulation in human cognition
(Invited Talk)
As humans, we spend much of our time going beyond the here and now. We dwell on the past, long for the future, and ponder how things could have turned out differently. In this talk, I will argue that people's knowledge of the world is organized around causally structured mental models, and that much of human thought can be understood as cognitive operations over these mental models. Specifically, I will highlight the pervasiveness of counterfactual thinking in human cognition. Counterfactuals are critical for how people make causal judgments, how they explain what happened, and how they hold others responsible for their actions.
Tobias Gerstenberg
Mon 9:35 a.m. - 9:45 a.m.

Session 2: Q&A
(Q&A)
Mon 9:45 a.m. - 10:45 a.m.

Poster Session
Mon 10:45 a.m. - 11:05 a.m.

Thomas Icard - A (topo)logical perspective on causal inference
(Invited Talk)
Thomas Icard
Mon 11:05 a.m. - 11:25 a.m.

Caroline Uhler: TBA
(Invited Talk)
Caroline Uhler
Mon 11:25 a.m. - 11:45 a.m.

Rosemary Ke - From "What" to "Why": towards causal learning
(Invited Talk)
Nan Rosemary Ke
Mon 11:45 a.m. - 12:00 p.m.

Session 3: Q&A
(Q&A)
Mon 12:00 p.m. - 12:45 p.m.

Judea Pearl - The logic of Causal Inference
(Keynote Speaker)
Mon 12:45 p.m. - 1:00 p.m.

Discussion Panel
Mon 1:00 p.m. - 1:15 p.m.

Zaffalon, Antonucci, Cabañas - Causal Expectation-Maximisation
(Contributed Talk)
Structural causal models are the basic modelling unit in Pearl's causal theory; in principle they allow us to solve counterfactuals, which are at the top rung of the ladder of causation. But they often contain latent variables that limit their application to special settings. This appears to be a consequence of the fact, proven in this paper, that causal inference is NP-hard even in models characterised by polytree-shaped graphs. To deal with this hardness, we introduce the causal EM algorithm. Its primary aim is to reconstruct the uncertainty about the latent variables from data about categorical manifest variables. Counterfactual inference is then addressed via standard algorithms for Bayesian networks. The result is a general method to approximately compute counterfactuals, be they identifiable or not (in which case we deliver bounds). We show empirically, as well as by deriving credible intervals, that the approximation we provide becomes accurate in a fair number of EM runs. These results lead us finally to argue that there appears to be an unnoticed limitation to the trending idea that counterfactual bounds can often be computed without knowledge of the structural equations.
Marco Zaffalon · Alessandro Antonucci · Rafael Cabañas
Mon 1:15 p.m. - 1:30 p.m.

Dominguez-Olmedo, Karimi, Schölkopf - On the Adversarial Robustness of Causal Algorithmic Recourse
(Contributed Talk)
Algorithmic recourse seeks to provide actionable recommendations for individuals to overcome unfavorable outcomes made by automated decision-making systems. The individual then exerts time and effort to positively change their circumstances. Recourse recommendations should ideally be robust to reasonably small changes in the circumstances of the individual seeking recourse. In this work, we formulate the adversarially robust recourse problem and show that methods that offer minimally costly recourse fail to be robust. We restrict ourselves to linear classifiers, and show that the adversarially robust recourse problem reduces to the standard recourse problem for some modified classifier with a shifted decision boundary. Finally, we derive bounds on the extra cost incurred by individuals seeking robust recourse, and discuss how to regulate this cost between the individual and the decision-maker.
Ricardo Dominguez-Olmedo · Amir Karimi · Bernhard Schölkopf
Mon 1:30 p.m. - 1:45 p.m.

Javidian, Pandey, Jamshidi - Scalable Causal Domain Adaptation
(Contributed Talk)
One of the most important problems in transfer learning is the task of domain adaptation, where the goal is to apply an algorithm trained in one or more source domains to a different (but related) target domain. This paper deals with domain adaptation in the presence of covariate shift while there exist invariances across domains. One of the main limitations of existing causal inference methods for solving this problem is scalability. To overcome this difficulty, we propose SCTL, an algorithm that avoids an exhaustive search and identifies invariant causal features across source and target domains based on Markov blanket discovery. SCTL does not require having prior knowledge of the causal structure, the type of interventions, or the intervention targets. There is an intrinsic locality associated with SCTL that makes SCTL practically scalable and robust because local causal discovery increases the power of computational independence tests and makes the task of domain adaptation computationally tractable. We show the scalability and robustness of SCTL for domain adaptation using synthetic and real data sets in low-dimensional and high-dimensional settings.
Mohammad Ali Javidian · Om Pandey · Pooyan Jamshidi
Mon 1:45 p.m. - 2:00 p.m.

Cundy, Grover, Ermon - BCD Nets: Scalable Variational Approaches for Bayesian Causal Discovery
(Contributed Talk)
A structural equation model (SEM) is an effective framework to reason over causal relationships represented via a directed acyclic graph (DAG). Recent advances have enabled effective maximum-likelihood point estimation of DAGs from observational data. However, a point estimate may not accurately capture the uncertainty in inferring the underlying graph in practical scenarios, wherein the true DAG is non-identifiable and/or the observed dataset is limited. We propose Bayesian Causal Discovery Nets (BCD Nets), a variational inference framework for estimating a distribution over DAGs characterizing a linear-Gaussian SEM. Developing a full Bayesian posterior over DAGs is challenging due to the discrete and combinatorial nature of graphs. We analyse key design choices for scalable VI over DAGs, such as 1) the parametrization of DAGs via an expressive variational family, 2) a continuous relaxation that enables low-variance stochastic optimization, and 3) suitable priors over the latent variables. We provide a series of experiments on real and synthetic data showing that BCD Nets outperform maximum-likelihood methods on standard causal discovery metrics such as structural Hamming distance in low data regimes.
Chris Cundy · Aditya Grover · Stefano Ermon
Mon 2:00 p.m. - 2:20 p.m.

Alison Gopnik - Causal Learning in Children and Computational Models
(Invited Talk)
Very young children routinely solve causal problems that are still very challenging for machine learning systems. I will outline several exciting recent lines of work looking at young children’s causal reasoning and learning and comparing it to learning in various computational models. This includes work on the selection of relevant test variables, learning abstract and analogical relationships, and, most importantly, techniques for active learning and causal exploration.
Alison Gopnik
Mon 2:20 p.m. - 2:40 p.m.

Adèle Ribeiro - Effect Identification in Cluster Causal Diagrams
(Invited Talk)
A pervasive task found throughout the empirical sciences is to determine the effect of interventions from observational data. It is well-understood that assumptions are necessary to perform such causal inferences, an idea popularized through Cartwright’s motto: "no causes-in, no causes-out." One way of articulating these assumptions is through the use of causal diagrams, which are a special type of graphical model with causal semantics [Pearl, 2000]. The graphical approach has been applied successfully in many settings, but there are still challenges to its use, particularly in complex, high-dimensional domains. In this talk, I will introduce cluster causal diagrams (C-DAGs), a novel causal graphical model that allows for the partial specification of the relationships among variables. C-DAGs provide a simple yet effective way to partially abstract a grouping (cluster) of variables among which causal relationships are not fully understood while preserving consistency with the underlying causal system and the validity of causal identification tools. Reference: https://causalai.net/r77.pdf
Adèle Ribeiro
Mon 2:40 p.m. - 3:00 p.m.

Victor Chernozhukov - Omitted Confounder Bias Bounds for Machine Learned Causal Models
(Invited Talk)
Victor Chernozhukov
Mon 3:00 p.m. - 3:15 p.m.

Session 4: Q&A
(Q&A)
Mon 3:15 p.m. - 3:30 p.m.

Closing Remarks


Unsupervised Causal Binary Concepts Discovery with VAE for Black-box Model Explanation
(Poster)
We aim to explain a black-box classifier with explanations of the form "data X is classified as class Y because X has A, B and does not have C", in which A, B, and C are high-level concepts. The challenge is that we have to discover, in an unsupervised manner, a set of concepts, i.e., A, B and C, that is useful for explaining the classifier. We first introduce a structural generative model that is suitable to express and discover such concepts. We then propose a learning process that simultaneously learns the data distribution and encourages certain concepts to have a large causal influence on the classifier output. Our method also allows easy integration of the user's prior knowledge to induce high interpretability of concepts. Using multiple datasets, we demonstrate that our method can discover useful binary concepts for explanation.
Thien Tran · Kazuto Fukuchi · Youhei Akimoto · Jun Sakuma


Encoding Causal Macrovariables
(Poster)
In many scientific disciplines, coarse-grained causal models are used to explain and predict the dynamics of more fine-grained systems. Naturally, such models require appropriate macrovariables. Automated procedures to detect suitable variables would be useful to leverage increasingly available high-dimensional observational datasets. This work introduces a novel algorithmic approach that is inspired by a new characterisation of causal macrovariables as information bottlenecks between microstates. Its general form can be adapted to address individual needs of different scientific goals. After a further transformation step, the causal relationships between learned variables can be investigated through additive noise models. Experiments on both simulated data and on a real climate dataset are reported. In a synthetic dataset, the algorithm robustly detects the ground-truth variables and correctly infers the causal relationships between them. In a real climate dataset, the algorithm robustly detects two variables that correspond to the two known variations of the El Niño phenomenon.
Benedikt Höltgen


Amortized Causal Discovery: Learning to Infer Causal Graphs from Time-Series Data
(Poster)
Standard causal discovery methods must fit a new model whenever they encounter samples from a new underlying causal graph. However, these samples often share relevant information – for instance, the dynamics describing the effects of causal relations – which is lost when following this approach. We propose Amortized Causal Discovery, a novel framework that leverages such shared dynamics to learn to infer causal relations from time-series data. This enables us to train a single, amortized model that infers causal relations across samples with different underlying causal graphs, and thus makes use of the information that is shared. We demonstrate experimentally that this approach, implemented as a variational model, leads to significant improvements in causal discovery performance, and show how it can be extended to perform well under hidden confounding.
Sindy Löwe · David Madras · Richard Zemel · Max Welling


Disentangling Causal Effects from Sets of Interventions in the Presence of Unobserved Confounders
(Poster)
The ability to answer causal questions is crucial in many domains, as causal inference allows one to understand the impact of interventions. In many applications, only a single intervention is possible at a given time. However, in certain important areas, multiple interventions are concurrently applied. Disentangling the effects of single interventions from jointly applied interventions is a challenging task, especially as simultaneously applied interventions can interact. This problem is made harder still by unobserved confounders, which influence both interventions and outcome. We address this challenge by aiming to learn the effect of a single intervention from both observational data and sets of interventions. We prove that this is not generally possible, but provide identification proofs demonstrating that it can be achieved in certain classes of additive noise models, even in the presence of unobserved confounders. Importantly, we show how to incorporate observed covariates and learn heterogeneous treatment effects conditioned on them for single interventions.
Olivier Jeunen · Ciaran Gilligan-Lee · Rishabh Mehrotra · Mounia Lalmas


Typing assumptions improve identification in causal discovery
(Poster)
Causal discovery from observational data is a challenging task to which an exact solution cannot always be identified. Under assumptions about the data-generative process, the causal graph can often be identified up to an equivalence class. Proposing new realistic assumptions to circumscribe such equivalence classes is an active field of research. In this work, we propose a new set of assumptions that constrain possible causal relationships based on the nature of the variables. We thus introduce typed directed acyclic graphs, in which variable types are used to determine the validity of causal relationships. We demonstrate, both theoretically and empirically, that the proposed assumptions can result in significant gains in the identification of the causal graph.
Philippe Brouillard · Perouz Taslakian · Alexandre Lacoste · Sébastien Lachapelle · Alexandre Drouin


Prequential MDL for Causal Structure Learning with Neural Networks
(Poster)
Learning the structure of Bayesian networks and causal relationships from observations is a common goal in several areas of science and technology. We show that the prequential minimum description length principle (MDL) can be used to derive a practical scoring function for Bayesian networks when flexible and overparametrized neural networks are used to model the conditional probability distributions between observed variables. MDL represents an embodiment of Occam's Razor and we obtain plausible and parsimonious graph structures without relying on sparsity-inducing priors or other regularizers which must be tuned. Empirically we demonstrate competitive results on synthetic and real-world data.
Jorg Bornschein · Silvia Chiappa · Alan Malek · Nan Rosemary Ke


MANM-CS: Data Generation for Benchmarking Causal Structure Learning from Mixed Discrete-Continuous and Nonlinear Data
(Poster)
In recent years, the growing interest in methods of causal structure learning (CSL) has been confronted with a lack of access to a well-defined ground truth within real-world scenarios to evaluate these methods. Existing synthetic benchmarks are limited in their scope. They are either restricted to a “static” low-dimensional data set or do not allow examining mixed discrete-continuous or nonlinear data. This work introduces the mixed additive noise model that provides a ground truth framework for generating observational data following various distribution models. Moreover, we present our reference implementation MANM-CS that provides easy access and demonstrate how our framework can support researchers and practitioners. Further, we propose future research directions and possible extensions.
Johannes Huegle · Christopher Hagedorn · Jonas Umland · Rainer Schlosser


DiBS: Differentiable Bayesian Structure Learning
(Poster)
Bayesian structure learning allows inferring Bayesian network structure from data while reasoning about the epistemic uncertainty, a key element towards enabling active causal discovery and designing interventions in real-world systems. In this work, we propose a general, fully differentiable framework for Bayesian structure learning (DiBS) that operates in the continuous space of a latent probabilistic graph representation. Contrary to existing work, DiBS is agnostic to the form of the local conditional distributions and allows for joint posterior inference of both the graph structure and the conditional distribution parameters. This makes DiBS directly applicable to posterior inference of non-standard Bayesian network models, e.g., with nonlinear dependencies encoded by neural networks. Building on recent advances in variational inference, we use DiBS to devise an efficient general purpose method for approximating posteriors over structural models. In evaluations on simulated and real-world data, our method significantly outperforms related approaches to joint posterior inference.
Lars Lorch · Jonas Rothfuss · Bernhard Schölkopf · Andreas Krause


Learning Neural Causal Models with Active Interventions
(Poster)
Discovering causal structures from data is a challenging inference problem of fundamental importance in all areas of science. The appealing scaling properties of neural networks have recently led to a surge of interest in differentiable neural network-based methods for learning causal structures from data. So far differentiable causal discovery has focused on static datasets of observational or interventional origin. In this work, we introduce an active intervention-targeting mechanism which enables a quick identification of the underlying causal structure of the data-generating process. Our method significantly reduces the required number of interactions compared with random intervention targeting and is applicable for both discrete and continuous optimization formulations of learning the underlying directed acyclic graph (DAG) from data. We examine the proposed method across a wide range of settings and demonstrate superior performance on multiple benchmarks from simulated to real-world data.
Nino Scherrer · Olexa Bilaniuk · Yashas Annadani · Anirudh Goyal · Patrick Schwab · Bernhard Schölkopf · Michael Mozer · Yoshua Bengio · Stefan Bauer · Nan Rosemary Ke


Identification of Latent Graphs: A Quantum Entropic Approach
(Poster)
Quantum causality is an emerging field of study that has the potential to greatly advance our understanding of quantum systems. In this paper, we put forth a new theoretical framework for merging quantum information science and causal inference by exploiting entropic principles. For this purpose, we leverage the tradeoff between the entropy of hidden cause and conditional mutual information of observed variables to develop a scalable algorithmic approach for inferring causality in the presence of latent confounders (common causes) in quantum systems. As an application, we consider a system of three entangled qubits and transmit the second and third qubits over separate noisy quantum channels. In this model, we validate that the first qubit is a latent confounder and the common cause of the second and third qubits. In contrast, when two entangled qubits are prepared, and one of them is sent over a noisy channel, there is no common confounder. We also demonstrate that the proposed approach outperforms the results of classical causal inference for the Tübingen database when the variables are classical by exploiting quantum dependence between variables through density matrices rather than joint probability distributions. Thus, the proposed approach unifies classical and quantum causal inference in a principled way.
Mohammad Ali Javidian · Vaneet Aggarwal · Zubin Jacob


Reliable causal discovery based on mutual information supremum principle for finite datasets
(Poster)
The recent method MIIC (Multivariate Information-based Inductive Causation), which combines constraint-based and information-theoretic frameworks, has been shown to significantly improve causal discovery from purely observational data. Yet, a substantial loss in precision has remained between skeleton and oriented graph predictions for small datasets. Here, we propose and implement a simple modification, named conservative MIIC, based on a general mutual information supremum principle regularized for finite datasets. In practice, conservative MIIC rectifies the negative values of regularized (conditional) mutual information used by MIIC to identify (conditional) independence between discrete, continuous or mixed-type variables. This modification is shown to greatly enhance the reliability of predicted orientations, for all sample sizes, with only a small sensitivity loss compared to MIIC's original orientation rules. Conservative MIIC is especially interesting to improve the reliability of causal discovery for real-life observational data applications.
Vincent Cabeli · Honghao Li · Marcel da Câmara Ribeiro Dantas · Herve Isambert


Scalable Causal Domain Adaptation
(Poster)
One of the most important problems in transfer learning is the task of domain adaptation, where the goal is to apply an algorithm trained in one or more source domains to a different (but related) target domain. This paper deals with domain adaptation in the presence of covariate shift while there exist invariances across domains. One of the main limitations of existing causal inference methods for solving this problem is scalability. To overcome this difficulty, we propose SCTL, an algorithm that avoids an exhaustive search and identifies invariant causal features across source and target domains based on Markov blanket discovery. SCTL does not require having prior knowledge of the causal structure, the type of interventions, or the intervention targets. There is an intrinsic locality associated with SCTL that makes SCTL practically scalable and robust because local causal discovery increases the power of computational independence tests and makes the task of domain adaptation computationally tractable. We show the scalability and robustness of SCTL for domain adaptation using synthetic and real data sets in low-dimensional and high-dimensional settings.
Mohammad Ali Javidian · Om Pandey · Pooyan Jamshidi


Learning preventative and generative causal structures from point events in continuous time
(Poster)
Many previous accounts of causal structure induction have focused on atemporal contingency data while fewer have described learning on the basis of observations of events unfolding over time. How do people use temporal information to infer causal structures? Here we develop a computational-level framework and propose several algorithmic-level approximations to explain how people impute causal structures from continuous-time event sequences. We compare both normative and process accounts to participant behavior across two experiments. We consider structures combining both generative and preventative causal relationships in the presence of either regular or irregular background noise in the form of spontaneous activations. We find that 1) humans are robustly capable learners in this setting, successfully identifying a variety of ground truth structures but 2) diverging from our computational-level account in ways we can explain with a more tractable simulation and summary statistics approximation scheme. We thus argue that human structure induction from temporal information relies on comparisons between observed patterns and expectations established via mental simulation.
Tianwei Gong


Building Object-based Causal Programs for Human-like Generalization
(Poster)
We present a novel task that measures how people generalize objects' causal powers based on observing a single (Experiment 1) or a few (Experiment 2) causal interactions between object pairs. We propose a computational modeling framework that can synthesize human-like generalization patterns in our task setting, and sheds light on how people may navigate the compositional space of possible causal functions and categories efficiently. Our modeling framework combines a causal function generator that makes use of agent and recipient objects' features and relations, and a Bayesian non-parametric inference process to govern the degree of similarity-based generalization. Our model has a natural “resource-rational” variant that outperforms a naive Bayesian account in describing participants, in particular reproducing a generalization-order effect and causal asymmetry observed in our behavioral experiments. We argue that this modeling framework provides a computationally plausible mechanism for real-world causal generalization.
Bonan Zhao · Chris Lucas


On the Robustness of Causal Algorithmic Recourse
(Poster)
Algorithmic recourse seeks to provide actionable recommendations for individuals to overcome unfavorable outcomes made by automated decision-making systems. The individual then exerts time and effort to positively change their circumstances. Recourse recommendations should ideally be robust to reasonably small changes in the circumstances (similar individuals, updated classifier in light of larger datasets, and updated causal assumptions about the world). In this work, we formulate the robust recourse problem, derive bounds on the extra cost incurred by individuals seeking robust recourse subject to both linear and nonlinear assumptions, and discuss how to regulate this cost between the individual and the decision-maker.
Ricardo Dominguez-Olmedo · Amir Karimi · Bernhard Schölkopf


Desiderata for Representation Learning: A Causal Perspective
(Poster)
Representation learning constructs low-dimensional representations to summarize essential features of high-dimensional data. This learning problem is often approached by describing various desiderata associated with learned representations; e.g., that they be non-spurious, efficient, or disentangled. It can be challenging, however, to turn these intuitive desiderata into formal criteria that can be measured and enhanced based on observed data. In this paper, we take a causal perspective on representation learning, formalizing non-spuriousness and efficiency (in supervised representation learning) and disentanglement (in unsupervised representation learning) using counterfactual quantities and observable consequences of causal assertions. This yields computable metrics that can be used to assess the degree to which representations satisfy the desiderata of interest and learn non-spurious and disentangled representations from single observational datasets.
Yixin Wang · Michael Jordan


Scalable Variational Approaches for Bayesian Causal Discovery
(
Poster
)
A structural equation model (SEM) is an effective framework to reason over causal relationships represented via a directed acyclic graph (DAG). Recent advances have enabled effective maximumlikelihood point estimation of DAGs from observational data. However, a point estimate may not accurately capture the uncertainty in inferring the underlying graph in practical scenarios, wherein the true DAG is nonidentifiable and/or the observed dataset is limited. We propose Bayesian Causal Discovery Nets (BCD Nets), a variational inference framework for estimating a distribution over DAGs characterizing a linearGaussian SEM. Developing a full Bayesian posterior over DAGs is challenging due to the the discrete and combinatorial nature of graphs. We analyse key design choices for scalable VI over DAGs, such as 1) the parametrization of DAGs via an expressive variational family, 2) a continuous relaxation that enables lowvariance stochastic optimization, and 3) suitable priors over the latent variables. We provide a series of experiments on real and synthetic data showing that BCD Nets outperform maximumlikelihood methods on standard causal discovery metrics such as structural Hamming distance in low data regimes. 
Chris Cundy · Aditya Grover · Stefano Ermon 🔗 
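
As an illustration of the structural Hamming distance metric named in the abstract above, here is a minimal Python sketch. It is not taken from the paper: the function name `structural_hamming_distance` is hypothetical, graphs are assumed to be given as 0/1 adjacency matrices, and a reversed edge is counted as a single error, which is one common convention among several.

```python
def structural_hamming_distance(true_adj, est_adj):
    """Count edge additions, deletions, and reversals between two directed graphs.

    Both inputs are square 0/1 adjacency matrices (lists of lists) in which
    adj[i][j] == 1 encodes a directed edge i -> j. A reversed edge is counted
    once (an assumed convention; some authors count it twice).
    """
    n = len(true_adj)
    if len(est_adj) != n or any(len(row) != n for row in true_adj + est_adj):
        raise ValueError("adjacency matrices must be square and of equal size")

    distance = 0
    # Visit each unordered pair of nodes once and compare the edge status.
    for i in range(n):
        for j in range(i + 1, n):
            true_pair = (true_adj[i][j], true_adj[j][i])
            est_pair = (est_adj[i][j], est_adj[j][i])
            if true_pair != est_pair:
                distance += 1
    return distance


# Hypothetical example graphs over three nodes.
true_graph = [[0, 1, 0],
              [0, 0, 1],
              [0, 0, 0]]  # edges 0 -> 1 and 1 -> 2
est_graph = [[0, 0, 1],
             [1, 0, 0],
             [0, 0, 0]]   # edges 1 -> 0 and 0 -> 2

print(structural_hamming_distance(true_graph, est_graph))
```

In this toy case the estimate reverses one edge, adds one, and misses one, so the distance is 3. A posterior over DAGs, as opposed to a point estimate, could be scored by averaging this quantity over sampled graphs.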


Individual treatment effect estimation in the presence of unobserved confounding based on a fixed relative treatment effect
(
Poster
)
In healthcare, treatment effect estimates from randomized controlled trials are often reported on a relative scale, for instance as an odds ratio for binary outcomes. To weigh potential benefits and harms of treatment, this odds ratio has to be translated to a difference in absolute risk, preferably on an individual patient level. Under the assumption that the relative treatment effect is fixed, it is possible that treatments have widely varying effects on an absolute risk scale. We demonstrate that if this relative treatment effect is known a priori, for example from randomized trials, it is possible to estimate the treatment effect on an absolute scale on an individualized basis, even in the presence of unobserved confounding. We use this assumption both on a standard logistic regression task and on a task with real-world medical images with simulated outcome data, using convolutional neural networks. On both tasks the method performs well.
Wouter van Amsterdam · Rajesh Ranganath 🔗 
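
The translation from a fixed odds ratio to an absolute risk difference described in the abstract above can be sketched with basic arithmetic. The Python snippet below is an illustration only, not the authors' method: the function names and the example baseline risks are hypothetical, and it assumes the untreated (baseline) risk of each patient is already known.

```python
def treated_risk(baseline_risk, odds_ratio):
    """Risk under treatment implied by a baseline risk and a fixed odds ratio."""
    if not 0.0 < baseline_risk < 1.0:
        raise ValueError("baseline_risk must lie strictly between 0 and 1")
    if odds_ratio <= 0.0:
        raise ValueError("odds_ratio must be positive")
    baseline_odds = baseline_risk / (1.0 - baseline_risk)
    treated_odds = odds_ratio * baseline_odds
    # Convert odds back to a probability.
    return treated_odds / (1.0 + treated_odds)


def absolute_risk_difference(baseline_risk, odds_ratio):
    """Treated risk minus baseline risk (negative values mean risk reduction)."""
    return treated_risk(baseline_risk, odds_ratio) - baseline_risk


# Hypothetical baseline risks, all sharing the same odds ratio of 0.5.
for p0 in (0.01, 0.10, 0.50):
    diff = absolute_risk_difference(p0, 0.5)
    print(f"baseline risk {p0:.2f} -> absolute risk difference {diff:+.4f}")
```

With the same odds ratio of 0.5, the absolute risk reduction is roughly 0.5 percentage points at a 1% baseline risk but roughly 16.7 percentage points at a 50% baseline risk, which is why a fixed relative effect can still imply widely varying absolute effects across patients.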


A Tree-based Model Averaging Approach for Personalized Treatment Effect Estimation from Heterogeneous Data Sources
(
Poster
)
Accurately estimating personalized treatment effects within a single study has been challenging due to the limited sample size. Here we propose a tree-based model averaging approach to improve the estimation efficiency of conditional average treatment effects concerning the population of a target research site by leveraging models derived from potentially heterogeneous populations of other sites, but without them sharing individual-level data. To the best of our knowledge, there is no established model averaging approach for distributed data with a focus on improving the estimation of treatment effects. Under distributed data networks, we develop an efficient and interpretable tree-based ensemble of personalized treatment effect estimators to join results across hospital sites, while actively modeling the heterogeneity in data sources through site partitioning. The efficiency of this approach is demonstrated by a study of causal effects of oxygen saturation on hospital mortality and backed up by comprehensive numerical results.
Xiaoqing Tan · Lu Tang 🔗 


Multiple Environments Can Reduce Indeterminacy in VAEs
(
Poster
)
Parameter and latent variable identifiability in variational autoencoders have received considerable attention recently, due to their empirical success in learning joint probabilities of complex data and their representations. Concurrently, modeling using multiple environments has been suggested for robust causal reasoning. We uncover additional theoretical benefits of multiple environments in the form of a strong identifiability result for a variational autoencoder model with latent covariate shift. We propose a novel learning algorithm that combines empirical Bayes and variational autoencoders, designed for latent variable identifiability without compromising representative power, using multiple environments as a crucial technical and practical tool. 
Quanhan (Johnny) Xi · Benjamin Bloem-Reddy 🔗 


Using Embeddings to Estimate Peer Influence on Social Networks
(
Poster
)
We address the problem of using observational data to estimate peer contagion effects, the influence of treatments applied to individuals in a network on the outcomes of their neighbours. A main challenge to such estimation is that homophily (the tendency of connected units to share similar latent traits) acts as an unobserved confounder for contagion effects. Informally, it's hard to tell whether your friends have similar outcomes because they were influenced by your treatment, or whether it's due to some common trait that caused you to be friends in the first place. Because these common causes are not usually directly observed, they cannot be simply adjusted for. We describe an approach to perform the required adjustment using node embeddings learned from the network itself. The main aim is to perform this adjustment non-parametrically, without functional form assumptions on either the process that generated the network or the treatment assignment and outcome processes. The key questions we address are: How should the causal effect be formalized? And, when can embedding methods yield causal identification?
Irina Cristali · Victor Veitch 🔗 


Using Non-Linear Causal Models to Study Aerosol-Cloud Interactions in the Southeast Pacific
(
Poster
)
Aerosol-cloud interactions include a myriad of effects that all begin when aerosol enters a cloud and acts as cloud condensation nuclei (CCN).
An increase in CCN results in a decrease in the mean cloud droplet size ($r_e$).
The smaller droplet size leads to brighter, more expansive, and longer-lasting clouds that reflect more incoming sunlight, thus cooling the Earth.
Globally, aerosol-cloud interactions cool the Earth; however, the strength of the effect is heterogeneous over different meteorological regimes.
Understanding how aerosol-cloud interactions evolve as a function of the local environment can help us better understand sources of error in our Earth system models, which currently fail to reproduce the observed relationships.
In this work we use recent nonlinear, causal machine learning methods to study the heterogeneous effects of aerosols on cloud droplet radius.

Andrew Jesson · Peter Manshausen · Alyson Douglas · Duncan Watson-Parris · Yarin Gal · Philip Stier 🔗 


Synthesis of Reactive Programs with Structured Latent State
(
Poster
)
The human ability to efficiently discover causal theories of their environments from observations is a feat of nature that remains elusive in machines. In this work, we attempt to make progress on this frontier by formulating the challenge of causal mechanism discovery from observed data as one of program synthesis. We focus on the domain of time-varying, Atari-like 2D grid worlds, and represent causal models in this domain using a programming language called Autumn. Discovering the causal structure underlying a sequence of observations is equivalent to identifying the program in the Autumn language that generates the observations. We introduce a novel program synthesis algorithm, called AutumnSynth, that approaches this synthesis challenge by integrating standard methods of synthesizing functions with an automata synthesis approach, used to discover the model's latent state. We evaluate our method on a suite of Autumn programs designed to express the richness of the domain, which signals the potential of our formulation.
Ria Das · Zenna Tavares · Armando Solar-Lezama · Josh Tenenbaum 🔗 


Causal Inference Using Tractable Circuits
(
Poster
)
The aim of this paper is to discuss and draw attention to a recent result which shows that probabilistic inference in the presence of (unknown) causal mechanisms can be tractable for models that have traditionally been viewed as intractable. This result was reported recently in (Darwiche, ECAI 2020) to facilitate model-based supervised learning but it can be interpreted in a causality context as follows. One can compile a nonparametric causal graph into an arithmetic circuit that supports inference in time linear in the circuit size. The circuit is nonparametric so it can be used to estimate parameters from data and to further reason (in linear time) about the causal graph parametrized by these estimates. Moreover, the circuit size can sometimes be independent of the causal graph treewidth, leading to tractable inference on models that have been deemed intractable. This has been enabled by a new technique that can exploit causal mechanisms computationally but without needing to know their identities (the classical setup in causal inference). Our goal is to provide a causality-oriented exposure to these new results and to speculate on how they may potentially contribute to more scalable and versatile causal inference.
Adnan Darwiche 🔗 


Causal Expectation-Maximisation
(
Poster
)
Structural causal models are the fundamental modelling unit in Pearl's causal theory; in principle they allow us to solve counterfactuals, which represent the most expressive level of causal inference. But they most often contain latent variables that limit their application to special settings. In this paper we introduce the causal EM algorithm that aims at reconstructing the uncertainty about the latent variables; based on this, causal inference can approximately be solved via standard algorithms for Bayesian networks. The result is a general method to solve causal inference queries, be they identifiable or not (in which case we deliver bounds), on semi-Markovian structural causal models with categorical variables. We show empirically, as well as by deriving credible intervals, that the approximation we provide becomes accurate in a fair number of EM runs. We show that causal inference is NP-hard also in models characterised by polytree-shaped graphs; this supports developing approximate approaches to causal inference. Finally, we argue that there is possibly an overlooked issue in computing counterfactual bounds without knowledge of the structural equations that might negatively impact known results.
Marco Zaffalon · Alessandro Antonucci · Rafael Cabañas 🔗 