Machine learning has been extremely successful in many critical areas, including computer vision, natural language processing, and game playing. Still, a growing segment of the machine learning community recognizes that fundamental pieces are missing from the AI puzzle, among them causal inference.
This recognition comes from the observation that even though causality is a central component found throughout the sciences, engineering, and many other aspects of human cognition, explicit reference to causal relationships is largely missing in current learning systems. This entails a new goal of integrating causal inference and machine learning capabilities into the next generation of intelligent systems, thus paving the way towards higher levels of intelligence and human-centric AI. The synergy goes in both directions: causal inference benefits from machine learning, and machine learning benefits from causal inference. Current machine learning systems lack the ability to leverage the invariances imprinted by the underlying causal mechanisms towards reasoning about generalizability, explainability, interpretability, and robustness. Current causal inference methods, on the other hand, lack the ability to scale up to high-dimensional settings, where current machine learning systems excel.
The goal of this workshop is to bring together researchers from both camps to initiate principled discussions about the integration of causal reasoning and machine learning perspectives to help tackle the challenging AI tasks of the coming decades. We welcome researchers from all relevant disciplines, including but not limited to computer science, cognitive science, robotics, mathematics, statistics, physics, and philosophy.
Mon 7:00 a.m. - 7:10 a.m.

Intro
Mon 7:10 a.m. - 7:30 a.m.

Uri Shalit - Calibration, out-of-distribution generalization and a path towards causal representations
(Invited Talk)
Uri Shalit
Mon 7:30 a.m. - 7:50 a.m.

Julius von Kügelgen - Independent mechanism analysis, a new concept?
(Invited Talk)
Independent component analysis provides a principled framework for unsupervised representation learning, with solid theory on the identifiability of the latent code that generated the data, given only observations of mixtures thereof. Unfortunately, when the mixing is nonlinear, the model is provably non-identifiable, since statistical independence alone does not sufficiently constrain the problem. Identifiability can be recovered in settings where additional, typically observed variables are included in the generative process. We investigate an alternative path and consider instead including assumptions reflecting the principle of independent causal mechanisms exploited in the field of causality. Specifically, our approach is motivated by thinking of each source as independently influencing the mixing process. This gives rise to a framework which we term independent mechanism analysis. We provide theoretical and empirical evidence that our approach circumvents a number of non-identifiability issues arising in nonlinear blind source separation. Reference: https://arxiv.org/abs/2106.05200 (accepted at: NeurIPS 2021)
Julius von Kügelgen
Mon 7:50 a.m. - 8:10 a.m.

David Blei - On the Assumptions of Synthetic Control Methods
(Invited Talk)
David Blei
Mon 8:10 a.m. - 8:25 a.m.

Session 1: Q&A
(Q&A)
Mon 8:30 a.m. - 8:50 a.m.

Ricardo Silva - The Road to Causal Programming
(Invited Talk)
Ricardo Silva
Mon 8:50 a.m. - 9:10 a.m.

Aapo Hyvarinen - Causal discovery by generative modelling
(Invited Talk)
There is a deep connection between causal discovery and generative models, such as factor analysis, independent component analysis, and various unsupervised deep learning models. Two key concepts that emerge are identifiability and nonstationarity. In this talk, I will review this research, providing some historical perspectives as well as open questions for future research.
Aapo Hyvarinen
Mon 9:10 a.m. - 9:35 a.m.

Tobias Gerstenberg - Going beyond the here and now: Counterfactual simulation in human cognition
(Invited Talk)
As humans, we spend much of our time going beyond the here and now. We dwell on the past, long for the future, and ponder how things could have turned out differently. In this talk, I will argue that people's knowledge of the world is organized around causally structured mental models, and that much of human thought can be understood as cognitive operations over these mental models. Specifically, I will highlight the pervasiveness of counterfactual thinking in human cognition. Counterfactuals are critical for how people make causal judgments, how they explain what happened, and how they hold others responsible for their actions.
Tobias Gerstenberg
Mon 9:35 a.m. - 9:45 a.m.

Session 2: Q&A
(Q&A)
Mon 9:45 a.m. - 10:45 a.m.

Poster Session
Mon 10:45 a.m. - 11:05 a.m.

Thomas Icard - A (topo)logical perspective on causal inference
(Invited Talk)
Thomas Icard
Mon 11:05 a.m. - 11:25 a.m.

Caroline Uhler - TBA
(Invited Talk)
Caroline Uhler
Mon 11:25 a.m. - 11:45 a.m.

Rosemary Ke - From "What" to "Why": towards causal learning
(Invited Talk)
Nan Rosemary Ke
Mon 11:45 a.m. - 12:00 p.m.

Session 3: Q&A
(Q&A)
Mon 12:00 p.m. - 12:45 p.m.

Judea Pearl - The logic of Causal Inference
(Keynote Speaker)
Mon 12:45 p.m. - 1:00 p.m.

Discussion Panel
Mon 1:00 p.m. - 1:15 p.m.

Zaffalon, Antonucci, Cabañas - Causal Expectation-Maximisation
(Contributed Talk)
Structural causal models are the basic modelling unit in Pearl's causal theory; in principle they allow us to solve counterfactuals, which are at the top rung of the ladder of causation. But they often contain latent variables that limit their application to special settings. This appears to be a consequence of the fact, proven in this paper, that causal inference is NP-hard even in models characterised by polytree-shaped graphs. To deal with such hardness, we introduce the causal EM algorithm. Its primary aim is to reconstruct the uncertainty about the latent variables from data about categorical manifest variables. Counterfactual inference is then addressed via standard algorithms for Bayesian networks. The result is a general method to approximately compute counterfactuals, be they identifiable or not (in which case we deliver bounds). We show empirically, as well as by deriving credible intervals, that the approximation we provide becomes accurate in a fair number of EM runs. These results lead us finally to argue that there appears to be an unnoticed limitation to the trending idea that counterfactual bounds can often be computed without knowledge of the structural equations.
Marco Zaffalon · Alessandro Antonucci · Rafael Cabañas
Mon 1:15 p.m. - 1:30 p.m.

Dominguez-Olmedo, Karimi, Schölkopf - On the Adversarial Robustness of Causal Algorithmic Recourse
(Contributed Talk)
Algorithmic recourse seeks to provide actionable recommendations for individuals to overcome unfavorable outcomes made by automated decision-making systems. The individual then exerts time and effort to positively change their circumstances. Recourse recommendations should ideally be robust to reasonably small changes in the circumstances of the individual seeking recourse. In this work, we formulate the adversarially robust recourse problem and show that methods that offer minimally costly recourse fail to be robust. We restrict ourselves to linear classifiers, and show that the adversarially robust recourse problem reduces to the standard recourse problem for some modified classifier with a shifted decision boundary. Finally, we derive bounds on the extra cost incurred by individuals seeking robust recourse, and discuss how to regulate this cost between the individual and the decision-maker.
Ricardo Dominguez-Olmedo · Amir Karimi · Bernhard Schölkopf
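The reduction to a shifted decision boundary mentioned in this abstract can be illustrated in a deliberately simplified setting. The sketch below assumes a linear classifier that is favorable when w·x + b >= 0, perturbations bounded in L2 norm by epsilon, and features that are changed directly (no causal model), so it is not the authors' formulation. Under those assumptions the worst-case perturbation lowers the score by epsilon·||w||, which is the same as tightening the bias. The function names `robust_bias` and `is_favorable` are hypothetical.

```python
import math


def robust_bias(weights, bias, epsilon):
    """Bias of the shifted classifier: b - epsilon * ||w||_2 (assumed L2 perturbation ball)."""
    norm = math.sqrt(sum(w * w for w in weights))
    return bias - epsilon * norm


def is_favorable(weights, bias, x):
    """True when the linear score w.x + b is non-negative."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias >= 0.0


# Illustrative numbers only (not taken from the paper).
weights = [3.0, 4.0]
bias = -10.0
epsilon = 0.5

shifted_bias = robust_bias(weights, bias, epsilon)

# A point sitting exactly on the original boundary: valid recourse, but not robust.
x_standard = [2.0, 1.0]
# A point sitting on the shifted boundary: still favorable after any allowed perturbation.
x_robust = [2.5, 1.25]

print("shifted bias:", shifted_bias)
print("standard point, original classifier:", is_favorable(weights, bias, x_standard))
print("standard point, shifted classifier:", is_favorable(weights, shifted_bias, x_standard))
print("robust point, shifted classifier:", is_favorable(weights, shifted_bias, x_robust))
```

In this toy setting, solving ordinary recourse against the shifted classifier yields a recommendation that is robust for the original one, at the price of moving further from the starting point.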
Mon 1:30 p.m. - 1:45 p.m.

Javidian, Pandey, Jamshidi - Scalable Causal Domain Adaptation
(Contributed Talk)
One of the most important problems in transfer learning is the task of domain adaptation, where the goal is to apply an algorithm trained in one or more source domains to a different (but related) target domain. This paper deals with domain adaptation in the presence of covariate shift while there exist invariances across domains. One of the main limitations of existing causal inference methods for solving this problem is scalability. To overcome this difficulty, we propose SCTL, an algorithm that avoids an exhaustive search and identifies invariant causal features across source and target domains based on Markov blanket discovery. SCTL does not require having prior knowledge of the causal structure, the type of interventions, or the intervention targets. There is an intrinsic locality associated with SCTL that makes SCTL practically scalable and robust because local causal discovery increases the power of computational independence tests and makes the task of domain adaptation computationally tractable. We show the scalability and robustness of SCTL for domain adaptation using synthetic and real data sets in low-dimensional and high-dimensional settings.
Mohammad Ali Javidian · Om Pandey · Pooyan Jamshidi
Mon 1:45 p.m. - 2:00 p.m.

Cundy, Grover, Ermon - BCD Nets: Scalable Variational Approaches for Bayesian Causal Discovery
(Contributed Talk)
A structural equation model (SEM) is an effective framework to reason over causal relationships represented via a directed acyclic graph (DAG). Recent advances have enabled effective maximum-likelihood point estimation of DAGs from observational data. However, a point estimate may not accurately capture the uncertainty in inferring the underlying graph in practical scenarios, wherein the true DAG is non-identifiable and/or the observed dataset is limited. We propose Bayesian Causal Discovery Nets (BCD Nets), a variational inference framework for estimating a distribution over DAGs characterizing a linear-Gaussian SEM. Developing a full Bayesian posterior over DAGs is challenging due to the discrete and combinatorial nature of graphs. We analyse key design choices for scalable VI over DAGs, such as 1) the parametrization of DAGs via an expressive variational family, 2) a continuous relaxation that enables low-variance stochastic optimization, and 3) suitable priors over the latent variables. We provide a series of experiments on real and synthetic data showing that BCD Nets outperform maximum-likelihood methods on standard causal discovery metrics such as structural Hamming distance in low-data regimes.
Chris Cundy · Aditya Grover · Stefano Ermon
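For readers unfamiliar with the structural Hamming distance named in this abstract, the following is a minimal sketch of one common way to compute it for two directed graphs. It assumes graphs are given as 0/1 adjacency matrices and that a reversed edge counts as a single error; conventions differ across papers, and this is not taken from the BCD Nets code. The function name and the example graphs are hypothetical.

```python
def structural_hamming_distance(true_adj, est_adj):
    """Count node pairs whose edge status differs between two directed graphs.

    Both arguments are square 0/1 adjacency matrices (lists of lists) where
    adj[i][j] == 1 means an edge i -> j. A missing, extra, or reversed edge
    between a pair of nodes counts as one error (an assumed convention; some
    authors count a reversal as two).
    """
    n = len(true_adj)
    distance = 0
    for i in range(n):
        for j in range(i + 1, n):
            true_pair = (true_adj[i][j], true_adj[j][i])
            est_pair = (est_adj[i][j], est_adj[j][i])
            if true_pair != est_pair:
                distance += 1
    return distance


# Illustrative graphs on three nodes.
true_graph = [
    [0, 1, 1],  # 0 -> 1, 0 -> 2
    [0, 0, 1],  # 1 -> 2
    [0, 0, 0],
]
estimated_graph = [
    [0, 1, 0],  # 0 -> 1 (edge 0 -> 2 is missing)
    [0, 0, 0],
    [0, 1, 0],  # 2 -> 1 (edge 1 -> 2 is reversed)
]

print(structural_hamming_distance(true_graph, estimated_graph))
```

A lower value means the estimated graph is structurally closer to the reference graph; identical graphs give zero.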
Mon 2:00 p.m. - 2:20 p.m.

Alison Gopnik - Causal Learning in Children and Computational Models
(Invited Talk)
Very young children routinely solve causal problems that are still very challenging for machine learning systems. I will outline several exciting recent lines of work looking at young children’s causal reasoning and learning and comparing it to learning in various computational models. This includes work on the selection of relevant test variables, learning abstract and analogical relationships, and, most importantly, techniques for active learning and causal exploration.
Alison Gopnik
Mon 2:20 p.m. - 2:40 p.m.

Adèle Ribeiro - Effect Identification in Cluster Causal Diagrams
(Invited Talk)
A pervasive task found throughout the empirical sciences is to determine the effect of interventions from observational data. It is well understood that assumptions are necessary to perform such causal inferences, an idea popularized through Cartwright’s motto: "no causes in, no causes out." One way of articulating these assumptions is through the use of causal diagrams, which are a special type of graphical model with causal semantics [Pearl, 2000]. The graphical approach has been applied successfully in many settings, but there are still challenges to its use, particularly in complex, high-dimensional domains. In this talk, I will introduce cluster causal diagrams (C-DAGs), a novel causal graphical model that allows for the partial specification of the relationships among variables. C-DAGs provide a simple yet effective way to partially abstract a grouping (cluster) of variables among which causal relationships are not fully understood while preserving consistency with the underlying causal system and the validity of causal identification tools. Reference: https://causalai.net/r77.pdf
Adèle Ribeiro
Mon 2:40 p.m. - 3:00 p.m.

Victor Chernozhukov - Omitted Confounder Bias Bounds for Machine Learned Causal Models
(Invited Talk)
Victor Chernozhukov
Mon 3:00 p.m. - 3:15 p.m.

Session 4: Q&A
(Q&A)
Mon 3:15 p.m. - 3:30 p.m.

Closing Remarks


Unsupervised Causal Binary Concepts Discovery with VAE for Black-box Model Explanation
(Poster)
We aim to explain a black-box classifier with explanations of the form "data X is classified as class Y because X has A, B and does not have C", in which A, B, and C are high-level concepts. The challenge is that we have to discover, in an unsupervised manner, a set of concepts (i.e., A, B and C) that is useful for explaining the classifier. We first introduce a structural generative model that is suitable to express and discover such concepts. We then propose a learning process that simultaneously learns the data distribution and encourages certain concepts to have a large causal influence on the classifier output. Our method also allows easy integration of the user's prior knowledge to induce high interpretability of concepts. Using multiple datasets, we demonstrate that our method can discover useful binary concepts for explanation.
Thien Tran · Kazuto Fukuchi · Youhei Akimoto · Jun Sakuma


Encoding Causal Macrovariables
(Poster)
In many scientific disciplines, coarse-grained causal models are used to explain and predict the dynamics of more fine-grained systems. Naturally, such models require appropriate macrovariables. Automated procedures to detect suitable variables would be useful to leverage increasingly available high-dimensional observational datasets. This work introduces a novel algorithmic approach that is inspired by a new characterisation of causal macrovariables as information bottlenecks between microstates. Its general form can be adapted to address individual needs of different scientific goals. After a further transformation step, the causal relationships between learned variables can be investigated through additive noise models. Experiments on both simulated data and on a real climate dataset are reported. In a synthetic dataset, the algorithm robustly detects the ground-truth variables and correctly infers the causal relationships between them. In a real climate dataset, the algorithm robustly detects two variables that correspond to the two known variations of the El Niño phenomenon.
Benedikt Höltgen


Amortized Causal Discovery: Learning to Infer Causal Graphs from Time-Series Data
(Poster)
Standard causal discovery methods must fit a new model whenever they encounter samples from a new underlying causal graph. However, these samples often share relevant information – for instance, the dynamics describing the effects of causal relations – which is lost when following this approach. We propose Amortized Causal Discovery, a novel framework that leverages such shared dynamics to learn to infer causal relations from time-series data. This enables us to train a single, amortized model that infers causal relations across samples with different underlying causal graphs, and thus makes use of the information that is shared. We demonstrate experimentally that this approach, implemented as a variational model, leads to significant improvements in causal discovery performance, and show how it can be extended to perform well under hidden confounding.
Sindy Löwe · David Madras · Richard Zemel · Max Welling


Disentangling Causal Effects from Sets of Interventions in the Presence of Unobserved Confounders
(Poster)
The ability to answer causal questions is crucial in many domains, as causal inference allows one to understand the impact of interventions. In many applications, only a single intervention is possible at a given time. However, in certain important areas, multiple interventions are concurrently applied. Disentangling the effects of single interventions from jointly applied interventions is a challenging task, especially as simultaneously applied interventions can interact. This problem is made harder still by unobserved confounders, which influence both interventions and outcome. We address this challenge by aiming to learn the effect of a single intervention from both observational data and sets of interventions. We prove that this is not generally possible, but provide identification proofs demonstrating that it can be achieved in certain classes of additive noise models, even in the presence of unobserved confounders. Importantly, we show how to incorporate observed covariates and learn heterogeneous treatment effects conditioned on them for single interventions.
Olivier Jeunen · Ciaran Gilligan-Lee · Rishabh Mehrotra · Mounia Lalmas


Typing assumptions improve identification in causal discovery
(Poster)
Causal discovery from observational data is a challenging task to which an exact solution cannot always be identified. Under assumptions about the data-generative process, the causal graph can often be identified up to an equivalence class. Proposing new realistic assumptions to circumscribe such equivalence classes is an active field of research. In this work, we propose a new set of assumptions that constrain possible causal relationships based on the nature of the variables. We thus introduce typed directed acyclic graphs, in which variable types are used to determine the validity of causal relationships. We demonstrate, both theoretically and empirically, that the proposed assumptions can result in significant gains in the identification of the causal graph.
Philippe Brouillard · Perouz Taslakian · Alexandre Lacoste · Sébastien Lachapelle · Alexandre Drouin


Prequential MDL for Causal Structure Learning with Neural Networks
(Poster)
Learning the structure of Bayesian networks and causal relationships from observations is a common goal in several areas of science and technology. We show that the prequential minimum description length principle (MDL) can be used to derive a practical scoring function for Bayesian networks when flexible and overparametrized neural networks are used to model the conditional probability distributions between observed variables. MDL represents an embodiment of Occam's Razor and we obtain plausible and parsimonious graph structures without relying on sparsity-inducing priors or other regularizers which must be tuned. Empirically we demonstrate competitive results on synthetic and real-world data.
Jorg Bornschein · Silvia Chiappa · Alan Malek · Nan Rosemary Ke


MANM-CS: Data Generation for Benchmarking Causal Structure Learning from Mixed Discrete-Continuous and Nonlinear Data
(Poster)
In recent years, the growing interest in methods of causal structure learning (CSL) has been confronted with a lack of access to a well-defined ground truth within real-world scenarios to evaluate these methods. Existing synthetic benchmarks are limited in their scope. They are either restricted to a “static” low-dimensional data set or do not allow examining mixed discrete-continuous or nonlinear data. This work introduces the mixed additive noise model that provides a ground truth framework for generating observational data following various distribution models. Moreover, we present our reference implementation MANM-CS that provides easy access and demonstrate how our framework can support researchers and practitioners. Further, we propose future research directions and possible extensions.
Johannes Huegle · Christopher Hagedorn · Jonas Umland · Rainer Schlosser


DiBS: Differentiable Bayesian Structure Learning
(Poster)
Bayesian structure learning allows inferring Bayesian network structure from data while reasoning about the epistemic uncertainty, a key element towards enabling active causal discovery and designing interventions in real-world systems. In this work, we propose a general, fully differentiable framework for Bayesian structure learning (DiBS) that operates in the continuous space of a latent probabilistic graph representation. Contrary to existing work, DiBS is agnostic to the form of the local conditional distributions and allows for joint posterior inference of both the graph structure and the conditional distribution parameters. This makes DiBS directly applicable to posterior inference of nonstandard Bayesian network models, e.g., with nonlinear dependencies encoded by neural networks. Building on recent advances in variational inference, we use DiBS to devise an efficient general-purpose method for approximating posteriors over structural models. In evaluations on simulated and real-world data, our method significantly outperforms related approaches to joint posterior inference.
Lars Lorch · Jonas Rothfuss · Bernhard Schölkopf · Andreas Krause


Learning Neural Causal Models with Active Interventions
(Poster)
Discovering causal structures from data is a challenging inference problem of fundamental importance in all areas of science. The appealing scaling properties of neural networks have recently led to a surge of interest in differentiable neural network-based methods for learning causal structures from data. So far differentiable causal discovery has focused on static datasets of observational or interventional origin. In this work, we introduce an active intervention-targeting mechanism which enables a quick identification of the underlying causal structure of the data-generating process. Our method significantly reduces the required number of interactions compared with random intervention targeting and is applicable for both discrete and continuous optimization formulations of learning the underlying directed acyclic graph (DAG) from data. We examine the proposed method across a wide range of settings and demonstrate superior performance on multiple benchmarks from simulated to real-world data.
Nino Scherrer · Olexa Bilaniuk · Yashas Annadani · Anirudh Goyal ALIAS PARTH GOYAL · Patrick Schwab · Bernhard Schölkopf · Michael Mozer · Yoshua Bengio · Stefan Bauer · Nan Rosemary Ke


Identification of Latent Graphs: A Quantum Entropic Approach
(Poster)
Quantum causality is an emerging field of study that has the potential to greatly advance our understanding of quantum systems. In this paper, we put forth a new theoretical framework for merging quantum information science and causal inference by exploiting entropic principles. For this purpose, we leverage the tradeoff between the entropy of hidden cause and conditional mutual information of observed variables to develop a scalable algorithmic approach for inferring causality in the presence of latent confounders (common causes) in quantum systems. As an application, we consider a system of three entangled qubits and transmit the second and third qubits over separate noisy quantum channels. In this model, we validate that the first qubit is a latent confounder and the common cause of the second and third qubits. In contrast, when two entangled qubits are prepared, and one of them is sent over a noisy channel, there is no common confounder. We also demonstrate that the proposed approach outperforms the results of classical causal inference for the Tubingen database when the variables are classical by exploiting quantum dependence between variables through density matrices rather than joint probability distributions. Thus, the proposed approach unifies classical and quantum causal inference in a principled way. 
Mohammad Ali Javidian · Vaneet Aggarwal · Zubin Jacob


Reliable causal discovery based on mutual information supremum principle for finite datasets
(Poster)
The recent method MIIC (Multivariate Information-based Inductive Causation), which combines constraint-based and information-theoretic frameworks, has been shown to significantly improve causal discovery from purely observational data. Yet, a substantial loss in precision has remained between skeleton and oriented graph predictions for small datasets. Here, we propose and implement a simple modification, named conservative MIIC, based on a general mutual information supremum principle regularized for finite datasets. In practice, conservative MIIC rectifies the negative values of regularized (conditional) mutual information used by MIIC to identify (conditional) independence between discrete, continuous or mixed-type variables. This modification is shown to greatly enhance the reliability of predicted orientations, for all sample sizes, with only a small sensitivity loss compared to MIIC's original orientation rules. Conservative MIIC is especially interesting to improve the reliability of causal discovery for real-life observational data applications.
Vincent Cabeli · Honghao Li · Marcel da Câmara Ribeiro Dantas · Herve Isambert


Scalable Causal Domain Adaptation
(Poster)
One of the most important problems in transfer learning is the task of domain adaptation, where the goal is to apply an algorithm trained in one or more source domains to a different (but related) target domain. This paper deals with domain adaptation in the presence of covariate shift while there exist invariances across domains. One of the main limitations of existing causal inference methods for solving this problem is scalability. To overcome this difficulty, we propose SCTL, an algorithm that avoids an exhaustive search and identifies invariant causal features across source and target domains based on Markov blanket discovery. SCTL does not require having prior knowledge of the causal structure, the type of interventions, or the intervention targets. There is an intrinsic locality associated with SCTL that makes SCTL practically scalable and robust because local causal discovery increases the power of computational independence tests and makes the task of domain adaptation computationally tractable. We show the scalability and robustness of SCTL for domain adaptation using synthetic and real data sets in low-dimensional and high-dimensional settings.
Mohammad Ali Javidian · Om Pandey · Pooyan Jamshidi


Learning preventative and generative causal structures from point events in continuous time
(Poster)
Many previous accounts of causal structure induction have focused on atemporal contingency data while fewer have described learning on the basis of observations of events unfolding over time. How do people use temporal information to infer causal structures? Here we develop a computational-level framework and propose several algorithmic-level approximations to explain how people impute causal structures from continuous-time event sequences. We compare both normative and process accounts to participant behavior across two experiments. We consider structures combining both generative and preventative causal relationships in the presence of either regular or irregular background noise in the form of spontaneous activations. We find that 1) humans are robustly capable learners in this setting, successfully identifying a variety of ground truth structures but 2) diverging from our computational-level account in ways we can explain with a more tractable simulation and summary statistics approximation scheme. We thus argue that human structure induction from temporal information relies on comparisons between observed patterns and expectations established via mental simulation.
Tia Gong


Building Object-based Causal Programs for Human-like Generalization
(Poster)
We present a novel task that measures how people generalize objects' causal powers based on observing a single (Experiment 1) or a few (Experiment 2) causal interactions between object pairs. We propose a computational modeling framework that can synthesize human-like generalization patterns in our task setting and that sheds light on how people may navigate the compositional space of possible causal functions and categories efficiently. Our modeling framework combines a causal function generator that makes use of agent and recipient objects' features and relations, and a Bayesian nonparametric inference process to govern the degree of similarity-based generalization. Our model has a natural “resource-rational” variant that outperforms a naive Bayesian account in describing participants, in particular reproducing a generalization-order effect and causal asymmetry observed in our behavioral experiments. We argue that this modeling framework provides a computationally plausible mechanism for real-world causal generalization.
Bonan Zhao · Chris Lucas


On the Robustness of Causal Algorithmic Recourse
(Poster)
Algorithmic recourse seeks to provide actionable recommendations for individuals to overcome unfavorable outcomes made by automated decision-making systems. The individual then exerts time and effort to positively change their circumstances. Recourse recommendations should ideally be robust to reasonably small changes in the circumstances (similar individuals, updated classifier in light of larger datasets, and updated causal assumptions about the world). In this work, we formulate the robust recourse problem, derive bounds on the extra cost incurred by individuals seeking robust recourse subject to both linear and nonlinear assumptions, and discuss how to regulate this cost between the individual and the decision-maker.
Ricardo Dominguez-Olmedo · Amir Karimi · Bernhard Schölkopf


Desiderata for Representation Learning: A Causal Perspective
(Poster)
Representation learning constructs low-dimensional representations to summarize essential features of high-dimensional data. This learning problem is often approached by describing various desiderata associated with learned representations; e.g., that they be nonspurious, efficient, or disentangled. It can be challenging, however, to turn these intuitive desiderata into formal criteria that can be measured and enhanced based on observed data. In this paper, we take a causal perspective on representation learning, formalizing nonspuriousness and efficiency (in supervised representation learning) and disentanglement (in unsupervised representation learning) using counterfactual quantities and observable consequences of causal assertions. This yields computable metrics that can be used to assess the degree to which representations satisfy the desiderata of interest and learn nonspurious and disentangled representations from single observational datasets.
Yixin Wang · Michael Jordan


Scalable Variational Approaches for Bayesian Causal Discovery
(Poster)
A structural equation model (SEM) is an effective framework to reason over causal relationships represented via a directed acyclic graph (DAG). Recent advances have enabled effective maximum-likelihood point estimation of DAGs from observational data. However, a point estimate may not accurately capture the uncertainty in inferring the underlying graph in practical scenarios, wherein the true DAG is non-identifiable and/or the observed dataset is limited. We propose Bayesian Causal Discovery Nets (BCD Nets), a variational inference framework for estimating a distribution over DAGs characterizing a linear-Gaussian SEM. Developing a full Bayesian posterior over DAGs is challenging due to the discrete and combinatorial nature of graphs. We analyse key design choices for scalable VI over DAGs, such as 1) the parametrization of DAGs via an expressive variational family, 2) a continuous relaxation that enables low-variance stochastic optimization, and 3) suitable priors over the latent variables. We provide a series of experiments on real and synthetic data showing that BCD Nets outperform maximum-likelihood methods on standard causal discovery metrics such as structural Hamming distance in low data regimes.
Chris Cundy · Aditya Grover · Stefano Ermon
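
The structural Hamming distance mentioned in the abstract can be illustrated with a short sketch. This is not the authors' evaluation code: the function name and the convention that a reversed edge counts as a single error are assumptions made here (some implementations count a reversal as two errors), and graphs are assumed to be given as binary adjacency matrices without self-loops.

```python
import numpy as np

def structural_hamming_distance(true_adj, est_adj):
    """Number of node pairs whose edge status differs between two directed graphs.

    Assumption for this sketch: a missing, extra, or reversed edge between
    a pair of nodes counts as exactly one error.
    """
    true_adj = np.asarray(true_adj, dtype=bool)
    est_adj = np.asarray(est_adj, dtype=bool)
    # Entry (i, j) is True when the two graphs disagree on the edge i -> j.
    diff = true_adj != est_adj
    # A pair {i, j} is wrong if the graphs disagree in either direction.
    pair_mismatch = diff | diff.T
    # Count each unordered pair once via the strict upper triangle.
    return int(np.triu(pair_mismatch, k=1).sum())

# True graph: 0 -> 1 -> 2. Estimate: 1 -> 0 only.
true_dag = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
est_dag = [[0, 0, 0], [1, 0, 0], [0, 0, 0]]
print(structural_hamming_distance(true_dag, est_dag))  # 2: one reversal, one missing edge
```

For a posterior over DAGs, one plausible use is to average this distance over sampled graphs rather than score a single point estimate.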


Individual treatment effect estimation in the presence of unobserved confounding based on a fixed relative treatment effect
(Poster)
In healthcare, treatment effect estimates from randomized controlled trials are often reported on a relative scale, for instance as an odds ratio for binary outcomes. To weigh potential benefits and harms of treatment, this odds ratio has to be translated to a difference in absolute risk, preferably on an individual patient level. Even when the relative treatment effect is fixed, treatments can have widely varying effects on an absolute risk scale. We demonstrate that if this relative treatment effect is known a priori, for example from randomized trials, it is possible to estimate the treatment effect on an absolute scale on an individualized basis, even in the presence of unobserved confounding. We use this assumption both on a standard logistic regression task and on a task with real-world medical images with simulated outcome data, using convolutional neural networks. On both tasks the method performs well.
Wouter van Amsterdam · Rajesh Ranganath
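
The translation from a fixed odds ratio to an absolute risk difference can be made concrete with a little arithmetic. The sketch below only illustrates that conversion for a known untreated risk; it is not the estimation method of the paper, and the function names and example numbers are assumptions chosen for illustration.

```python
def treated_risk(baseline_risk, odds_ratio):
    """Risk under treatment, given the untreated risk and a fixed odds ratio."""
    baseline_odds = baseline_risk / (1.0 - baseline_risk)
    treated_odds = baseline_odds * odds_ratio
    return treated_odds / (1.0 + treated_odds)

def risk_difference(baseline_risk, odds_ratio):
    """Absolute risk difference (treated minus untreated)."""
    return treated_risk(baseline_risk, odds_ratio) - baseline_risk

# The same odds ratio of 0.5 applied to a high-risk and a low-risk patient.
for p0 in (0.5, 0.02):
    print(p0, round(risk_difference(p0, 0.5), 4))
```

With an odds ratio of 0.5, a patient at 50% untreated risk gains an absolute reduction of about 17 percentage points, while a patient at 2% untreated risk gains about 1 percentage point, which is why the abstract stresses individual-level absolute effects.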


A Tree-based Model Averaging Approach for Personalized Treatment Effect Estimation from Heterogeneous Data Sources
(Poster)
Accurately estimating personalized treatment effects within a single study has been challenging due to the limited sample size. Here we propose a tree-based model averaging approach to improve the estimation efficiency of conditional average treatment effects concerning the population of a target research site by leveraging models derived from potentially heterogeneous populations of other sites, but without them sharing individual-level data. To the best of our knowledge, there is no established model averaging approach for distributed data with a focus on improving the estimation of treatment effects. Under distributed data networks, we develop an efficient and interpretable tree-based ensemble of personalized treatment effect estimators to join results across hospital sites, while actively modeling the heterogeneity in data sources through site partitioning. The efficiency of this approach is demonstrated by a study of causal effects of oxygen saturation on hospital mortality and backed up by comprehensive numerical results.
Xiaoqing Tan · Lu Tang


Multiple Environments Can Reduce Indeterminacy in VAEs
(Poster)
Parameter and latent variable identifiability in variational autoencoders have received considerable attention recently, due to their empirical success in learning joint probabilities of complex data and their representations. Concurrently, modeling using multiple environments has been suggested for robust causal reasoning. We uncover additional theoretical benefits of multiple environments in the form of a strong identifiability result for a variational autoencoder model with latent covariate shift. We propose a novel learning algorithm that combines empirical Bayes and variational autoencoders, designed for latent variable identifiability without compromising representative power, using multiple environments as a crucial technical and practical tool. 
Johnny Xi · Benjamin Bloem-Reddy


Using Embeddings to Estimate Peer Influence on Social Networks
(Poster)
We address the problem of using observational data to estimate peer contagion effects, the influence of treatments applied to individuals in a network on the outcomes of their neighbours. A main challenge to such estimation is that homophily (the tendency of connected units to share similar latent traits) acts as an unobserved confounder for contagion effects. Informally, it's hard to tell whether your friends have similar outcomes because they were influenced by your treatment, or whether it's due to some common trait that caused you to be friends in the first place. Because these common causes are not usually directly observed, they cannot be simply adjusted for. We describe an approach to perform the required adjustment using node embeddings learned from the network itself. The main aim is to perform this adjustment nonparametrically, without functional form assumptions on either the process that generated the network or the treatment assignment and outcome processes. The key questions we address are: How should the causal effect be formalized? And, when can embedding methods yield causal identification?
Irina Cristali · Victor Veitch


Using Non-Linear Causal Models to Study Aerosol-Cloud Interactions in the Southeast Pacific
(Poster)
Aerosol-cloud interactions include a myriad of effects that all begin when aerosol enters a cloud and acts as cloud condensation nuclei (CCN). An increase in CCN results in a decrease in the mean cloud droplet size ($r_e$). The smaller droplet size leads to brighter, more expansive, and longer-lasting clouds that reflect more incoming sunlight, thus cooling the Earth. Globally, aerosol-cloud interactions cool the Earth; however, the strength of the effect is heterogeneous over different meteorological regimes. Understanding how aerosol-cloud interactions evolve as a function of the local environment can help us better understand sources of error in our Earth system models, which currently fail to reproduce the observed relationships. In this work we use recent non-linear, causal machine learning methods to study the heterogeneous effects of aerosols on cloud droplet radius.
Andrew Jesson · Peter Manshausen · Alyson Douglas · Duncan Watson-Parris · Yarin Gal · Philip Stier


Synthesis of Reactive Programs with Structured Latent State
(Poster)
The human ability to efficiently discover causal theories of their environments from observations is a feat of nature that remains elusive in machines. In this work, we attempt to make progress on this frontier by formulating the challenge of causal mechanism discovery from observed data as one of program synthesis. We focus on the domain of time-varying, Atari-like 2D grid worlds, and represent causal models in this domain using a programming language called Autumn. Discovering the causal structure underlying a sequence of observations is equivalent to identifying the program in the Autumn language that generates the observations. We introduce a novel program synthesis algorithm, called AutumnSynth, that approaches this synthesis challenge by integrating standard methods of synthesizing functions with an automata synthesis approach, used to discover the model's latent state. We evaluate our method on a suite of Autumn programs designed to express the richness of the domain, which signals the potential of our formulation.
Ria Das · Zenna Tavares · Armando Solar-Lezama · Josh Tenenbaum


Causal Inference Using Tractable Circuits
(Poster)
The aim of this paper is to discuss and draw attention to a recent result which shows that probabilistic inference in the presence of (unknown) causal mechanisms can be tractable for models that have traditionally been viewed as intractable. This result was reported recently in (Darwiche, ECAI 2020) to facilitate model-based supervised learning but it can be interpreted in a causality context as follows. One can compile a nonparametric causal graph into an arithmetic circuit that supports inference in time linear in the circuit size. The circuit is nonparametric so it can be used to estimate parameters from data and to further reason (in linear time) about the causal graph parametrized by these estimates. Moreover, the circuit size can sometimes be independent of the causal graph treewidth, leading to tractable inference on models that have been deemed intractable. This has been enabled by a new technique that can exploit causal mechanisms computationally but without needing to know their identities (the classical setup in causal inference). Our goal is to provide a causality-oriented exposure to these new results and to speculate on how they may potentially contribute to more scalable and versatile causal inference.
Adnan Darwiche
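
To give a feel for inference in time linear in the circuit size, here is a toy sketch of evaluating an arithmetic circuit for a two-variable network A -> B. It is a hand-built textbook-style circuit, not the compilation technique described in the abstract; the tuple encoding, the leaf names, and the parameter values are all assumptions made for illustration.

```python
from math import prod

def evaluate(node, leaves):
    """Evaluate a tree-shaped circuit bottom-up; each node is visited once."""
    kind, payload = node
    if kind == "leaf":
        return leaves[payload]
    values = [evaluate(child, leaves) for child in payload]
    return sum(values) if kind == "+" else prod(values)

def leaf(name):
    return ("leaf", name)

def branch(a):
    # lam_a * theta_a * sum_b (lam_b * theta_{b|a}) for one value of A
    inner = ("+", [("*", [leaf(f"lam_b{b}"), leaf(f"theta_b{b}|a{a}")]) for b in (1, 2)])
    return ("*", [leaf(f"lam_a{a}"), leaf(f"theta_a{a}"), inner])

circuit = ("+", [branch(1), branch(2)])

# Hypothetical parameters: P(A) and P(B | A).
params = {
    "theta_a1": 0.3, "theta_a2": 0.7,
    "theta_b1|a1": 0.9, "theta_b2|a1": 0.1,
    "theta_b1|a2": 0.2, "theta_b2|a2": 0.8,
}

def probability_of(evidence):
    """Probability of the evidence, e.g. {"b": 1}; unobserved variables are summed out."""
    indicators = {
        f"lam_{var}{val}": 1.0 if evidence.get(var, val) == val else 0.0
        for var in ("a", "b") for val in (1, 2)
    }
    return evaluate(circuit, {**params, **indicators})

print(probability_of({"b": 1}))
```

Because the parameters enter only as leaf values, the same circuit can be re-evaluated with different estimates, which loosely mirrors the abstract's point that the circuit is built before the parameters are known. A circuit with shared sub-circuits (a DAG rather than a tree) would need memoization to keep evaluation linear.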


Causal Expectation-Maximisation
(Poster)
Structural causal models are the fundamental modelling unit in Pearl's causal theory; in principle they allow us to solve counterfactuals, which represent the most expressive level of causal inference. But they most often contain latent variables that limit their application to special settings. In this paper we introduce the causal EM algorithm that aims at reconstructing the uncertainty about the latent variables; based on this, causal inference can approximately be solved via standard algorithms for Bayesian networks. The result is a general method to solve causal inference queries, be they identifiable or not (in which case we deliver bounds), on semi-Markovian structural causal models with categorical variables. We show empirically, as well as by deriving credible intervals, that the approximation we provide becomes accurate in a fair number of EM runs. We show that causal inference is NP-hard also in models characterised by polytree-shaped graphs; this supports developing approximate approaches to causal inference. Finally, we argue that there is possibly an overlooked issue in computing counterfactual bounds without knowledge of the structural equations that might negatively impact known results.
Marco Zaffalon · Alessandro Antonucci · Rafael Cabañas
Author Information
Elias Bareinboim (Columbia University)
Bernhard Schölkopf (MPI for Intelligent Systems, Tübingen)
Bernhard Schölkopf received degrees in mathematics (London) and physics (Tübingen), and a doctorate in computer science from the Technical University Berlin. He has researched at AT&T Bell Labs, at GMD FIRST, Berlin, at the Australian National University, Canberra, and at Microsoft Research Cambridge (UK). In 2001, he was appointed scientific member of the Max Planck Society and director at the MPI for Biological Cybernetics; in 2010 he founded the Max Planck Institute for Intelligent Systems. For further information, see www.kyb.tuebingen.mpg.de/~bs.
Terrence Sejnowski (Salk Institute)
Yoshua Bengio (Mila / U. Montreal)
Yoshua Bengio is Full Professor in the computer science and operations research department at U. Montreal, scientific director and founder of Mila and of IVADO, Turing Award 2018 recipient, Canada Research Chair in Statistical Learning Algorithms, as well as a Canada AI CIFAR Chair. He pioneered deep learning and received the most citations per day in 2018 among all computer scientists worldwide. He is an officer of the Order of Canada and a member of the Royal Society of Canada, was awarded the Killam Prize, the Marie-Victorin Prize and the Radio-Canada Scientist of the Year in 2017, and is a member of the NeurIPS advisory board and co-founder of the ICLR conference, as well as program director of the CIFAR program on Learning in Machines and Brains. His goal is to contribute to uncovering the principles giving rise to intelligence through learning, as well as to favour the development of AI for the benefit of all.
Judea Pearl (UCLA)
Judea Pearl is a professor of computer science and statistics at UCLA. He is a graduate of the Technion, Israel, and joined the faculty of UCLA in 1970, where he conducts research in artificial intelligence, causal inference and philosophy of science. Pearl has authored three books: Heuristics (1984), Probabilistic Reasoning (1988), and Causality (2000; 2009), the last of which won the Lakatos Prize from the London School of Economics. He is a member of the National Academy of Engineering, the American Academy of Arts and Sciences, and a Fellow of the IEEE, AAAI and the Cognitive Science Society. Pearl received the 2008 Benjamin Franklin Medal from the Franklin Institute and the 2011 Rumelhart Prize from the Cognitive Science Society. In 2012, he received the Technion's Harvey Prize and the ACM Alan M. Turing Award.
More from the Same Authors

2021 Spotlight: Invariance Principle Meets Information Bottleneck for Out-of-Distribution Generalization »
Kartik Ahuja · Ethan Caballero · Dinghuai Zhang · Jean-Christophe Gagnon-Audet · Yoshua Bengio · Ioannis Mitliagkas · Irina Rish
2021 Spotlight: Double Machine Learning Density Estimation for Local Treatment Effects with Instruments »
Yonghan Jung · Jin Tian · Elias Bareinboim 
2021 Spotlight: Iterative Teaching by Label Synthesis »
Weiyang Liu · Zhen Liu · Hanchen Wang · Liam Paull · Bernhard Schölkopf · Adrian Weller 
2021 Spotlight: DiBS: Differentiable Bayesian Structure Learning »
Lars Lorch · Jonas Rothfuss · Bernhard Schölkopf · Andreas Krause 
2021 : Systematic Evaluation of Causal Discovery in Visual Model Based Reinforcement Learning »
Nan Rosemary Ke · Aniket Didolkar · Sarthak Mittal · Anirudh Goyal · Guillaume Lajoie · Stefan Bauer · Danilo Jimenez Rezende · Yoshua Bengio · Chris Pal · Michael Mozer 
2021 : Distributionally robust chance constrained programs using maximum mean discrepancy »
Yassine Nemmour · Bernhard Schölkopf · Jia-Jie Zhu
2021 : Long-Term Credit Assignment via Model-based Temporal Shortcuts »
Michel Ma · Pierluca D'Oro · Yoshua Bengio · Pierre-Luc Bacon
2021 : A Consciousness-Inspired Planning Agent for Model-Based Reinforcement Learning »
Mingde Zhao · Zhen Liu · Sitao Luan · Shuyuan Zhang · Doina Precup · Yoshua Bengio 
2021 : Effect of diversity in Meta-Learning »
Ramnath Kumar · Tristan Deleu · Yoshua Bengio 
2021 : Explainable medical image analysis by leveraging human-interpretable features through mutual information minimization »
Erick M Cobos · Thomas Kuestner · Bernhard Schölkopf · Sergios Gatidis 
2021 : Amortized Bayesian inference of gravitational waves with normalizing flows »
Maximilian Dax · Stephen Green · Jakob Macke · Bernhard Schölkopf 
2021 : DiBS: Differentiable Bayesian Structure Learning »
Lars Lorch · Jonas Rothfuss · Bernhard Schölkopf · Andreas Krause 
2021 : Learning Neural Causal Models with Active Interventions »
Nino Scherrer · Olexa Bilaniuk · Yashas Annadani · Anirudh Goyal · Patrick Schwab · Bernhard Schölkopf · Michael Mozer · Yoshua Bengio · Stefan Bauer · Nan Rosemary Ke 
2021 : On the Robustness of Causal Algorithmic Recourse »
Ricardo Dominguez-Olmedo · Amir Karimi · Bernhard Schölkopf
2021 : Multi-Domain Balanced Sampling Improves Out-of-Distribution Generalization of Chest X-ray Pathology Prediction Models »
Enoch Tetteh · David Krueger · Joseph Paul Cohen · Yoshua Bengio 
2022 Workshop: Tackling Climate Change with Machine Learning »
Peetak Mitra · Maria João Sousa · Mark Roth · Jan Drgona · Emma Strubell · Yoshua Bengio 
2022 Workshop: AI for Science: Progress and Promises »
Yi Ding · Yuanqi Du · Tianfan Fu · Hanchen Wang · Anima Anandkumar · Yoshua Bengio · Anthony Gitter · Carla Gomes · Aviv Regev · Max Welling · Marinka Zitnik 
2022 Competition: Real Robot Challenge III - Learning Dexterous Manipulation from Offline Data in the Real World »
Georg Martius · Nico Gürtler · Cansu Sancaktar · Sebastian Blaes · Pavel Kolev · Stefan Bauer · Manuel Wuethrich · Markus Wulfmeier · Martin Riedmiller · Arthur Allshire · Annika Buchholz · Bernhard Schölkopf 
2022 Poster: Exploring the Latent Space of Autoencoders with Interventional Assays »
Felix Leeb · Stefan Bauer · Michel Besserve · Bernhard Schölkopf 
2022 Poster: Amortized Inference for Causal Structure Learning »
Lars Lorch · Scott Sussex · Jonas Rothfuss · Andreas Krause · Bernhard Schölkopf 
2022 Poster: MAgNet: Mesh Agnostic Neural PDE Solver »
Oussama Boussif · Yoshua Bengio · Loubna Benabbou · Dan Assouline 
2022 Poster: Temporal Latent Bottleneck: Synthesis of Fast and Slow Processing Mechanisms in Sequence Learning »
Aniket Didolkar · Kshitij Gupta · Anirudh Goyal · Alex Lamb · Nan Rosemary Ke · Yoshua Bengio 
2022 Poster: Embrace the Gap: VAEs Perform Independent Mechanism Analysis »
Patrik Reizinger · Luigi Gresele · Jack Brady · Julius von Kügelgen · Dominik Zietlow · Bernhard Schölkopf · Georg Martius · Wieland Brendel · Michel Besserve 
2022 Poster: Controlled Sparsity via Constrained Optimization or: How I Learned to Stop Tuning Penalties and Love Constraints »
Jose Gallego-Posada · Juan Ramirez · Akram Erraqabi · Yoshua Bengio · Simon Lacoste-Julien
2022 Poster: Trajectory balance: Improved credit assignment in GFlowNets »
Nikolay Malkin · Moksh Jain · Emmanuel Bengio · Chen Sun · Yoshua Bengio 
2022 Poster: Neural Attentive Circuits »
Martin Weiss · Nasim Rahaman · Francesco Locatello · Chris Pal · Yoshua Bengio · Bernhard Schölkopf · Li Erran Li · Nicolas Ballas 
2022 Poster: Assaying Out-Of-Distribution Generalization in Transfer Learning »
Florian Wenzel · Andrea Dittadi · Peter Gehler · Carl-Johann Simon-Gabriel · Max Horn · Dominik Zietlow · David Kernert · Chris Russell · Thomas Brox · Bernt Schiele · Bernhard Schölkopf · Francesco Locatello
2022 Poster: Function Classes for Identifiable Nonlinear Independent Component Analysis »
Simon Buchholz · Michel Besserve · Bernhard Schölkopf 
2022 Poster: Finding and Listing Frontdoor Adjustment Sets »
Hyunchai Jeong · Jin Tian · Elias Bareinboim 
2022 Poster: Online Reinforcement Learning for Mixed Policy Scopes »
Junzhe Zhang · Elias Bareinboim 
2022 Poster: Discrete Compositional Representations as an Abstraction for Goal Conditioned Reinforcement Learning »
Riashat Islam · Hongyu Zang · Anirudh Goyal · Alex Lamb · Kenji Kawaguchi · Xin Li · Romain Laroche · Yoshua Bengio · Remi Tachet des Combes 
2022 Poster: Probable Domain Generalization via Quantile Risk Minimization »
Cian Eastwood · Alexander Robey · Shashank Singh · Julius von Kügelgen · Hamed Hassani · George J. Pappas · Bernhard Schölkopf 
2022 Poster: Interventions, Where and How? Bayesian Active Causal Discovery at Scale »
Panagiotis Tigas · Yashas Annadani · Andrew Jesson · Bernhard Schölkopf · Yarin Gal · Stefan Bauer 
2022 Poster: AutoML Two-Sample Test »
Jonas Kübler · Vincent Stimper · Simon Buchholz · Krikamol Muandet · Bernhard Schölkopf 
2022 Poster: Weakly Supervised Representation Learning with Sparse Perturbations »
Kartik Ahuja · Jason Hartford · Yoshua Bengio 
2022 Poster: Causal Discovery in Heterogeneous Environments Under the Sparse Mechanism Shift Hypothesis »
Ronan Perry · Julius von Kügelgen · Bernhard Schölkopf 
2022 Poster: Causal Inference with Non-IID Data using Linear Graphical Models »
Chi Zhang · Karthika Mohan · Judea Pearl 
2022 Poster: Direct Advantage Estimation »
Hsiao-Ru Pan · Nico Gürtler · Alexander Neitz · Bernhard Schölkopf
2022 Poster: Causal Identification under Markov equivalence: Calculus, Algorithm, and Completeness »
Amin Jaber · Adele Ribeiro · Jiji Zhang · Elias Bareinboim 
2022 Poster: Sampling without Replacement Leads to Faster Rates in Finite-Sum Minimax Optimization »
Aniket Das · Bernhard Schölkopf · Michael Muehlebach 
2022 Poster: Is a Modular Architecture Enough? »
Sarthak Mittal · Yoshua Bengio · Guillaume Lajoie 
2022 Poster: Rule-Based but Flexible? Evaluating and Improving Language Models as Accounts of Human Moral Judgment »
Zhijing Jin · Sydney Levine · Fernando Gonzalez Adauto · Ojasv Kamal · Maarten Sap · Mrinmaya Sachan · Rada Mihalcea · Josh Tenenbaum · Bernhard Schölkopf 
2021 : Panel Discussion »
Elias Bareinboim · Mark van der Laan · Claire Vernade 
2021 : Live Q&A Session 2 with Susan Athey, Yoshua Bengio, Sujeeth Bharadwaj, Jane Wang, Joshua Vogelstein, Weiwei Yang »
Susan Athey · Yoshua Bengio · Sujeeth Bharadwaj · Jane Wang · Weiwei Yang · Joshua T Vogelstein 
2021 : TBD (Elias Bareinboim) »
Elias Bareinboim 
2021 : Live Q&A Session 1 with Yoshua Bengio, Leyla Isik, Konrad Kording, Bernhard Schölkopf, Amit Sharma, Joshua Vogelstein, Weiwei Yang »
Yoshua Bengio · Leyla Isik · Konrad Kording · Bernhard Schölkopf · Joshua T Vogelstein · Weiwei Yang 
2021 Workshop: Tackling Climate Change with Machine Learning »
Maria João Sousa · Hari Prasanna Das · Sally Simone Fobi · Jan Drgona · Tegan Maharaj · Yoshua Bengio 
2021 : General Discussion 1 - What is out of distribution (OOD) generalization and why is it important? with Yoshua Bengio, Leyla Isik, Max Welling »
Yoshua Bengio · Leyla Isik · Max Welling · Joshua T Vogelstein · Weiwei Yang 
2021 : Dominguez-Olmedo, Karimi, Schölkopf - On the Adversarial Robustness of Causal Algorithmic Recourse »
Ricardo Dominguez-Olmedo · Amir Karimi · Bernhard Schölkopf
2021 : Panel Discussion 3 »
Taylor Webb · Hakwan Lau · Bernhard Schölkopf · Jiangying Zhou · Lior Horesh · Francesca Rossi 
2021 : Causal World Models »
Bernhard Schölkopf 
2021 : AI X Discovery »
Yoshua Bengio 
2021 : Panel Discussion 2 »
Susan L Epstein · Yoshua Bengio · Lucina Uddin · Rohan Paul · Steve Fleming 
2021 : Boxhead: A Dataset for Learning Hierarchical Representations »
Yukun Chen · Andrea Dittadi · Frederik Träuble · Stefan Bauer · Bernhard Schölkopf 
2021 : Desiderata and ML Research Programme for Higher-Level Cognition »
Yoshua Bengio 
2021 : Invited Talk: Causality and Fairness »
Elias Bareinboim 
2021 Oral: Sequential Causal Imitation Learning with Unobserved Confounders »
Daniel Kumor · Junzhe Zhang · Elias Bareinboim 
2021 Poster: Dynamic Inference with Neural Interpreters »
Nasim Rahaman · Muhammad Waleed Gondal · Shruti Joshi · Peter Gehler · Yoshua Bengio · Francesco Locatello · Bernhard Schölkopf 
2021 Poster: Gradient Starvation: A Learning Proclivity in Neural Networks »
Mohammad Pezeshki · Oumar Kaba · Yoshua Bengio · Aaron Courville · Doina Precup · Guillaume Lajoie 
2021 Poster: Causal Influence Detection for Improving Efficiency in Reinforcement Learning »
Maximilian Seitzer · Bernhard Schölkopf · Georg Martius 
2021 Poster: Causal Identification with Matrix Equations »
Sanghack Lee · Elias Bareinboim 
2021 Poster: Independent mechanism analysis, a new concept? »
Luigi Gresele · Julius von Kügelgen · Vincent Stimper · Bernhard Schölkopf · Michel Besserve 
2021 Poster: Nested Counterfactual Identification from Arbitrary Surrogate Experiments »
Juan Correa · Sanghack Lee · Elias Bareinboim 
2021 Poster: A Consciousness-Inspired Planning Agent for Model-Based Reinforcement Learning »
Mingde Zhao · Zhen Liu · Sitao Luan · Shuyuan Zhang · Doina Precup · Yoshua Bengio 
2021 : Real Robot Challenge II + Q&A »
Stefan Bauer · Joel Akpo · Manuel Wuethrich · Nan Rosemary Ke · Anirudh Goyal · Thomas Steinbrenner · Felix Widmaier · Annika Buchholz · Bernhard Schölkopf · Dieter Büchler · Ludovic Righetti · Franziska Meier 
2021 Poster: Neural Production Systems »
Anirudh Goyal · Aniket Didolkar · Nan Rosemary Ke · Charles Blundell · Philippe Beaudoin · Nicolas Heess · Michael Mozer · Yoshua Bengio 
2021 Poster: Flow Network based Generative Models for Non-Iterative Diverse Candidate Generation »
Emmanuel Bengio · Moksh Jain · Maksym Korablyov · Doina Precup · Yoshua Bengio 
2021 Poster: Iterative Teaching by Label Synthesis »
Weiyang Liu · Zhen Liu · Hanchen Wang · Liam Paull · Bernhard Schölkopf · Adrian Weller 
2021 Poster: Sequential Causal Imitation Learning with Unobserved Confounders »
Daniel Kumor · Junzhe Zhang · Elias Bareinboim 
2021 Poster: The Causal-Neural Connection: Expressiveness, Learnability, and Inference »
Kevin Xia · Kai-Zhan Lee · Yoshua Bengio · Elias Bareinboim
2021 Poster: Invariance Principle Meets Information Bottleneck for Out-of-Distribution Generalization »
Kartik Ahuja · Ethan Caballero · Dinghuai Zhang · Jean-Christophe Gagnon-Audet · Yoshua Bengio · Ioannis Mitliagkas · Irina Rish
2021 Poster: Double Machine Learning Density Estimation for Local Treatment Effects with Instruments »
Yonghan Jung · Jin Tian · Elias Bareinboim 
2021 Poster: Discrete-Valued Neural Communication »
Dianbo Liu · Alex Lamb · Kenji Kawaguchi · Anirudh Goyal · Chen Sun · Michael Mozer · Yoshua Bengio 
2021 Poster: The Inductive Bias of Quantum Kernels »
Jonas Kübler · Simon Buchholz · Bernhard Schölkopf 
2021 Poster: Backward-Compatible Prediction Updates: A Probabilistic Approach »
Frederik Träuble · Julius von Kügelgen · Matthäus Kleindessner · Francesco Locatello · Bernhard Schölkopf · Peter Gehler 
2021 Poster: Self-Supervised Learning with Data Augmentations Provably Isolates Content from Style »
Julius von Kügelgen · Yash Sharma · Luigi Gresele · Wieland Brendel · Bernhard Schölkopf · Michel Besserve · Francesco Locatello 
2021 Poster: DiBS: Differentiable Bayesian Structure Learning »
Lars Lorch · Jonas Rothfuss · Bernhard Schölkopf · Andreas Krause 
2021 Poster: Regret Bounds for Gaussian-Process Optimization in Large Domains »
Manuel Wuethrich · Bernhard Schölkopf · Andreas Krause 
2021 Oral: Causal Identification with Matrix Equations »
Sanghack Lee · Elias Bareinboim 
2020 : Panel discussion 2 »
Danielle S Bassett · Yoshua Bengio · Cristina Savin · David Duvenaud · Anna Choromanska · Yanping Huang 
2020 : Contributed Talk 3: Algorithmic Recourse: from Counterfactual Explanations to Interventions »
Amir-Hossein Karimi · Bernhard Schölkopf · Isabel Valera
2020 : Invited Talk Yoshua Bengio »
Yoshua Bengio 
2020 : Invited Talk #7 »
Yoshua Bengio 
2020 : Panel #1 »
Yoshua Bengio · Daniel Kahneman · Henry Kautz · Luis Lamb · Gary Marcus · Francesca Rossi 
2020 : Yoshua Bengio - Incentives for Researchers »
Yoshua Bengio 
2020 Workshop: Causal Discovery and Causality-Inspired Machine Learning »
Biwei Huang · Sara Magliacane · Kun Zhang · Danielle Belgrave · Elias Bareinboim · Daniel Malinsky · Thomas Richardson · Christopher Meek · Peter Spirtes · Bernhard Schölkopf 
2020 Workshop: Tackling Climate Change with ML »
David Dao · Evan Sherwin · Priya Donti · Lauren Kuntz · Lynn Kaack · Yumna Yusuf · David Rolnick · Catherine Nakalembe · Claire Monteleoni · Yoshua Bengio 
2020 Poster: Untangling tradeoffs between recurrence and self-attention in artificial neural networks »
Giancarlo Kerg · Bhargav Kanuparthi · Anirudh Goyal · Kyle Goyette · Yoshua Bengio · Guillaume Lajoie 
2020 Poster: Your GAN is Secretly an Energy-based Model and You Should Use Discriminator Driven Latent Sampling »
Tong Che · Ruixiang ZHANG · Jascha Sohl-Dickstein · Hugo Larochelle · Liam Paull · Yuan Cao · Yoshua Bengio
2020 Poster: Characterizing Optimal Mixed Policies: Where to Intervene and What to Observe »
Sanghack Lee · Elias Bareinboim 
2020 Memorial: In Memory of Olivier Chapelle »
Bernhard Schölkopf · Andre Elisseeff · Olivier Bousquet · Vladimir Vapnik · Jason E Weston 
2020 Poster: Learning Kernel Tests Without Data Splitting »
Jonas Kübler · Wittawat Jitkrittum · Bernhard Schölkopf · Krikamol Muandet 
2020 Poster: Hybrid Models for Learning to Branch »
Prateek Gupta · Maxime Gasse · Elias Khalil · Pawan K Mudigonda · Andrea Lodi · Yoshua Bengio 
2020 Poster: Causal Discovery from Soft Interventions with Unknown Targets: Characterization and Learning »
Amin Jaber · Murat Kocaoglu · Karthikeyan Shanmugam · Elias Bareinboim 
2020 Poster: Algorithmic recourse under imperfect causal knowledge: a probabilistic approach »
Amir-Hossein Karimi · Julius von Kügelgen · Bernhard Schölkopf · Isabel Valera
2020 Poster: Causal analysis of Covid-19 Spread in Germany »
Atalanti Mastakouri · Bernhard Schölkopf 
2020 Poster: Causal Imitation Learning With Unobserved Confounders »
Junzhe Zhang · Daniel Kumor · Elias Bareinboim 
2020 Poster: General Transportability of Soft Interventions: Completeness Results »
Juan Correa · Elias Bareinboim 
2020 Poster: Learning Causal Effects via Weighted Empirical Risk Minimization »
Yonghan Jung · Jin Tian · Elias Bareinboim 
2020 Spotlight: Algorithmic recourse under imperfect causal knowledge: a probabilistic approach »
Amir-Hossein Karimi · Julius von Kügelgen · Bernhard Schölkopf · Isabel Valera
2020 Oral: Causal Imitation Learning With Unobserved Confounders »
Junzhe Zhang · Daniel Kumor · Elias Bareinboim 
2020 Poster: Relative gradient optimization of the Jacobian term in unsupervised deep learning »
Luigi Gresele · Giancarlo Fissore · Adrián Javaloy · Bernhard Schölkopf · Aapo Hyvarinen 
2019 : Panel Session: A new hope for neuroscience »
Yoshua Bengio · Blake Richards · Timothy Lillicrap · Ila Fiete · David Sussillo · Doina Precup · Konrad Kording · Surya Ganguli 
2019 : Poster Session »
Pravish Sainath · Mohamed Akrout · Charles Delahunt · Nathan Kutz · Guangyu Robert Yang · Joseph Marino · L F Abbott · Nicolas Vecoven · Damien Ernst · andrew warrington · Michael Kagan · Kyunghyun Cho · Kameron Harris · Leopold Grinberg · John J. Hopfield · Dmitry Krotov · Taliah Muhammad · Erick Cobos · Edgar Walker · Jacob Reimer · Andreas Tolias · Alexander Ecker · Janaki Sheth · Yu Zhang · Maciej Wołczyk · Jacek Tabor · Szymon Maszke · Roman Pogodin · Dane Corneil · Wulfram Gerstner · Baihan Lin · Guillermo Cecchi · Jenna M Reinen · Irina Rish · Guillaume Bellec · Darjan Salaj · Anand Subramoney · Wolfgang Maass · Yueqi Wang · Ari Pakman · Jin Hyung Lee · Liam Paninski · Bryan Tripp · Colin Graber · Alex Schwing · Luke Prince · Gabriel Ocker · Michael Buice · Benjamin Lansdell · Konrad Kording · Jack Lindsey · Terrence Sejnowski · Matthew Farrell · Eric SheaBrown · Nicolas Farrugia · Victor Nepveu · Jiwoong Im · Kristin Branson · Brian Hu · Ramakrishnan Iyer · Stefan Mihalas · Sneha Aenugu · Hananel Hazan · Sihui Dai · Tan Nguyen · Doris Tsao · Richard Baraniuk · Anima Anandkumar · Hidenori Tanaka · Aran Nayebi · Stephen Baccus · Surya Ganguli · Dean Pospisil · Eilif Muller · Jeffrey S Cheng · Gaël Varoquaux · Kamalaker Dadi · Dimitrios C Gklezakos · Rajesh PN Rao · Anand Louis · Christos Papadimitriou · Santosh Vempala · Naganand Yadati · Daniel Zdeblick · Daniela M Witten · Nicholas Roberts · Vinay Prabhu · Pierre Bellec · Poornima Ramesh · Jakob H Macke · Santiago Cadena · Guillaume Bellec · Franz Scherr · Owen Marschall · Robert Kim · Hannes Rapp · Marcio Fonseca · Oliver Armitage · Jiwoong Im · Thomas Hardcastle · Abhishek Sharma · Wyeth Bair · Adrian Valente · Shane Shang · Merav Stern · Rutuja Patil · Peter Wang · Sruthi Gorantla · Peter Stratton · Tristan Edwards · Jialin Lu · Martin Ester · Yurii Vlasov · Siavash Golkar 
2019 : Yoshua Bengio - Towards compositional understanding of the world by agent-based deep learning »
Yoshua Bengio 
2019 : Lunch Break and Posters »
Xingyou Song · Elad Hoffer · WeiCheng Chang · Jeremy Cohen · Jyoti Islam · Yaniv Blumenfeld · Andreas Madsen · Jonathan Frankle · Sebastian Goldt · Satrajit Chatterjee · Abhishek Panigrahi · Alex Renda · Brian Bartoldson · Israel Birhane · Aristide Baratin · Niladri Chatterji · Roman Novak · Jessica Forde · YiDing Jiang · Yilun Du · Linara Adilova · Michael Kamp · Berry Weinstein · Itay Hubara · Tal BenNun · Torsten Hoefler · Daniel Soudry · HsiangFu Yu · Kai Zhong · Yiming Yang · Inderjit Dhillon · Jaime Carbonell · Yanqing Zhang · Dar Gilboa · Johannes Brandstetter · Alexander R Johansen · Gintare Karolina Dziugaite · Raghav Somani · Ari Morcos · Freddie Kalaitzis · Hanie Sedghi · Lechao Xiao · John Zech · Muqiao Yang · Simran Kaur · Qianli Ma · YaoHung Hubert Tsai · Ruslan Salakhutdinov · Sho Yaida · Zachary Lipton · Daniel Roy · Michael Carbin · Florent Krzakala · Lenka Zdeborová · Guy GurAri · Ethan Dyer · Dilip Krishnan · Hossein Mobahi · Samy Bengio · Behnam Neyshabur · Praneeth Netrapalli · Kris Sankaran · Julien Cornebise · Yoshua Bengio · Vincent Michalski · Samira Ebrahimi Kahou · Md Rifat Arefin · Jiri Hron · Jaehoon Lee · Jascha SohlDickstein · Samuel Schoenholz · David Schwab · Dongyu Li · Sang Keun Choe · Henning Petzka · Ashish Verma · Zhichao Lin · Cristian Sminchisescu 
2019 : Climate Change: A Grand Challenge for ML »
Yoshua Bengio · Carla Gomes · Andrew Ng · Jeff Dean · Lester Mackey 
2019 : Bernhard Schölkopf »
Bernhard Schölkopf 
2019 Workshop: Joint Workshop on AI for Social Good »
Fei Fang · Joseph Aylett-Bullock · Marc-Antoine Dilhac · Brian Green · natalie saltiel · Dhaval Adjodah · Jack Clark · Sean McGregor · Margaux Luck · Jonathan Penn · Tristan Sylvain · Geneviève Boucher · Sydney Swaine-Simon · Girmaw Abebe Tadesse · Myriam Côté · Anna Bethke · Yoshua Bengio
2019 Workshop: Tackling Climate Change with ML »
David Rolnick · Priya Donti · Lynn Kaack · Alexandre Lacoste · Tegan Maharaj · Andrew Ng · John Platt · Jennifer Chayes · Yoshua Bengio 
2019 : Opening remarks »
Yoshua Bengio 
2019 : Poster Session »
Ethan Harris · Tom White · Oh Hyeon Choung · Takashi Shinozaki · Dipan Pal · Katherine L. Hermann · Judy Borowski · Camilo Fosco · Chaz Firestone · Vijay Veerabadran · Benjamin Lahner · Chaitanya Ryali · Fenil Doshi · Pulkit Singh · Sharon Zhou · Michel Besserve · Michael Chang · Anelise Newman · Mahesan Niranjan · Jonathon Hare · Daniela Mihai · Marios Savvides · Simon Kornblith · Christina M Funke · Aude Oliva · Virginia de Sa · Dmitry Krotov · Colin Conwell · George Alvarez · Alex Kolchinski · Shengjia Zhao · Mitchell Gordon · Michael Bernstein · Stefano Ermon · Arash Mehrjou · Bernhard Schölkopf · John Co-Reyes · Michael Janner · Jiajun Wu · Josh Tenenbaum · Sergey Levine · Yalda Mohsenzadeh · Zhenglong Zhou
2019 : Approaches to Understanding AI »
Yoshua Bengio · Roel Dobbe · Madeleine Elish · Joshua Kroll · Jacob Metcalf · Jack Poulson 
2019 : Invited Talk »
Yoshua Bengio 
2019 Workshop: Retrospectives: A Venue for Self-Reflection in ML Research »
Ryan Lowe · Yoshua Bengio · Joelle Pineau · Michela Paganini · Jessica Forde · Shagun Sodhani · Abhishek Gupta · Joel Lehman · Peter Henderson · Kanika Madan · Koustuv Sinha · Xavier Bouthillier 
2019 Poster: How to Initialize your Network? Robust Initialization for WeightNorm & ResNets »
Devansh Arpit · Víctor Campos · Yoshua Bengio 
2019 Poster: On the Fairness of Disentangled Representations »
Francesco Locatello · Gabriele Abbati · Thomas Rainforth · Stefan Bauer · Bernhard Schölkopf · Olivier Bachem 
2019 Poster: Wasserstein Dependency Measure for Representation Learning »
Sherjil Ozair · Corey Lynch · Yoshua Bengio · Aaron van den Oord · Sergey Levine · Pierre Sermanet 
2019 Poster: On the Transfer of Inductive Bias from Simulation to the Real World: a New Disentanglement Dataset »
Muhammad Waleed Gondal · Manuel Wuethrich · Djordje Miladinovic · Francesco Locatello · Martin Breidt · Valentin Volchkov · Joel Akpo · Olivier Bachem · Bernhard Schölkopf · Stefan Bauer 
2019 Poster: Unsupervised State Representation Learning in Atari »
Ankesh Anand · Evan Racah · Sherjil Ozair · Yoshua Bengio · Marc-Alexandre Côté · R Devon Hjelm
2019 Poster: Variational Temporal Abstraction »
Taesup Kim · Sungjin Ahn · Yoshua Bengio 
2019 Poster: Gradient based sample selection for online continual learning »
Rahaf Aljundi · Min Lin · Baptiste Goujaud · Yoshua Bengio 
2019 Poster: MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis »
Kundan Kumar · Rithesh Kumar · Thibault de Boissiere · Lucas Gestin · Wei Zhen Teoh · Jose Sotelo · Alexandre de Brébisson · Yoshua Bengio · Aaron Courville 
2019 Invited Talk: From System 1 Deep Learning to System 2 Deep Learning »
Yoshua Bengio 
2019 Poster: Perceiving the arrow of time in autoregressive motion »
Kristof Meding · Dominik Janzing · Bernhard Schölkopf · Felix A. Wichmann 
2019 Poster: On Adversarial Mixup Resynthesis »
Christopher Beckham · Sina Honari · Alex Lamb · Vikas Verma · Farnoosh Ghadiri · R Devon Hjelm · Yoshua Bengio · Chris Pal 
2019 Poster: Selecting causal brain features with a single conditional independence test per feature »
Atalanti Mastakouri · Bernhard Schölkopf · Dominik Janzing 
2019 Poster: Updates of Equilibrium Prop Match Gradients of Backprop Through Time in an RNN with Static Input »
Maxence Ernoult · Julie Grollier · Damien Querlioz · Yoshua Bengio · Benjamin Scellier 
2019 Poster: Kernel Stein Tests for Multiple Model Comparison »
Jen Ning Lim · Makoto Yamada · Bernhard Schölkopf · Wittawat Jitkrittum 
2019 Poster: Non-normal Recurrent Neural Network (nnRNN): learning long time dependencies while improving expressivity with transient dynamics »
Giancarlo Kerg · Kyle Goyette · Maximilian Puelma Touzel · Gauthier Gidel · Eugene Vorontsov · Yoshua Bengio · Guillaume Lajoie 
2019 Spotlight: Perceiving the arrow of time in autoregressive motion »
Kristof Meding · Dominik Janzing · Bernhard Schölkopf · Felix A. Wichmann 
2019 Oral: Updates of Equilibrium Prop Match Gradients of Backprop Through Time in an RNN with Static Input »
Maxence Ernoult · Julie Grollier · Damien Querlioz · Yoshua Bengio · Benjamin Scellier 
2018 : Opening remarks »
Yoshua Bengio 
2018 Workshop: AI for social good »
Margaux Luck · Tristan Sylvain · Joseph Paul Cohen · Arsene Fansi Tchango · Valentine Goddard · Aurelie Helouis · Yoshua Bengio · Sam Greydanus · Cody Wild · Taras Kucherenko · Arya Farahi · Jonathan Penn · Sean McGregor · Mark Crowley · Abhishek Gupta · Kenny Chen · Myriam Côté · Rediet Abebe 
2018 : Datasets and Benchmarks for Causal Learning »
Csaba Szepesvari · Isabelle Guyon · Nicolai Meinshausen · David Blei · Elias Bareinboim · Bernhard Schölkopf · Pietro Perona 
2018 : Learning Independent Mechanisms »
Bernhard Schölkopf 
2018 Poster: Informative Features for Model Comparison »
Wittawat Jitkrittum · Heishiro Kanagawa · Patsorn Sangkloy · James Hays · Bernhard Schölkopf · Arthur Gretton 
2018 Poster: Image-to-image translation for cross-domain disentanglement »
Abel Gonzalez-Garcia · Joost van de Weijer · Yoshua Bengio
2018 Poster: Gradient Descent for Spiking Neural Networks »
Dongsung Huh · Terrence Sejnowski 
2018 Poster: Adaptive Skip Intervals: Temporal Abstraction for Recurrent Dynamical Models »
Alexander Neitz · Giambattista Parascandolo · Stefan Bauer · Bernhard Schölkopf 
2018 Poster: MetaGAN: An Adversarial Approach to Few-Shot Learning »
Ruixiang Zhang · Tong Che · Zoubin Ghahramani · Yoshua Bengio · Yangqiu Song
2018 Poster: Bayesian Model-Agnostic Meta-Learning »
Jaesik Yoon · Taesup Kim · Ousmane Dia · Sungwoong Kim · Yoshua Bengio · Sungjin Ahn 
2018 Poster: Sparse Attentive Backtracking: Temporal Credit Assignment Through Reminding »
Nan Rosemary Ke · Anirudh Goyal · Olexa Bilaniuk · Jonathan Binas · Michael Mozer · Chris Pal · Yoshua Bengio 
2018 Spotlight: Sparse Attentive Backtracking: Temporal Credit Assignment Through Reminding »
Nan Rosemary Ke · Anirudh Goyal · Olexa Bilaniuk · Jonathan Binas · Michael Mozer · Chris Pal · Yoshua Bengio 
2018 Spotlight: Bayesian Model-Agnostic Meta-Learning »
Jaesik Yoon · Taesup Kim · Ousmane Dia · Sungwoong Kim · Yoshua Bengio · Sungjin Ahn 
2018 Poster: Dendritic cortical microcircuits approximate the backpropagation algorithm »
João Sacramento · Rui Ponte Costa · Yoshua Bengio · Walter Senn 
2018 Oral: Dendritic cortical microcircuits approximate the backpropagation algorithm »
João Sacramento · Rui Ponte Costa · Yoshua Bengio · Walter Senn 
2017 : Yoshua Bengio »
Yoshua Bengio 
2017 : From deep learning of disentangled representations to higher-level cognition »
Yoshua Bengio 
2017 : More Steps towards Biologically Plausible Backprop »
Yoshua Bengio 
2017 : Leveraging the Crowd to Detect and Reduce the Spread of Fake News and Misinformation »
Alice Oh · Bernhard Schölkopf 
2017 : A3T: Adversarially Augmented Adversarial Training »
Aristide Baratin · Simon Lacoste-Julien · Yoshua Bengio · Akram Erraqabi
2017 : Contributed Talk 4 »
Judea Pearl 
2017 : Competition III: The Conversational Intelligence Challenge »
Mikhail Burtsev · Ryan Lowe · Iulian Vlad Serban · Yoshua Bengio · Alexander Rudnicky · Alan W Black · Shrimai Prabhumoye · Artem Rodichev · Nikita Smetanin · Denis Fedorenko · CheongAn Lee · EUNMI HONG · Hwaran Lee · Geonmin Kim · Nicolas Gontier · Atsushi Saito · Andrey Gershfeld · Artem Burachenok 
2017 : Poster session »
Abbas Zaidi · Christoph Kurz · David Heckerman · YiJyun Lin · Stefan Riezler · Ilya Shpitser · Songbai Yan · Olivier Goudet · Yash Deshpande · Judea Pearl · Jovana Mitrovic · Brian Vegetabile · Tae Hwy Lee · Karen Sachs · Karthika Mohan · Reagan Rose · Julius Ramakers · Negar Hassanpour · Pierre Baldi · Razieh Nabi · Noah Hammarlund · Eli Sherman · Carolin Lawrence · Fattaneh Jabbari · Vira Semenova · Maria Dimakopoulou · Pratik Gajane · Russell Greiner · Ilias Zadik · Alexander Blocker · Hao Xu · Tal EL HAY · Tony Jebara · Benoit Rostykus 
2017 Poster: Avoiding Discrimination through Causal Reasoning »
Niki Kilbertus · Mateo Rojas Carulla · Giambattista Parascandolo · Moritz Hardt · Dominik Janzing · Bernhard Schölkopf 
2017 Poster: Variational Walkback: Learning a Transition Operator as a Stochastic Recurrent Net »
Anirudh Goyal · Nan Rosemary Ke · Surya Ganguli · Yoshua Bengio 
2017 Demonstration: A Deep Reinforcement Learning Chatbot »
Iulian Vlad Serban · Chinnadhurai Sankar · Mathieu Germain · Saizheng Zhang · Zhouhan Lin · Sandeep Subramanian · Taesup Kim · Michael Pieper · Sarath Chandar · Nan Rosemary Ke · Sai Rajeswar Mudumba · Alexandre de Brébisson · Jose Sotelo · Dendi A Suhubdy · Vincent Michalski · Joelle Pineau · Yoshua Bengio 
2017 Poster: GibbsNet: Iterative Adversarial Inference for Deep Graphical Models »
Alex Lamb · R Devon Hjelm · Yaroslav Ganin · Joseph Paul Cohen · Aaron Courville · Yoshua Bengio 
2017 Poster: Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning »
Shixiang (Shane) Gu · Timothy Lillicrap · Richard Turner · Zoubin Ghahramani · Bernhard Schölkopf · Sergey Levine 
2017 Poster: AdaGAN: Boosting Generative Models »
Ilya Tolstikhin · Sylvain Gelly · Olivier Bousquet · Carl-Johann Simon-Gabriel · Bernhard Schölkopf
2017 Poster: Plan, Attend, Generate: Planning for Sequence-to-Sequence Models »
Caglar Gulcehre · Francis Dutil · Adam Trischler · Yoshua Bengio
2017 Poster: Z-Forcing: Training Stochastic Recurrent Networks »
Anirudh Goyal · Alessandro Sordoni · Marc-Alexandre Côté · Nan Rosemary Ke · Yoshua Bengio
2016 : Yoshua Bengio – Credit assignment: beyond backpropagation »
Yoshua Bengio 
2016 : From Brains to Bits and Back Again »
Yoshua Bengio · Terrence Sejnowski · Christos H Papadimitriou · Jakob H Macke · Demis Hassabis · Alyson Fletcher · Andreas Tolias · Jascha Sohl-Dickstein · Konrad P Koerding
2016 : Yoshua Bengio : Toward Biologically Plausible Deep Learning »
Yoshua Bengio 
2016 : Panel on "Explainable AI" (Yoshua Bengio, Alessio Lomuscio, Gary Marcus, Stephen Muggleton, Michael Witbrock) »
Yoshua Bengio · Alessio Lomuscio · Gary Marcus · Stephen H Muggleton · Michael Witbrock 
2016 : Yoshua Bengio: From Training Low Precision Neural Nets to Training Analog Continuous-Time Machines »
Yoshua Bengio 
2016 Symposium: Deep Learning Symposium »
Yoshua Bengio · Yann LeCun · Navdeep Jaitly · Roger Grosse 
2016 Poster: Minimax Estimation of Maximum Mean Discrepancy with Radial Kernels »
Ilya Tolstikhin · Bharath Sriperumbudur · Bernhard Schölkopf 
2016 Poster: Architectural Complexity Measures of Recurrent Neural Networks »
Saizheng Zhang · Yuhuai Wu · Tong Che · Zhouhan Lin · Roland Memisevic · Russ Salakhutdinov · Yoshua Bengio 
2016 Poster: Professor Forcing: A New Algorithm for Training Recurrent Networks »
Alex M Lamb · Anirudh Goyal · Ying Zhang · Saizheng Zhang · Aaron Courville · Yoshua Bengio 
2016 Poster: On Multiplicative Integration with Recurrent Neural Networks »
Yuhuai Wu · Saizheng Zhang · Ying Zhang · Yoshua Bengio · Russ Salakhutdinov 
2016 Poster: Binarized Neural Networks »
Itay Hubara · Matthieu Courbariaux · Daniel Soudry · Ran El-Yaniv · Yoshua Bengio
2016 Poster: Consistent Kernel Mean Estimation for Functions of Random Variables »
Carl-Johann Simon-Gabriel · Adam Scibior · Ilya Tolstikhin · Bernhard Schölkopf
2015 : RL for DL »
Yoshua Bengio 
2015 : Learning Representations for Unsupervised and Transfer Learning »
Yoshua Bengio 
2015 Symposium: Deep Learning Symposium »
Yoshua Bengio · Marc'Aurelio Ranzato · Honglak Lee · Max Welling · Andrew Y Ng 
2015 Poster: Attention-Based Models for Speech Recognition »
Jan K Chorowski · Dzmitry Bahdanau · Dmitriy Serdyuk · Kyunghyun Cho · Yoshua Bengio 
2015 Poster: Equilibrated adaptive learning rates for non-convex optimization »
Yann Dauphin · Harm de Vries · Yoshua Bengio 
2015 Spotlight: Equilibrated adaptive learning rates for non-convex optimization »
Yann Dauphin · Harm de Vries · Yoshua Bengio 
2015 Spotlight: Attention-Based Models for Speech Recognition »
Jan K Chorowski · Dzmitry Bahdanau · Dmitriy Serdyuk · Kyunghyun Cho · Yoshua Bengio 
2015 Poster: Bandits with Unobserved Confounders: A Causal Approach »
Elias Bareinboim · Andrew Forney · Judea Pearl 
2015 Poster: A Recurrent Latent Variable Model for Sequential Data »
Junyoung Chung · Kyle Kastner · Laurent Dinh · Kratarth Goel · Aaron Courville · Yoshua Bengio 
2015 Poster: BinaryConnect: Training Deep Neural Networks with binary weights during propagations »
Matthieu Courbariaux · Yoshua Bengio · Jean-Pierre David
2015 Tutorial: Deep Learning »
Geoffrey E Hinton · Yoshua Bengio · Yann LeCun 
2014 Workshop: Second Workshop on Transfer and Multi-Task Learning: Theory meets Practice »
Urun Dogan · Tatiana Tommasi · Yoshua Bengio · Francesco Orabona · Marius Kloft · Andres Munoz · Gunnar Rätsch · Hal Daumé III · Mehryar Mohri · Xuezhi Wang · Daniel Hernández-Lobato · Song Liu · Thomas Unterthiner · Pascal Germain · Vinay P Namboodiri · Michael Goetz · Christopher Berlind · Sigurd Spieckermann · Marta Soare · Yujia Li · Vitaly Kuznetsov · Wenzhao Lian · Daniele Calandriello · Emilie Morvant
2014 Workshop: Deep Learning and Representation Learning »
Andrew Y Ng · Yoshua Bengio · Adam Coates · Roland Memisevic · Sharanyan Chetlur · Geoffrey E Hinton · Shamim Nemati · Bryan Catanzaro · Surya Ganguli · Herbert Jaeger · Phil Blunsom · Leon Bottou · Volodymyr Mnih · Chen-Yu Lee · Rich M Schwartz
2014 Workshop: OPT2014: Optimization for Machine Learning »
Zaid Harchaoui · Suvrit Sra · Alekh Agarwal · Martin Jaggi · Miro Dudik · Aaditya Ramdas · Jean Lasserre · Yoshua Bengio · Amir Beck 
2014 Poster: Transportability from Multiple Environments with Limited Experiments: Completeness Results »
Elias Bareinboim · Judea Pearl 
2014 Poster: Graphical Models for Recovering Probabilistic and Causal Queries from Missing Data »
Karthika Mohan · Judea Pearl 
2014 Spotlight: Transportability from Multiple Environments with Limited Experiments: Completeness Results »
Elias Bareinboim · Judea Pearl 
2014 Poster: How transferable are features in deep neural networks? »
Jason Yosinski · Jeff Clune · Yoshua Bengio · Hod Lipson 
2014 Poster: Identifying and attacking the saddle point problem in high-dimensional non-convex optimization »
Yann N Dauphin · Razvan Pascanu · Caglar Gulcehre · Kyunghyun Cho · Surya Ganguli · Yoshua Bengio 
2014 Poster: Generative Adversarial Nets »
Ian Goodfellow · Jean Pouget-Abadie · Mehdi Mirza · Bing Xu · David Warde-Farley · Sherjil Ozair · Aaron Courville · Yoshua Bengio
2014 Poster: On the Number of Linear Regions of Deep Neural Networks »
Guido F Montufar · Razvan Pascanu · Kyunghyun Cho · Yoshua Bengio 
2014 Demonstration: Neural Machine Translation »
Bart van Merriënboer · Kyunghyun Cho · Dzmitry Bahdanau · Yoshua Bengio 
2014 Oral: How transferable are features in deep neural networks? »
Jason Yosinski · Jeff Clune · Yoshua Bengio · Hod Lipson 
2014 Poster: Iterative Neural Autoregressive Distribution Estimator NADE-k »
Tapani Raiko · Yao Li · Kyunghyun Cho · Yoshua Bengio 
2014 Poster: Kernel Mean Estimation via Spectral Filtering »
Krikamol Muandet · Bharath Sriperumbudur · Bernhard Schölkopf 
2013 Workshop: Deep Learning »
Yoshua Bengio · Hugo Larochelle · Russ Salakhutdinov · Tomas Mikolov · Matthew D Zeiler · David Mcallester · Nando de Freitas · Josh Tenenbaum · Jian Zhou · Volodymyr Mnih 
2013 Workshop: Output Representation Learning »
Yuhong Guo · Dale Schuurmans · Richard Zemel · Samy Bengio · Yoshua Bengio · Li Deng · Dan Roth · Kilian Q Weinberger · Jason Weston · Kihyuk Sohn · Florent Perronnin · Gabriel Synnaeve · Pablo R Strasser · Julien Audiffren · Carlo Ciliberto · Dan Goldwasser
2013 Workshop: Modern Nonparametric Methods in Machine Learning »
Arthur Gretton · Mladen Kolar · Samory Kpotufe · John Lafferty · Han Liu · Bernhard Schölkopf · Alexander Smola · Rob Nowak · Mikhail Belkin · Lorenzo Rosasco · Peter Bickel · Yue Zhao
2013 Workshop: NIPS 2013 Workshop on Causality: Large-scale Experiment Design and Inference of Causal Mechanisms »
Isabelle Guyon · Leon Bottou · Bernhard Schölkopf · Alexander Statnikov · Evelyne Viegas · James M Robins
2013 Poster: The Randomized Dependence Coefficient »
David Lopez-Paz · Philipp Hennig · Bernhard Schölkopf
2013 Poster: Statistical analysis of coupled time series with Kernel Cross-Spectral Density operators. »
Michel Besserve · Nikos K Logothetis · Bernhard Schölkopf 
2013 Poster: Transportability from Multiple Environments with Limited Experiments »
Elias Bareinboim · Sanghack Lee · Vasant Honavar · Judea Pearl 
2013 Poster: Causal Inference on Time Series using Restricted Structural Equation Models »
Jonas Peters · Dominik Janzing · Bernhard Schölkopf 
2013 Poster: Multi-Prediction Deep Boltzmann Machines »
Ian Goodfellow · Mehdi Mirza · Aaron Courville · Yoshua Bengio 
2013 Poster: Generalized Denoising Auto-Encoders as Generative Models »
Yoshua Bengio · Li Yao · Guillaume Alain · Pascal Vincent 
2013 Poster: Graphical Models for Inference with Missing Data »
Karthika Mohan · Judea Pearl · Jin Tian 
2013 Poster: Stochastic Ratio Matching of RBMs for Sparse High-Dimensional Inputs »
Yann Dauphin · Yoshua Bengio 
2013 Session: Oral Session 3 »
Terrence Sejnowski 
2013 Spotlight: Graphical Models for Inference with Missing Data »
Karthika Mohan · Judea Pearl · Jin Tian 
2013 Tutorial: Causes and Counterfactuals: Concepts, Principles and Tools. »
Judea Pearl · Elias Bareinboim 
2012 Workshop: Deep Learning and Unsupervised Feature Learning »
Yoshua Bengio · James Bergstra · Quoc V. Le 
2012 Poster: Learning from Distributions via Support Measure Machines »
Krikamol Muandet · Kenji Fukumizu · Francesco Dinuzzo · Bernhard Schölkopf 
2012 Invited Talk: Suspicious Coincidences in the Brain »
Terrence Sejnowski 
2012 Spotlight: Learning from Distributions via Support Measure Machines »
Krikamol Muandet · Kenji Fukumizu · Francesco Dinuzzo · Bernhard Schölkopf 
2012 Poster: Semi-Supervised Domain Adaptation with Non-Parametric Copulas »
David Lopez-Paz · José Miguel Hernández-Lobato · Bernhard Schölkopf
2012 Spotlight: Semi-Supervised Domain Adaptation with Non-Parametric Copulas »
David Lopez-Paz · José Miguel Hernández-Lobato · Bernhard Schölkopf
2012 Poster: The representer theorem for Hilbert spaces: a necessary and sufficient condition »
Francesco Dinuzzo · Bernhard Schölkopf 
2011 Workshop: Philosophy and Machine Learning »
Marcello Pelillo · Joachim M Buhmann · Tiberio Caetano · Bernhard Schölkopf · Larry Wasserman 
2011 Workshop: Big Learning: Algorithms, Systems, and Tools for Learning at Scale »
Joseph E Gonzalez · Sameer Singh · Graham Taylor · James Bergstra · Alice Zheng · Misha Bilenko · Yucheng Low · Yoshua Bengio · Michael Franklin · Carlos Guestrin · Andrew McCallum · Alexander Smola · Michael Jordan · Sugato Basu 
2011 Workshop: Deep Learning and Unsupervised Feature Learning »
Yoshua Bengio · Adam Coates · Yann LeCun · Nicolas Le Roux · Andrew Y Ng 
2011 Workshop: Cosmology meets Machine Learning »
Michael Hirsch · Sarah Bridle · Bernhard Schölkopf · Phil Marshall · Stefan Harmeling · Mark Girolami 
2011 Oral: The Manifold Tangent Classifier »
Salah Rifai · Yann N Dauphin · Pascal Vincent · Yoshua Bengio · Xavier Muller 
2011 Poster: Shallow vs. Deep Sum-Product Networks »
Olivier Delalleau · Yoshua Bengio 
2011 Poster: The Manifold Tangent Classifier »
Salah Rifai · Yann N Dauphin · Pascal Vincent · Yoshua Bengio · Xavier Muller 
2011 Invited Talk: From kernels to causal inference »
Bernhard Schölkopf 
2011 Poster: Recovering Intrinsic Images with a Global Sparsity Prior on Reflectance »
Peter Gehler · Carsten Rother · Martin Kiefel · Lumin Zhang · Bernhard Schölkopf 
2011 Poster: Algorithms for Hyper-Parameter Optimization »
James Bergstra · Rémi Bardenet · Yoshua Bengio · Balázs Kégl 
2011 Poster: Causal Discovery with Cyclic Additive Noise Models »
Joris M Mooij · Dominik Janzing · Tom Heskes · Bernhard Schölkopf 
2011 Poster: On Tracking The Partition Function »
Guillaume Desjardins · Aaron Courville · Yoshua Bengio 
2011 Session: Opening Remarks and Awards »
Terrence Sejnowski · Peter Bartlett · Fernando Pereira 
2010 Workshop: Deep Learning and Unsupervised Feature Learning »
Honglak Lee · Marc'Aurelio Ranzato · Yoshua Bengio · Geoffrey E Hinton · Yann LeCun · Andrew Y Ng 
2010 Placeholder: Opening Remarks »
Terrence Sejnowski · Neil D Lawrence 
2010 Spotlight: Switched Latent Force Models for Movement Segmentation »
Mauricio A Alvarez · Jan Peters · Bernhard Schölkopf · Neil D Lawrence 
2010 Poster: Space-Variant Single-Image Blind Deconvolution for Removing Camera Shake »
Stefan Harmeling · Michael Hirsch · Bernhard Schölkopf 
2010 Poster: Switched Latent Force Models for Movement Segmentation »
Mauricio A Alvarez · Jan Peters · Bernhard Schölkopf · Neil D Lawrence 
2010 Poster: Probabilistic latent variable models for distinguishing between cause and effect »
Joris M Mooij · Oliver Stegle · Dominik Janzing · Kun Zhang · Bernhard Schölkopf 
2010 Talk: Opening Remarks and Awards »
Richard Zemel · Terrence Sejnowski · John Shawe-Taylor
2009 Workshop: Connectivity Inference in Neuroimaging »
Karl Friston · Moritz Grosse-Wentrup · Uta Noppeney · Bernhard Schölkopf
2009 Workshop: The Curse of Dimensionality Problem: How Can the Brain Solve It? »
Simon Haykin · Terrence Sejnowski · Steven W Zucker 
2009 Poster: Kernel Choice and Classifiability for RKHS Embeddings of Probability Distributions »
Bharath Sriperumbudur · Kenji Fukumizu · Arthur Gretton · Gert Lanckriet · Bernhard Schölkopf 
2009 Poster: Slow, Decorrelated Features for Pretraining Complex Cell-like Networks »
James Bergstra · Yoshua Bengio 
2009 Poster: An Infinite Factor Model Hierarchy Via a Noisy-Or Mechanism »
Aaron Courville · Douglas Eck · Yoshua Bengio 
2009 Oral: Kernel Choice and Classifiability for RKHS Embeddings of Probability Distributions »
Bharath Sriperumbudur · Kenji Fukumizu · Arthur Gretton · Gert Lanckriet · Bernhard Schölkopf 
2009 Session: Debate on Future Publication Models for the NIPS Community »
Yoshua Bengio 
2008 Workshop: Cortical Microcircuits and their Computational Functions »
Tomaso Poggio · Terrence Sejnowski 
2008 Workshop: Causality: objectives and assessment »
Isabelle Guyon · Dominik Janzing · Bernhard Schölkopf 
2008 Mini Symposium: Computational Photography »
Bill Freeman · Bernhard Schölkopf 
2008 Poster: Characteristic Kernels on Groups and Semigroups »
Kenji Fukumizu · Bharath Sriperumbudur · Arthur Gretton · Bernhard Schölkopf 
2008 Oral: Characteristic Kernels on Groups and Semigroups »
Kenji Fukumizu · Bharath Sriperumbudur · Arthur Gretton · Bernhard Schölkopf 
2008 Poster: Nonlinear causal discovery with additive noise models »
Patrik O Hoyer · Dominik Janzing · Joris M Mooij · Jonas Peters · Bernhard Schölkopf 
2008 Poster: Effects of Stimulus Type and of Error-Correcting Code Design on BCI Speller Performance »
Jeremy Hill · Jason Farquhar · Suzanne Martens · Felix Bießmann · Bernhard Schölkopf 
2008 Poster: Bayesian Experimental Design of Magnetic Resonance Imaging Sequences »
Matthias Seeger · Hannes Nickisch · Rolf Pohmann · Bernhard Schölkopf 
2008 Spotlight: Nonlinear causal discovery with additive noise models »
Patrik O Hoyer · Dominik Janzing · Joris M Mooij · Jonas Peters · Bernhard Schölkopf 
2008 Spotlight: Bayesian Experimental Design of Magnetic Resonance Imaging Sequences »
Matthias Seeger · Hannes Nickisch · Rolf Pohmann · Bernhard Schölkopf 
2008 Spotlight: Effects of Stimulus Type and of Error-Correcting Code Design on BCI Speller Performance »
Jeremy Hill · Jason Farquhar · Suzanne Martens · Felix Bießmann · Bernhard Schölkopf 
2008 Poster: An empirical Analysis of Domain Adaptation Algorithms for Genomic Sequence Analysis »
Gabriele B Schweikert · Christian Widmer · Bernhard Schölkopf · Gunnar Rätsch 
2008 Poster: Diffeomorphic Dimensionality Reduction »
Christian Walder · Bernhard Schölkopf 
2007 Spotlight: Kernel Measures of Conditional Dependence »
Kenji Fukumizu · Arthur Gretton · Xiaohai Sun · Bernhard Schölkopf 
2007 Poster: Augmented Functional Time Series Representation and Forecasting with Gaussian Processes »
Nicolas Chapados · Yoshua Bengio 
2007 Poster: An Analysis of Inference with the Universum »
Fabian H Sinz · Olivier Chapelle · Alekh Agarwal · Bernhard Schölkopf 
2007 Poster: Kernel Measures of Conditional Dependence »
Kenji Fukumizu · Arthur Gretton · Xiaohai Sun · Bernhard Schölkopf 
2007 Poster: Learning the 2D Topology of Images »
Nicolas Le Roux · Yoshua Bengio · Pascal Lamblin · Marc Joliveau · Balázs Kégl 
2007 Spotlight: Augmented Functional Time Series Representation and Forecasting with Gaussian Processes »
Nicolas Chapados · Yoshua Bengio 
2007 Spotlight: An Analysis of Inference with the Universum »
Fabian H Sinz · Olivier Chapelle · Alekh Agarwal · Bernhard Schölkopf 
2007 Spotlight: A Kernel Statistical Test of Independence »
Arthur Gretton · Kenji Fukumizu · Choon Hui Teo · Le Song · Bernhard Schölkopf · Alexander Smola 
2007 Poster: A Kernel Statistical Test of Independence »
Arthur Gretton · Kenji Fukumizu · Choon Hui Teo · Le Song · Bernhard Schölkopf · Alexander Smola 
2007 Poster: Topmoumoute Online Natural Gradient Algorithm »
Nicolas Le Roux · Pierre-Antoine Manzagol · Yoshua Bengio
2006 Workshop: Decoding the neural code »
Eric Thomson · Bill Kristan · Terrence Sejnowski 
2006 Poster: Implicit Surfaces with Globally Regularised and Compactly Supported Basis Functions »
Christian Walder · Bernhard Schölkopf · Olivier Chapelle 
2006 Poster: Learning Dense 3D Correspondence »
Florian Steinke · Bernhard Schölkopf · Volker Blanz 
2006 Poster: A Local Learning Approach for Clustering »
Mingrui Wu · Bernhard Schölkopf 
2006 Poster: A Kernel Method for the Two-Sample-Problem »
Arthur Gretton · Karsten Borgwardt · Malte J Rasch · Bernhard Schölkopf · Alexander Smola
2006 Poster: Greedy Layer-Wise Training of Deep Networks »
Yoshua Bengio · Pascal Lamblin · Dan Popovici · Hugo Larochelle 
2006 Poster: Correcting Sample Selection Bias by Unlabeled Data »
Jiayuan Huang · Alexander Smola · Arthur Gretton · Karsten Borgwardt · Bernhard Schölkopf 
2006 Spotlight: Correcting Sample Selection Bias by Unlabeled Data »
Jiayuan Huang · Alexander Smola · Arthur Gretton · Karsten Borgwardt · Bernhard Schölkopf 
2006 Talk: A Kernel Method for the Two-Sample-Problem »
Arthur Gretton · Karsten Borgwardt · Malte J Rasch · Bernhard Schölkopf · Alexander Smola
2006 Talk: Greedy Layer-Wise Training of Deep Networks »
Yoshua Bengio · Pascal Lamblin · Dan Popovici · Hugo Larochelle
2006 Poster: A Nonparametric Approach to Bottom-Up Visual Saliency »
Wolf Kienzle · Felix A Wichmann · Bernhard Schölkopf · Matthias Franz 
2006 Poster: Learning with Hypergraphs: Clustering, Classification, and Embedding »
Denny Zhou · Jiayuan Huang · Bernhard Schölkopf