Workshop
Causal Representation Learning
Sara Magliacane · Atalanti Mastakouri · Yuki Asano · Claudia Shi · Cian Eastwood · Sébastien Lachapelle · Bernhard Schölkopf · Caroline Uhler
Room 243 - 245
Can we learn causal representations from raw data, e.g. images? This workshop connects research in causality and representation learning.
Schedule
Fri 6:15 a.m. - 6:20 a.m. | Introductory remarks (Talk)
Fri 6:20 a.m. - 6:50 a.m. | Invited talk by Gemma Moran (Rutgers) - Identifiable representation learning via sparse decoding (Talk)
Fri 6:50 a.m. - 7:20 a.m. | Invited talk by Xinwei Shen (ETH) - Extrapolation in Regression and Representation Learning (Talk)
Fri 7:20 a.m. - 7:35 a.m. | Identifying Effects of Disease on Single-Cells with Domain-Invariant Generative Modeling (Talk) | Abdul Moeed
Fri 7:35 a.m. - 7:50 a.m. | Identifying Representations for Intervention Extrapolation (Talk) | Sorawit Saengkyongam
Fri 7:50 a.m. - 8:05 a.m. | The Linear Representation Hypothesis in Language Models (Talk) | Kiho Park
Fri 8:05 a.m. - 8:30 a.m. | Coffee break and poster session setup (Break)
Fri 8:30 a.m. - 10:00 a.m. | Poster session (Posters)
Fri 10:00 a.m. - 11:30 a.m. | Lunch break (optionally continued poster session) (Break)
Fri 11:30 a.m. - 12:00 p.m. | Invited talk by Chandler Squires (MIT) - Causal Imputation and Causal Disentanglement (Talk)
Fri 12:00 p.m. - 12:30 p.m. | Invited talk by Dhanya Sridhar (MILA) - Properties of Representations for Causal Inference (Talk)
Fri 12:30 p.m. - 12:45 p.m. | Multi-View Causal Representation Learning with Partial Observability (Talk) | Dingling Yao
Fri 12:45 p.m. - 1:00 p.m. | Score-based Causal Representation Learning from Interventions: Nonparametric Identifiability (Talk) | Burak Varıcı
Fri 1:00 p.m. - 1:30 p.m. | Coffee break (optionally continued poster session) (Break)
Fri 1:30 p.m. - 2:00 p.m. | Invited talk by Francesco Locatello (ISTA) - Identifiability lessons learned scaling up causal discovery and causal representation learning (Talk)
Fri 2:00 p.m. - 2:30 p.m. | Invited talk by Julius von Kügelgen (MPI Tübingen) - Nonparametric Causal Representation Learning from Multiple Environments (Talk)
Fri 2:30 p.m. - 3:20 p.m. | Panel discussion (Panel)
Fri 3:20 p.m. - 3:30 p.m. | Closing remarks (Talk)

Learning Object Motion and Appearance Dynamics with Object-Centric Representations (Poster)
Human perception involves discerning objects based on attributes such as size, color, and texture, and making predictions about their movements using features such as weight and speed. This innate ability operates without the need for conscious learning, allowing individuals to perform actions like catching or avoiding objects even without conscious awareness. Accordingly, the fundamental key to achieving higher-level cognition lies in the capability to break down intricate multi-object scenes into meaningful appearances. Object-centric representations have emerged as a promising tool for scene decomposition by providing useful abstractions. In this paper, we propose a novel approach to unsupervised video prediction leveraging object-centric representations. Our methodology introduces a two-component model consisting of a slot encoder for object-centric disentanglement and a feature extraction module for masked patches. These components are integrated through a cross-attention mechanism, allowing for comprehensive spatio-temporal reasoning. Our model exhibits better performance when dealing with intricate scenes characterized by a wide range of object attributes and dynamic movements. Moreover, our approach demonstrates scalability across diverse synthetic environments, thereby showcasing its potential for widespread utilization in vision-related tasks.
Yeon-Ji Song · Hyunseo Kim · Suhyung Choi · Jin-Hwa Kim · Byoung-Tak Zhang

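A minimal sketch of the fusion step described in this abstract, with hypothetical dimensions and module choices: object slots from a slot encoder query masked-patch features through cross-attention, the mechanism the paper uses for spatio-temporal reasoning.

```python
import torch
import torch.nn as nn

d = 64  # embedding dimension (placeholder)
cross_attn = nn.MultiheadAttention(embed_dim=d, num_heads=4, batch_first=True)

slots = torch.randn(2, 6, d)     # (batch, num_slots, dim) from a slot encoder
patches = torch.randn(2, 49, d)  # (batch, num_patches, dim) from a masked-patch module
# Slots attend over patch features; the fused output would feed the video predictor.
fused, attn_weights = cross_attn(query=slots, key=patches, value=patches)
print(fused.shape)  # torch.Size([2, 6, 64])
```
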
Attention for Causal Relationship Discovery from Biological Neural Dynamics (Poster)
This paper explores the potential of transformer models for causal representation learning in networks with complex nonlinear dynamics at every node, as in neurobiological and biophysical networks. Our study primarily focuses on a proof-of-concept investigation based on simulated neural dynamics, for which the ground-truth causality is known through the underlying connectivity matrix. For transformer models trained to forecast neuronal population dynamics, we show that the cross-attention module effectively captures the causal relationship among neurons, with an accuracy equal or superior to that of the most popular causality discovery method. While we acknowledge that real-world neurobiology data will bring further challenges, including dynamic connectivity and unobserved variability, this research offers an encouraging preliminary glimpse into the utility of the transformer model for causal representation learning in neuroscience.
Ziyu Lu · Anika Tabassum · Shruti Kulkarni · Lu Mi · Nathan Kutz · Eric Shea-Brown · Seung-Hwan Lim

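A rough, hypothetical illustration of the evaluation this abstract describes (not the authors' code): treat a trained model's cross-attention matrix as a candidate adjacency matrix and score it against a known ground-truth connectivity; all tensors and thresholds below are toy stand-ins.

```python
import numpy as np

def attention_to_adjacency(attn, threshold=0.5):
    """Average attention over heads and threshold into a binary graph.

    attn: array of shape (num_heads, num_neurons, num_neurons), where
          attn[h, i, j] is how much neuron i attends to neuron j.
    """
    avg = attn.mean(axis=0)               # pool over heads
    return (avg > threshold).astype(int)  # candidate causal edges

rng = np.random.default_rng(0)
true_graph = (rng.random((20, 20)) < 0.15).astype(int)    # toy ground truth
attn = rng.random((4, 20, 20)) * 0.2 + 0.8 * true_graph   # attention correlated with edges
pred = attention_to_adjacency(attn)
print(f"edge-recovery accuracy: {(pred == true_graph).mean():.2f}")
```
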
Leveraging Low-Rank and Sparse Recurrent Connectivity for Robust Closed-Loop Control (Poster)
Developing autonomous agents that can interact with changing environments is an open challenge in machine learning. Robustness is particularly important in these settings as agents are often fit offline on expert demonstrations but deployed online where they must generalize to the closed feedback loop within the environment. In this work, we explore the application of recurrent neural networks to tasks of this nature and understand how a parameterization of their recurrent connectivity influences robustness in closed-loop settings. Specifically, we represent the recurrent connectivity as a function of rank and sparsity and show both theoretically and empirically that modulating these two variables has desirable effects on network dynamics. The proposed low-rank, sparse connectivity induces an interpretable prior on the network that proves to be most amenable for a class of models known as closed-form continuous-time neural networks (CfCs). We find that CfCs with fewer parameters can outperform their full-rank, fully-connected counterparts in the online setting under distribution shift. This yields memory-efficient and robust agents while opening a new perspective on how we can modulate network dynamics through connectivity.
Neehal Tumma · Mathias Lechner · Noel Loo · Ramin Hasani · Daniela Rus

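A minimal sketch of the connectivity parameterization, assuming a vanilla RNN cell rather than the CfC models used in the paper: the recurrent weight matrix is a fixed sparse mask applied to a learnable low-rank factorization, so rank and sparsity can be set independently. All sizes are placeholders.

```python
import torch
import torch.nn as nn

class LowRankSparseRNNCell(nn.Module):
    def __init__(self, hidden_size: int, rank: int, sparsity: float):
        super().__init__()
        # Low-rank factorization of the recurrent weights: W_rec = mask * (U @ V).
        self.U = nn.Parameter(torch.randn(hidden_size, rank) / rank**0.5)
        self.V = nn.Parameter(torch.randn(rank, hidden_size) / rank**0.5)
        # Fixed random binary mask: keep each connection with probability (1 - sparsity).
        mask = (torch.rand(hidden_size, hidden_size) > sparsity).float()
        self.register_buffer("mask", mask)
        self.input_proj = nn.Linear(hidden_size, hidden_size)

    def forward(self, x, h):
        W_rec = self.mask * (self.U @ self.V)  # sparse, low-rank recurrence
        return torch.tanh(self.input_proj(x) + h @ W_rec.T)

cell = LowRankSparseRNNCell(hidden_size=64, rank=4, sparsity=0.8)
h = cell(torch.randn(1, 64), torch.zeros(1, 64))
print(h.shape)  # torch.Size([1, 64])
```
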
What's your Use Case? A Taxonomy of Causal Evaluations of Post-hoc Interpretability (Poster)
Post-hoc interpretability of Large Language Models (LLMs) often aims for mechanistic interpretations—detailed, causal accounts of model behavior. However, human interpreters may lack the capacity or willingness to formulate such intricate models, let alone evaluate them. This paper addresses this challenge by introducing a structured taxonomy grounded in the causal hierarchy. This taxonomy dissects the overarching goal of mechanistic interpretability into constituent claims, each requiring distinct evaluation methods. By doing so, we transform these evaluation criteria into actionable learning objectives, providing a data-driven pathway to interpretability. This framework enables a methodologically rigorous yet pragmatic approach to evaluating the strengths and limitations of various interpretability tools.
David Reber · Victor Veitch

Learning Unknown Intervention Targets in Structural Causal Models from Heterogeneous Data (Poster)
We study the problem of identifying the unknown intervention targets in structural causal models where we have access to heterogeneous data collected from multiple environments. The unknown intervention targets are the set of endogenous variables whose corresponding exogenous noises change across the environments. We propose a two-phase approach which in the first phase recovers the exogenous noises corresponding to unknown intervention targets whose distributions have changed across environments. In the second phase, the recovered noises are matched with the corresponding endogenous variables. For the recovery phase, we provide sufficient conditions for learning these exogenous noises up to some component-wise invertible transformation. For the matching phase, under the causal sufficiency assumption, we show that the proposed method uniquely identifies the intervention targets. In the presence of latent confounders, the intervention targets among the observed variables cannot be determined uniquely. We provide a candidate intervention target set which is a superset of the true intervention targets. Our approach improves upon the state of the art as the returned candidate set is always a subset of the target set returned by previous work. Moreover, we do not require restrictive assumptions such as linearity of the causal model or performing invariance tests to learn whether a distribution is changing across environments, which could be highly sample-inefficient. Our experimental results show the effectiveness of our proposed algorithm in practice.
Yuqin Yang · Saber Salehkaleybar · Negar Kiyavash

Learning Causally Disentangled Representations via the Principle of Independent Causal Mechanisms (Poster)
Learning disentangled causal representations is a challenging problem that has gained significant attention recently due to its implications for extracting meaningful information for downstream tasks. In this work, we define a new notion of causal disentanglement from the perspective of independent causal mechanisms. We propose ICM-VAE, a framework for learning causally disentangled representations supervised by causally related observed labels. We model causal mechanisms using learnable flow-based diffeomorphic functions to map noise variables to latent causal variables. Further, to promote the disentanglement of causal factors, we propose a causal disentanglement prior that utilizes the known causal structure to encourage learning a causally factorized distribution in the latent space. Under relatively mild conditions, we provide theoretical results showing the identifiability of causal factors and mechanisms up to permutation and elementwise reparameterization. We empirically demonstrate that our framework induces highly disentangled causal factors, improves interventional robustness, and is compatible with counterfactual generation.
Aneesh Komanduri · Yongkai Wu · Feng Chen · Xintao Wu

Towards Characterizing Domain Counterfactuals for Invertible Latent Causal Models (Poster)
Answering counterfactual queries has many important applications such as knowledge discovery and explainability, but is challenging when causal variables are unobserved and we only see a projection onto an observation space, for instance, image pixels. One approach is to recover the latent Structural Causal Model (SCM), but this typically needs unrealistic assumptions, such as linearity of the causal mechanisms. Another approach is to use naïve ML approximations, such as generative models, to generate counterfactual samples; however, these lack guarantees of accuracy. In this work, we strive to strike a balance between practicality and theoretical guarantees by focusing on a specific type of causal query called domain counterfactuals, which hypothesizes what a sample would have looked like if it had been generated in a different domain (or environment). Concretely, by only assuming invertibility, sparse domain interventions and access to observational data from different domains, we aim to improve domain counterfactual estimation both theoretically and practically with less restrictive assumptions. We define domain counterfactually equivalent models and prove necessary and sufficient properties for equivalent models that provide a tight characterization of the domain counterfactual equivalence classes. Building upon this result, we prove that every equivalence class contains a model where all intervened variables are at the end when topologically sorted by the causal DAG. This surprising result suggests that a model design that only allows intervention in the last k latent variables may improve model estimation for counterfactuals. We then test this model design on extensive simulated and image-based experiments which show the sparse canonical model indeed improves counterfactual estimation over baseline non-sparse models.
Sean Kulinski · Zeyu Zhou · Ruqi Bai · Murat Kocaoglu · David Inouye

SCADI: Self-supervised Causal Disentanglement in Latent Variable Models (Poster)
Causal disentanglement has great potential for capturing complex situations. However, practical and efficient approaches are lacking. It is already known that most unsupervised disentangling methods are unable to produce identifiable results without additional information, often leading to randomly disentangled output. Therefore, most existing models for disentangling are weakly supervised, providing information about intrinsic factors, which incurs excessive costs. To address this, we propose a novel model, SCADI (SElf-supervised CAusal DIsentanglement), that enables the model to discover semantic factors and learn their causal relationships without any supervision. This model combines a masked structural causal model (SCM) with a pseudo-label generator for causal disentanglement, aiming to provide a new direction for self-supervised causal disentanglement models.
Heejeong Nam

Inverted-Attention Transformers can Learn Object Representations: Insights from Slot Attention (Poster)
Visual reasoning is supported by a causal understanding of the physical world, and theories of human cognition suppose that a necessary step to causal understanding is the discovery and representation of high-level entities like objects. Slot Attention is a popular method aimed at object-centric learning, and its popularity has resulted in dozens of variants and extensions. To help understand the core assumptions that lead to successful object-centric learning, we take a step back and identify the minimal set of changes to a standard Transformer architecture to obtain the same performance as the specialized Slot Attention models. We systematically evaluate the performance and scaling behaviour of several "intermediate" architectures on seven image and video datasets from prior work. Our analysis reveals that by simply inverting the attention mechanism of Transformers, we obtain performance competitive with state-of-the-art Slot Attention in several domains.
Yi-Fu Wu · Klaus Greff · Gamaleldin Elsayed · Michael Mozer · Thomas Kipf · Sjoerd van Steenkiste

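A minimal sketch of the inversion at the heart of this comparison: standard attention normalizes over keys, while Slot-Attention-style inverted attention normalizes over queries so that slots compete for input patches, followed by a weighted-mean renormalization. Shapes are placeholders.

```python
import torch

def standard_attention(q, k, v):
    logits = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5
    attn = logits.softmax(dim=-1)  # normalize over keys
    return attn @ v

def inverted_attention(q, k, v, eps=1e-8):
    logits = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5
    attn = logits.softmax(dim=-2)  # normalize over queries (slots compete for patches)
    attn = attn / (attn.sum(dim=-1, keepdim=True) + eps)  # weighted mean per slot
    return attn @ v

q = torch.randn(4, 64)        # 4 slot queries
k = v = torch.randn(100, 64)  # 100 input patches
print(inverted_attention(q, k, v).shape)  # torch.Size([4, 64])
```
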
Triangular Monotonic Generative Models Can Perform Causal Discovery (Poster)
Many causal discovery algorithms exploit conditional independence signatures in observational data, recovering a Markov equivalence class (MEC) of possible graphs consistent with the data. When the MEC is non-trivial, additional assumptions on the data generating process can be made, and generative models can be fit to further resolve the MEC. We show that triangular monotonic increasing (TMI) maps parametrize generative models that perform conditional independence-based causal discovery by searching over permutations, and that are additionally flexible enough as generative models to fit a wide class of causal models. In this paper, we characterize the theoretical properties that make these models relevant as tools for causal discovery, make connections to existing methods, and highlight open challenges towards their deployment.
Quanhan (Johnny) Xi · Sebastian Gonzalez · Benjamin Bloem-Reddy

Local Discovery by Partitioning: Polynomial-Time Causal Discovery Around Exposure-Outcome Pairs (Poster)
This work addresses the problem of automated covariate selection under limited prior knowledge. Given an exposure-outcome pair {X,Y} and a variable set Z of unknown causal structure, the Local Discovery by Partitioning (LDP) algorithm partitions Z into subsets defined by their relation to {X,Y}. We enumerate eight exhaustive and mutually exclusive partitions of any arbitrary Z and leverage this taxonomy to differentiate confounders from other variable types. LDP is motivated by valid adjustment set identification, but avoids the pretreatment assumption commonly made by automated covariate selection methods. We provide theoretical guarantees that LDP returns a valid adjustment set for any Z that meets sufficient graphical conditions. Under stronger conditions, we prove that partition labels are asymptotically correct. The total number of independence tests is worst-case quadratic in |Z|, with sub-quadratic runtimes observed empirically. We numerically validate our theoretical guarantees on synthetic and semi-synthetic graphs. Adjustment sets from LDP yield less biased and more precise average treatment effect estimates than baselines, with LDP outperforming on confounder recall, test count, and runtime for valid adjustment set discovery.
Jacqueline Maasch · Weishen Pan · Shantanu Gupta · Volodymyr Kuleshov · Kyra Gan · Fei Wang

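A crude, hypothetical sketch of the partitioning idea (far simpler than LDP itself): label each covariate by its marginal (in)dependence with the exposure X and outcome Y, with a linear-Gaussian correlation test standing in for a general independence test. LDP's eight-way partition requires further conditional tests not shown here.

```python
import numpy as np
from scipy import stats

def indep(a, b, alpha=0.05):
    r, p = stats.pearsonr(a, b)
    return p > alpha  # True -> treat as independent (linear-Gaussian stand-in)

def crude_partition(X, Y, Z, alpha=0.05):
    labels = []
    for j in range(Z.shape[1]):
        z = Z[:, j]
        dep_x, dep_y = not indep(z, X, alpha), not indep(z, Y, alpha)
        if dep_x and dep_y:
            labels.append("confounder-or-mediator")  # LDP disambiguates with more tests
        elif dep_x:
            labels.append("exposure-side")
        elif dep_y:
            labels.append("outcome-side")
        else:
            labels.append("irrelevant")
    return labels

rng = np.random.default_rng(0)
C = rng.normal(size=500)                     # confounder
X = C + rng.normal(size=500)                 # exposure
Y = X + C + rng.normal(size=500)             # outcome
Z = np.column_stack([C, rng.normal(size=500)])
print(crude_partition(X, Y, Z))  # expect ['confounder-or-mediator', 'irrelevant']
```
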
Towards representation learning for general weighting problems in causal inference (Poster)
Weighting problems in treatment effect estimation can be solved by minimising an appropriate probability distance. However, choosing which distance to minimise is difficult, as it depends on the unknown data generating process (DGP). A workaround consists in choosing a distance depending on a suitable representation of covariates. In this work, we derive error terms that quantify how much bias is added to the weighting estimator when using a representation, giving clear objectives to minimise when learning the representation and generalising a large body of previous work on deconfounding, prognostic, balancing and propensity scores. We further outline a method minimising such objectives, and show promising numerical results on a high-dimensional dataset.
Oscar Clivio · Avi Feller · Chris C Holmes

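For orientation, a minimal sketch of the weighting estimator this line of work builds on, with the covariate representation phi taken as the identity; the paper's contribution concerns bounding the bias introduced when phi is a learned, lossy representation. Data and models below are toy stand-ins.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def ipw_ate(phi_X, T, Y):
    """Inverse-probability-weighted ATE, with the propensity fit on phi(X)."""
    e = LogisticRegression().fit(phi_X, T).predict_proba(phi_X)[:, 1]
    e = np.clip(e, 0.01, 0.99)  # avoid extreme weights
    return np.mean(T * Y / e - (1 - T) * Y / (1 - e))

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
T = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))
Y = 2.0 * T + X[:, 0] + rng.normal(size=2000)  # true ATE = 2
print(ipw_ate(X, T, Y))                        # roughly 2
```
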
Exploiting Causal Representations in Reinforcement Learning: A Posterior Sampling Approach (Poster)
Posterior sampling allows the exploitation of prior knowledge of the environment's transition dynamics to improve the sample efficiency of reinforcement learning. The prior is typically specified as a class of parametric distributions, a task that can be cumbersome in practice, often resulting in the choice of uninformative priors. Instead, in this work we study how to exploit causal representations to build priors that are often more natural to design. Specifically, we propose a novel hierarchical posterior sampling approach, called C-PSRL, in which the prior is given as a (partial) causal graph over the environment's causal variables, such as listing known causal dependencies between biometric features in a medical treatment study. C-PSRL simultaneously learns a graph consistent with the true causal graph at the higher level and the parameters of the resulting factored dynamics at the lower level. For this procedure, we provide an analysis of its Bayesian regret, which explicitly connects the regret rate with the degree of causal knowledge, and we show how regret minimization leads to a weak notion of causal discovery.
Mirco Mutti · Riccardo De Santi · Marcello Restelli · Alexander Marx · Giorgia Ramponi

Identifying Effects of Disease on Single-Cells with Domain-Invariant Generative Modeling (Oral)
A core challenge in computational biology is predicting the effects of disease on healthy tissue. From the machine learning perspective, effects of disease and other stimulations on gene expression of single cells can be modeled as a domain shift in a low-dimensional latent space applied to healthy cells. Guided by principles of domain-invariance and compositional models, we present "single-cell Domain Shift Autoencoder (scDSA)", a deep generative model for disentangling disease-invariant and disease-specific gene programs at single-cell resolution. scDSA uncovers latent factors that are conserved across healthy and disease cell states, and learns how these factors interact with disease. We show that our model i) predicts counterfactual healthy cell-types of diseased cells in unseen patients, ii) captures interpretable representations of disease(s), and iii) learns interaction of disease effects and cell-types. scDSA helps to further our understanding of how diseases perturb healthy tissue on a patient-specific basis, thereby enabling advances in personalized healthcare.
Abdul Moeed · Martin Rohbeck · Pavlo Lutsik · Kai Ueltzhoeffer · Marc Jan Bonder · Oliver Stegle

Learning Endogenous Representation in Reinforcement Learning via Advantage Estimation (Poster)
Recently, it was shown that the advantage function can be understood as quantifying the causal effect of an action on the cumulative reward. However, this connection remained largely analogical, with unclear implications. In the present work, we examine this analogy using the Exogenous Markov Decision Process (ExoMDP) framework, which factorizes an MDP into variables that are causally related to the agent's actions (endogenous) and variables that are beyond the agent's control (exogenous). We demonstrate that the advantage function can be expressed using only the endogenous variables, which is, in general, not possible for the (action-)value function. Through experiments in a toy ExoMDP, we found that estimating the advantage function directly can facilitate learning representations that are invariant to the exogenous variables.
Hsiao-Ru Pan · Bernhard Schölkopf

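A minimal sketch of one way to estimate advantages directly, consistent with the idea that advantages average to zero over actions (here under a uniform policy); this is a hypothetical stand-in, not the authors' exact model.

```python
import torch
import torch.nn as nn

class AdvantageNet(nn.Module):
    """Predict A(s, a) for all actions, centered so advantages sum to zero."""
    def __init__(self, state_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                 nn.Linear(64, n_actions))

    def forward(self, s):
        a = self.net(s)
        return a - a.mean(dim=-1, keepdim=True)  # centering constraint

adv = AdvantageNet(state_dim=8, n_actions=4)
print(adv(torch.randn(2, 8)).sum(dim=-1))  # ~0 by construction
```
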
Causal Regressions For Unstructured Data (Poster)
The focus of much recent research in economics and marketing has been (1) to allow for unstructured data in causal studies and (2) to flexibly address the issue of endogeneity with observational data and perform valid causal inference. Directly using machine learning algorithms to predict the outcome variable can help deal with the issue of unstructured data; however, it is well known that such an approach does not perform well in the presence of endogeneity in the explanatory variables. On the other hand, extant methods catered towards addressing endogeneity issues make strong parametric assumptions and hence are incapable of "directly" incorporating high-dimensional unstructured data. In this paper, we propose an estimator, which we term "RieszIV", for carrying out estimation and inference with high-dimensional observational data without resorting to parametric approximations. We demonstrate our estimator exhibits asymptotic consistency and normality under a mild set of conditions. We carry out extensive Monte Carlo simulations with both low-dimensional and high-dimensional unstructured data to demonstrate the finite sample performance of our estimator. Finally, using app downloads and review data for apps on Google Play we demonstrate how our method can be used to conduct inference over counterfactual policies over rich text data. We show how large language models can be used as a viable counterfactual policy generation operator. This represents an important advance in expanding counterfactual inference to complex, real-world settings.
Amandeep Singh · Bolong Zheng

Expediting Reinforcement Learning by Incorporating Temporal Causal Information (Poster)
Reinforcement learning (RL) algorithms struggle with learning optimal policies for tasks where reward feedback is sparse and depends on a complex sequence of events in the environment. Probabilistic reward machines (PRMs) are finite-state formalisms that can capture temporal dependencies in the reward signal, along with nondeterministic task outcomes. While special RL algorithms can exploit this finite-state structure to expedite learning, PRMs remain difficult to modify and design by hand. This hinders the already difficult tasks of utilizing high-level causal knowledge about the environment, and transferring the reward formalism into a new domain with a different causal structure. This paper proposes a novel method to incorporate causal information in the form of Temporal Logic-based Causal Diagrams into the reward formalism, thereby expediting policy learning and aiding the transfer of task specifications to new environments.
Jan Corazza · Daniel Neider · Zhe Xu · Hadi Partovi Aria

DISK: Domain Inference for Discovering Spurious Correlation with KL-Divergence (Poster)
Existing methods utilize domain information to address the subpopulation shift issue and enhance model generalization. However, the availability of domain information is not always guaranteed. In response to this challenge, we introduce a novel end-to-end method called DISK. DISK discovers the spurious correlations present in the training and validation sets through KL-divergence and assigns spurious labels (which are also the domain labels) to classify instances based on spurious features. By combining spurious labels $y_s$ with true labels $y$, DISK effectively partitions the data into different groups with unique data distributions $\mathbb{P}(\mathbf{x}|y,y_s)$. The group partition inferred by DISK can then be seamlessly leveraged to design algorithms to further mitigate the subpopulation shift and improve generalization on test data. Unlike existing domain inference methods, such as ZIN and DISC, DISK reliably infers domains without requiring additional information. We extensively evaluated DISK on different datasets, considering scenarios where validation labels are either available or unavailable, demonstrating its effectiveness in domain inference and mitigating subpopulation shift. Furthermore, our results also suggest that for some complex data, the neural network-based DISK may have the potential to perform more reasonable domain inferences, which highlights the potential effective integration of DISK and human decisions when the (human-defined) domain information is available. Code for DISK is available at https://anonymous.4open.science/r/DISK-E23A/.
Yujin Han · Difan Zou

A Sparsity Principle for Partially Observable Causal Representation Learning (Poster)
Causal representation learning (CRL) aims at identifying high-level causal variables from low-level data, e.g. images. Current methods usually assume that all causal variables are captured in the high-dimensional observations. In this work, we focus on learning causal representations from data under partial observability, i.e., when some of the causal variables are not observed in the measurements, and the set of masked variables changes across the different samples. We introduce some initial theoretical results for identifying causal variables under partial observability by exploiting a sparsity regularizer, focusing in particular on the linear and piecewise linear mixing function case. We provide a theorem that allows us to identify the causal variables up to permutation and element-wise linear transformations in the linear case and a lemma that allows us to identify causal variables up to linear transformation in the piecewise case. Finally, we provide a conjecture that would allow us to identify the causal variables up to permutation and element-wise linear transformations also in the piecewise linear case. We test the theorem and conjecture on simulated data, showing the effectiveness of our method.
Danru Xu · Dingling Yao · Sébastien Lachapelle · Perouz Taslakian · Julius von Kügelgen · Francesco Locatello · Sara Magliacane

Mixup-Based Knowledge Distillation with Causal Intervention for Multi-Task Speech Classification (Poster)
Speech classification is an essential yet challenging subtask of multi-task classification, which determines the gender and age groups of speakers. Existing methods face challenges in extracting the correct features indicative of some age groups, owing to ambiguities of age perception in speech. Furthermore, these methods cannot fully capture the causal relations between speech representations and multi-label spaces. In this study, the causes of ambiguous age-group boundaries are attributed to the considerable variability in speech, even within the same age group. Additionally, features that indicate speech from speakers in their 20s can be shared by some age groups in their 30s. Therefore, a two-step approach is proposed: (1) mixup-based knowledge distillation to remove biased knowledge with causal intervention and (2) hierarchical multi-task learning with causal inference over the age-group hierarchy to utilize the shared information of label dependencies. Empirical experiments on Korean open-set speech corpora demonstrate that the proposed methods yield a significant performance boost in multi-task speech classification.
Kwangje Baeg · Hyeopwoo Lee · Yeomin Yoon · Jongmo Kim

Hierarchical Causal Representation Learning (Poster)
Learning causal representations is a crucial step toward understanding and reasoning about an agent's actions in embodied AI and reinforcement learning. In many scenarios, an intelligent agent starts learning to interact with an environment by initially performing coarse actions with multiple simultaneous effects. During the learning process, the agent acquires more fine-grained skills that affect only some of the factors in the environment. This setting is underexplored in current causal representation learning methods, which typically learn a single causal representation and do not reuse or refine previously learned representations. In this paper, we introduce the problem of hierarchical causal representation learning, which leverages causal representations learned with coarse interactions and progressively refines them as more fine-grained interactions become available. We propose HERCULES, a method that builds a hierarchical structure where at each level it gradually identifies more fine-grained causal variables by leveraging increasingly refined interventions. In experiments on two benchmarks of sequences of images with intervened causal factors, we demonstrate that HERCULES successfully recovers the causal factors of the underlying system and outperforms current state-of-the-art methods in scenarios with limited fine-grained data. At the same time, the acquired representations of HERCULES exhibit great adaptation capabilities under local transformations of the causal factors.
Angelos Nalmpantis · Phillip Lippe · Sara Magliacane

The Linear Representation Hypothesis in Language Models (Oral)
In the context of large language models, the "linear representation hypothesis" is the idea that high-level concepts are represented linearly as directions in a representation space. If the hypothesis were true, we might hope to interpret model representations by computing their concept directions or control model behavior by intervening on representations using those directions. In this paper, we formalize the linear representation hypothesis in terms of counterfactual pairs and connect this formalism to other notions of the hypothesis, including measurement (via linear probes) and intervention (control). Then, we empirically demonstrate the existence of linear concept directions in the LLaMA-2 model and show how the different notions of the hypothesis manifest in modern LLMs.
Kiho Park · Yo Joong Choe · Victor Veitch

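A toy illustration of the counterfactual-pair formalism: estimate a concept direction as the mean difference between embeddings of pairs that differ only in the concept, then use it as a linear probe. The embeddings below are synthetic stand-ins, not LLaMA-2 activations.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_pairs = 128, 50
true_dir = rng.normal(size=d)
true_dir /= np.linalg.norm(true_dir)

base = rng.normal(size=(n_pairs, d))
emb_neg = base                   # concept "off"
emb_pos = base + 2.0 * true_dir  # concept "on": a linear shift, per the hypothesis

# Concept direction = mean difference over counterfactual pairs.
concept_dir = (emb_pos - emb_neg).mean(axis=0)
concept_dir /= np.linalg.norm(concept_dir)
print("cosine to true direction:", float(concept_dir @ true_dir))  # ~1.0

# Measurement: a linear probe along the estimated direction separates the pairs.
print(bool(((emb_pos @ concept_dir) > (emb_neg @ concept_dir)).all()))
```
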
Debiasing Multimodal Models via Causal Information Minimization (Poster)
Most existing debiasing methods for multimodal models, including causal intervention and inference methods, utilize approximate heuristics to represent the biases, such as shallow features from early stages of training or unimodal features for multimodal tasks like VQA, etc., which may not be accurate. In this paper, we study bias arising from confounders in a causal graph for multimodal data, and examine a novel approach that leverages causally-motivated information minimization to learn the confounder representations. Robust predictive features contain diverse information that helps a model generalize to out-of-distribution data. Hence, minimizing the information content of features obtained from a pretrained biased model helps learn the simplest predictive features that capture the underlying data distribution. We treat these features as confounder representations and use them via methods motivated by causal theory to remove bias from models. We find that the learned confounder representations indeed capture dataset biases and the proposed debiasing methods improve out-of-distribution (OOD) performance on multiple multimodal datasets without sacrificing in-distribution performance.
Vaidehi Patil · Adyasha Maharana · Mohit Bansal

Unfairness Detection within Power Systems through Transfer Counterfactual Learning (Poster)
Energy justice is a growing area of interest in interdisciplinary energy research. However, identifying systematic biases in the energy sector remains challenging due to confounding variables, intricate heterogeneity in treatment effects, and limited data availability. To address these challenges, we introduce a novel approach for counterfactual causal analysis centered on energy justice. We use subgroup analysis to manage diverse factors and leverage the idea of transfer learning to mitigate data scarcity in each subgroup. In our numerical analysis, we apply our method to a large-scale customer-level power outage data set and investigate the counterfactual effect of demographic factors, such as income and age of the population, on power outage durations. Our results indicate that low-income and elderly-populated areas consistently experience longer power outages, regardless of weather conditions. This points to existing biases in the power system and highlights the need for focused improvements in areas with economic challenges.
Song Wei · Xiangrui Kong · Sarah Huestis-Mitchell · Yao Xie · Shixiang Zhu · Alinson Xavier · Feng Qiu

Choice Models and Permutation Invariance: Demand Estimation in Differentiated Products Markets (Poster)
Choice Modeling is at the core of many economics, operations, and marketing problems. In this paper, we propose a fundamental characterization of choice functions that encompasses a wide variety of extant choice models. We demonstrate how non-parametric estimators like neural nets can easily approximate such functionals and overcome the curse of dimensionality that is inherent in the non-parametric estimation of choice functions. We demonstrate through extensive simulations that our proposed functionals can flexibly capture underlying consumer behavior in a completely data-driven fashion and outperform traditional parametric models. As demand settings often exhibit endogenous features, we extend our framework to incorporate estimation under endogenous features. Further, we also describe a formal inference procedure to construct valid confidence intervals on objects of interest like price elasticity. Finally, to assess the practical applicability of our estimator, we utilize a real-world dataset from Berry et al. (1995). Our empirical analysis confirms that the estimator generates realistic and comparable own- and cross-price elasticities that are consistent with the observations reported in the existing literature.
Amandeep Singh · Ye Liu · Hema Yoganarasimhan

Independent Mechanism Analysis and the Manifold Hypothesis: Identifiability and Genericity (Poster)
Independent Mechanism Analysis (IMA) seeks to address non-identifiability in nonlinear ICA by assuming that the Jacobian of the mixing function has orthogonal columns. Previous research focused on the case with equal numbers of latent components and observed mixtures, as typical in ICA. In this work, we extend IMA to model mixtures residing on a manifold within a higher-dimensional space than the latent space, in line with the manifold hypothesis in representation learning. We show that IMA circumvents several non-identifiability issues arising in this setting, suggesting that it can be beneficial even when the manifold hypothesis holds. Moreover, we prove that the IMA principle is approximately satisfied when the directions along which the latent components influence the observations are chosen independently, with probability increasing with the observed space dimensionality. This provides a new and rigorous statistical interpretation of IMA.
Shubhangi Ghosh · Luigi Gresele · Julius von Kügelgen · Michel Besserve · Bernhard Schölkopf

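A minimal sketch of the IMA principle in the equal-dimension case (the paper's manifold setting generalizes this): by Hadamard's inequality, the sum of log column norms of the mixing Jacobian minus log |det J| is non-negative, and zero exactly when the columns are orthogonal.

```python
import torch

def ima_contrast(f, z):
    """Sum of log column norms of the Jacobian minus log |det J|;
    zero iff the Jacobian columns are orthogonal (Hadamard's inequality)."""
    J = torch.autograd.functional.jacobian(f, z)
    return J.norm(dim=0).log().sum() - torch.linalg.slogdet(J).logabsdet

z = torch.randn(2)
theta = torch.tensor(0.3)
R = torch.stack([torch.stack([theta.cos(), -theta.sin()]),
                 torch.stack([theta.sin(), theta.cos()])])
print(float(ima_contrast(lambda x: R @ x, z)))  # ~0: rotation, orthogonal columns

A = torch.tensor([[1.0, 0.9], [0.0, 1.0]])
print(float(ima_contrast(lambda x: A @ x, z)))  # > 0: sheared columns
```
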
Cells2Vec: Bridging the gap between experiments and simulations using causal representation learning (Poster)
Calibration of computational simulations of biological dynamics against experimental observations is often a challenge. In particular, the selection of features that can be used to construct a goodness-of-fit function for agent-based models of spatiotemporal behaviour can be difficult (Yip et al., 2022). In this study, we generate one-dimensional embeddings of high-dimensional simulation outputs using causal dilated convolutions for encoding and a triplet loss-based training strategy. We verify the robustness of the trained encoder using simulations generated by unseen input parameter sets. Furthermore, we use the generated embeddings to estimate the parameters of simulations using XGBoost Regression. We demonstrate the results of parameter estimation for corresponding time-series experimental observations. Our regression approach is able to estimate simulation parameters with an average $R^2$ metric of 0.90 for model runs with embedding dimensions of 4, 8, 12, and 16. Model calibration led to simulations with an average cosine similarity agreement of 0.95 with experiments over multiple model runs for cross-validation.
Dhruva Rajwade · Atiyeh Ahmadi · Brian Ingalls

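A minimal sketch of the encoder and training recipe described above, with placeholder channel sizes: causal dilated 1-D convolutions (left-padded so no future time steps leak) pooled into a fixed-size embedding and trained with a triplet margin loss.

```python
import torch
import torch.nn as nn

class CausalDilatedEncoder(nn.Module):
    def __init__(self, in_channels=3, hidden=32, emb_dim=8, n_layers=4):
        super().__init__()
        layers, ch = [], in_channels
        for i in range(n_layers):
            d = 2 ** i  # exponentially growing dilation
            # Left-pad so the convolution is causal (no future leakage).
            layers += [nn.ConstantPad1d((2 * d, 0), 0.0),
                       nn.Conv1d(ch, hidden, kernel_size=3, dilation=d),
                       nn.ReLU()]
            ch = hidden
        self.conv = nn.Sequential(*layers)
        self.head = nn.Linear(hidden, emb_dim)

    def forward(self, x):                    # x: (batch, channels, time)
        h = self.conv(x).max(dim=-1).values  # global max-pool over time
        return self.head(h)

enc = CausalDilatedEncoder()
loss_fn = nn.TripletMarginLoss(margin=1.0)
anchor, positive, negative = (torch.randn(16, 3, 200) for _ in range(3))
loss = loss_fn(enc(anchor), enc(positive), enc(negative))
loss.backward()
print(float(loss))
```
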
Score-based Causal Representation Learning from Interventions: Nonparametric Identifiability (Oral)
This paper focuses on causal representation learning (CRL) under a general nonparametric causal latent model and a general transformation model mapping the latent data to the observational data. It establishes identifiability and achievability results under two hard interventions per node in the latent causal graph, where one does not know which pair of environments has the same node intervened (uncoupled environments). Specifically, for identifiability, it is shown that perfect recovery of the latent causal model and variables is guaranteed under these conditions. For achievability, an algorithm is designed that uses observational data and two interventional environments per node and recovers the latent causal model and variables. This algorithm leverages score variations across different environments to estimate the inverse of the transformer and, subsequently, the latent variables. Our analysis also recovers the existing identifiability result for two hard interventions when metadata about the pair of environments that have the same node intervened is known (coupled environments). The existing results on non-parametric identifiability require assumptions on interventions and additional faithfulness assumptions. This paper shows that when observational data is available, additional assumptions on faithfulness are not necessary.
Burak Varıcı · Emre Acartürk · Karthikeyan Shanmugam · Ali Tajer

Multi-Domain Causal Representation Learning via Weak Distributional Invariances (Poster)
Causal representation learning has emerged as the center of action in causal machine learning research. In particular, multi-domain datasets present a natural opportunity for showcasing the advantages of causal representation learning over standard unsupervised representation learning. While recent works have taken crucial steps towards learning causal representations, they often lack applicability to multi-domain datasets due to over-simplifying assumptions about the data; e.g. each domain comes from a different single-node perfect intervention. In this work, we relax these assumptions and capitalize on the following observation: there often exists a subset of latents whose certain distributional properties (e.g., support, variance) remain stable across domains (e.g., when each domain comes from a multi-node imperfect intervention). Leveraging this observation, we show that autoencoders that incorporate such invariances can provably identify the stable set of latents from the rest in a host of different settings.
Kartik Ahuja · Amin Mansouri · Yixin Wang

Counterfactual Generative Models for Time-Varying Treatments (Poster)
Estimating the counterfactual outcome of treatment is essential for decision-making in public health and clinical science, among others. Often, treatments are administered in a sequential, time-varying manner, leading to an exponentially increased number of possible counterfactual outcomes. Furthermore, in modern applications, the outcomes are high-dimensional and conventional average treatment effect estimation fails to capture disparities in individuals. To tackle these challenges, we propose a novel conditional generative framework capable of producing counterfactual samples under time-varying treatment, without the need for explicit density estimation. Our method carefully addresses the distribution mismatch between the observed and counterfactual distributions via a loss function based on inverse probability weighting. We present a thorough evaluation of our method using both synthetic and real-world data. Our results demonstrate that our method is capable of generating high-quality counterfactual samples and outperforms the state-of-the-art baselines.
Shenghao Wu · Wenbin Zhou · Minshuo Chen · Shixiang Zhu

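A minimal sketch of the reweighting idea, with a squared-error term standing in for the paper's generative loss: each observed sample is weighted by the inverse probability of its treatment sequence so that training targets the counterfactual distribution. The generator output and propensities below are placeholders.

```python
import torch

def ipw_generator_loss(gen_out, y, treat_probs):
    """treat_probs: P(observed treatment sequence | history), shape (batch,)."""
    w = 1.0 / treat_probs.clamp(min=1e-2)          # stabilized inverse weights
    per_sample = ((gen_out - y) ** 2).mean(dim=-1)  # stand-in per-sample loss
    return (w * per_sample).mean()

y = torch.randn(32, 10)                             # observed outcomes
gen_out = torch.randn(32, 10, requires_grad=True)   # generator samples (placeholder)
treat_probs = torch.rand(32) * 0.8 + 0.1            # propensities from a fitted model
loss = ipw_generator_loss(gen_out, y, treat_probs)
loss.backward()
print(float(loss))
```
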
Self-Supervised Disentanglement by Leveraging Structure in Data Augmentations (Poster)
Self-supervised representation learning often uses data augmentations to induce some invariance to "style" attributes of the data. However, with downstream tasks generally unknown at training time, it is difficult to deduce a priori which attributes of the data are indeed "style" and can be safely discarded. To address this, we introduce a more principled approach that seeks to disentangle style features rather than discard them. The key idea is to add multiple style embedding spaces where: (i) each is invariant to all-but-one augmentation; and (ii) joint entropy is maximized. We formalize our structured data-augmentation procedure from a causal latent-variable-model perspective, and prove identifiability of both content and (multiple blocks of) style variables. We empirically demonstrate the benefits of our approach on synthetic datasets and then present promising but limited results on ImageNet.
Cian Eastwood · Julius von Kügelgen · Linus Ericsson · Diane Bouchacourt · Pascal Vincent · Mark Ibrahim · Bernhard Schölkopf

Object-Centric Semantic Vector Quantization (Poster)
Neural discrete representations are crucial components of modern neural networks. However, their main limitation is that the primary strategies such as VQ-VAE can only provide representations at the patch level. Therefore, one of the main goals of representation learning, acquiring conceptual, semantic, and compositional abstractions such as the color and shape of an object, remains elusive. In this paper, we present the first approach to semantic neural discrete representation learning. The proposed model, called Semantic Vector-Quantized Variational Autoencoder (SVQ), leverages recent advances in unsupervised object-centric learning to address this limitation. Specifically, we observe that a simple approach that quantizes at the object level poses a significant challenge and propose constructing scene representations hierarchically, from low-level discrete concept schemas to object representations. Additionally, we suggest a novel method for training a prior over these semantic representations, enabling the ability to generate images following the underlying data distribution, which is lacking in most object-centric models. In experiments on various 2D and 3D object-centric datasets, we find that our model achieves superior generation performance compared to non-semantic vector quantization methods such as VQ-VAE and previous object-centric generative models. Furthermore, we find that the semantic discrete representations can solve downstream scene understanding tasks that require reasoning about the properties of different objects in the scene.
Yi-Fu Wu · Minseung Lee · Sungjin Ahn

Towards the Reusability and Compositionality of Causal Representations (Poster)
Causal Representation Learning (CRL) aims at identifying high-level causal factors and their relationships from high-dimensional observations, e.g., images. While most CRL works focus on learning causal representations in a single environment, in this work we instead propose a first step towards learning causal representations from temporal sequences of images that can be adapted in a new environment, or composed across multiple related environments. In particular, we introduce DECAF, a framework that detects which causal factors can be reused and which need to be adapted from previously learned causal representations. Our approach is based on the availability of intervention targets, which indicate which variables are perturbed at each time step. Experiments on three benchmark datasets show that integrating our framework with four state-of-the-art CRL approaches leads to accurate representations in a new environment with only a few samples.
Davide Talon · Phillip Lippe · Stuart James · Alessio Del Bue · Sara Magliacane

Object-centric architectures enable efficient causal representation learning (Poster)
Causal representation learning has identified a variety of settings in which we can disentangle latent variables with identifiability guarantees (up to some reasonable equivalence class). Common to all of these approaches is the assumption that (1) the latent variables are represented as $d$-dimensional vectors, and (2) that the observations are the output of some injective generative function of these latent variables. While these assumptions appear benign, we show that when the observations are of multiple objects, the generative function is no longer injective and disentanglement fails in practice. We can address this failure by combining recent developments in object-centric learning and causal representation learning. By modifying the Slot Attention architecture (Locatello et al., 2020), we develop an object-centric architecture that leverages weak supervision from sparse perturbations to disentangle each object's properties. This approach is more data-efficient in the sense that it requires significantly fewer perturbations than a comparable approach that encodes to a Euclidean space, and we show that this approach successfully disentangles the properties of a set of objects in a series of simple image-based disentanglement experiments.
Amin Mansouri · Jason Hartford · Yan Zhang · Yoshua Bengio

Reward-Relevance-Filtered Linear Offline Reinforcement Learning (Poster)
This paper studies causal variable selection in the setting of a Markov decision process, specifically offline reinforcement learning with linear function approximation. The structural restrictions of the data-generating process presume that the transitions factor into sparse dynamics that affect the reward, and additional exogenous dynamics that do not affect the reward. Although the minimally sufficient adjustment set for estimation of full-state transition properties depends on the whole state, the optimal policy and therefore state-action value function is sparse. This is a novel "causal sparsity" notion that does not occur in pure estimation settings. We develop methods for filtering the estimation of the state-action value function to the sparse component by a modification of thresholded lasso: we use thresholded lasso to recover the support of the rewards, and use this estimated support to estimate the state-action $Q$ function. Such a method has sample complexity depending only on the size of the sparse component. Although this problem differs from the typical statement of "causal representation learning", this notion of "causal sparsity" may be of interest, and our methods connect to a classical statistical literature with theoretical guarantees that can be a stepping stone for more complex representation learning.
Angela Zhou

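A minimal linear-Gaussian sketch of the two-step filter: thresholded lasso on the reward recovers the reward-relevant coordinates, and the value regression is then fit only on that estimated support. This is a toy stand-in for the paper's offline-RL procedure; all hyperparameters are placeholders.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(0)
n, d, k = 1000, 20, 3  # only k of d state coordinates are reward-relevant
S = rng.normal(size=(n, d))
beta = np.zeros(d)
beta[:k] = [1.5, -2.0, 1.0]
R = S @ beta + 0.1 * rng.normal(size=n)

# Step 1: thresholded lasso to estimate the reward support.
coef = Lasso(alpha=0.05).fit(S, R).coef_
support = np.where(np.abs(coef) > 0.1)[0]  # thresholding step
print("estimated reward support:", support)  # expect [0 1 2]

# Step 2: fit the value regression only on the sparse component.
q_model = LinearRegression().fit(S[:, support], R)
print("coefficients on support:", q_model.coef_.round(2))
```
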
Putting Causal Identification to the Test: Falsification using Multi-Environment Data (Poster)
We study the problem of falsifying the assumptions behind a set of broadly applied causal identification strategies: namely back-door adjustment, front-door adjustment, and instrumental variable estimation. While these assumptions are untestable from observational data in general, we show that with access to data coming from multiple heterogeneous environments, there exist novel independence constraints that can be used to falsify the validity of each strategy. Most interestingly, we make no parametric assumptions, relying instead on the assumption that changes between environments occur in accordance with the principle of independent causal mechanisms.
Rickard Karlsson · Ștefan Creastă · Jesse Krijthe

Multi-View Causal Representation Learning with Partial Observability (Oral)
We present a unified framework for studying the identifiability of representations learned from simultaneously observed views, such as different data modalities. We allow a partially observed setting in which each view constitutes a nonlinear mixture of a subset of underlying latent variables, which can be causally related. We prove that the information shared across all subsets of any number of views can be learned up to a smooth bijection using contrastive learning and a single encoder per view. We also provide graphical criteria indicating which latent variables can be identified through a simple set of rules, which we refer to as identifiability algebra. Our general framework and theoretical results unify and extend several previous works on multi-view nonlinear ICA, disentanglement, and causal representation learning. We experimentally validate our claims on numerical, image, and multi-modal data sets. Further, we demonstrate that the performance of prior methods is recovered in different special cases of our setup. Overall, we find that access to multiple partial views enables identifying a more fine-grained representation, under the generally milder assumption of partial observability.
Dingling Yao · Danru Xu · Sébastien Lachapelle · Sara Magliacane · Perouz Taslakian · Georg Martius · Julius von Kügelgen · Francesco Locatello

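A minimal sketch of the learning principle the theory covers: one encoder per view trained contrastively (InfoNCE-style) on paired views, which under the paper's assumptions recovers the shared latent block up to a smooth bijection. Encoders, dimensions, and data are toy stand-ins.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

enc1 = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 8))
enc2 = nn.Sequential(nn.Linear(12, 64), nn.ReLU(), nn.Linear(64, 8))

def info_nce(z1, z2, tau=0.1):
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.T / tau            # (batch, batch) similarity matrix
    labels = torch.arange(z1.shape[0])  # matched pairs sit on the diagonal
    return F.cross_entropy(logits, labels)

shared = torch.randn(128, 5)  # latents shared across views
view1 = torch.cat([shared, torch.randn(128, 5)], dim=1)  # + view-specific latents
view2 = torch.cat([shared, torch.randn(128, 7)], dim=1)
loss = info_nce(enc1(view1), enc2(view2))
loss.backward()
print(float(loss))
```
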
Invariance & Causal Representation Learning: Prospects and Limitations (Poster)
In causal models, a given mechanism is assumed to be invariant to changes of other mechanisms. While this principle has been utilized for inference in settings where the causal variables are observed, theoretical insights when the variables of interest are latent are largely missing. We assay the connection between invariance and causal representation learning by establishing impossibility results which show that invariance alone is insufficient to identify latent causal variables. Together with practical considerations, we use these theoretical findings to highlight the need for additional constraints in order to identify representations by exploiting invariance.
Simon Bing · Jonas Wahl · Urmi Ninad · Jakob Runge

Curvature and Causal Inference in Network Data (Poster)
Learning causal mechanisms involving networked units of data is a notoriously challenging task with various applications. Graph Neural Networks (GNNs) have proven to be effective for learning representations that capture complex dependencies between data units. This effectiveness is largely due to the conduciveness of GNNs to tools that characterize the geometry of graphs. The potential of geometric deep learning for GNN-based causal representation learning, however, remains underexplored. This work makes three key contributions to bridge this gap. First, we establish a theoretical connection between graph curvature and causal inference, showing that negative curvatures pose challenges to learning the causal mechanisms underlying network data. Second, based on this theoretical insight, we present empirical results using the Ricci curvature to gauge the error in treatment effect estimates made from representations learned by GNNs. This empirically demonstrates that positive curvature regions yield more accurate results. Lastly, as an example of the potentials unleashed by this newfound connection between geometry and causal inference, we propose a method using Ricci flow to improve the treatment effect estimation on networked data. Our experiments confirm that this method reduces the error in treatment effect estimates by flattening the network, showcasing the utility of geometric methods for enhancing causal representation learning. Our findings open new avenues for leveraging discrete geometry in causal representation learning, offering insights and tools that enhance the performance of GNNs in learning robust structural relationships.
Amirhossein Farzam · Allen Tannenbaum · Guillermo Sapiro

Causal Modeling with Stationary Diffusions (Poster)
We develop a new model and learning algorithm for causal inference. Rather than structural equations over a causal graph, we learn stochastic differential equations (SDEs) whose stationary densities model a system's behavior under interventions. These stationary diffusion models do not require the formalism of causal graphs, let alone the common assumption of acyclicity. We show that in several cases, they still allow generalizing to unseen interventions on their variables, often better than classical approaches. Our inference method is based on a novel theoretical result that translates a stationarity condition on the diffusion's generator into reproducing kernel Hilbert spaces. The resulting kernel deviation from stationarity (KDS) is an objective function of independent interest.
Lars Lorch · Andreas Krause · Bernhard Schölkopf

Learning to ignore: Single Source Domain Generalization via Oracle Regularization (Poster)
Machine learning frequently suffers from the discrepancy in data distribution, commonly known as domain shift. Single-source Domain Generalization (sDG) is a task designed to simulate domain shift artificially, in order to train a model that can generalize well to multiple unseen target domains from a single source domain. A popular approach is to learn robustness via the alignment of augmented samples. However, prior works frequently overlooked what is learned from such alignment. In this paper, we study the effectiveness of augmentation-based sDG methods via a causal interpretation of the data generating process. We highlight issues in using augmentation for generalization, namely, the distinction between domain invariance and augmentation invariance. To alleviate these issues, we introduce a novel regularization method that leverages pretrained models to guide the learning process via a feature-level regularization, which we name PROF (Progressive mutual information Regularization for Online distillation of Frozen oracles). PROF can be applied to conventional augmentation-based methods to moderate the impact of stochasticity in models repeatedly trained on augmented data, encouraging the model to learn domain-invariant representations. We empirically show that PROF stabilizes the learning process for sDG.
Dong Kyu Cho · Sanghack Lee

-
|
Instance-Dependent Partial Label Learning with Identifiable Causal Representations
(
Poster
)
>
link
Partial label learning (PLL) deals with the problem where each training example is annotated with a set of candidate labels, among which only one is true. In real-world scenarios, the candidate labels are generally dependent to the instance features. However, existing PLL methods focus solely on classification accuracy, whereas the possibility of exploiting the dependency for causal representation learning remains unexplored. In this paper, we investigate the causal representation identifiability under the PLL paradigm and propose a novel framework which learns identifiable latent factors up to permutation, scaling and translation. Qualitative and quantitative experiments confirmed the effectiveness of this approach. |
Yizhi Wang · Weijia Zhang · Min-Ling Zhang 🔗 |
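To ground the PLL setup, here is a minimal sketch of the standard candidate-set objective that instance-dependent PLL methods build on: since exactly one candidate is true, the model's likelihood is marginalized over the candidate set. This is generic PLL scaffolding, not the identifiable-representation framework proposed in the abstract.

```python
import torch
import torch.nn.functional as F

def pll_nll(logits, candidate_mask):
    """Negative log-likelihood of the candidate set.

    logits:         (batch, num_classes) classifier scores
    candidate_mask: (batch, num_classes), 1 for candidate labels, 0 otherwise
    Exactly one candidate per example is the unknown true label, so we
    maximize log sum_{y in S} p(y | x).
    """
    log_probs = F.log_softmax(logits, dim=-1)
    masked = log_probs.masked_fill(candidate_mask == 0, float("-inf"))
    return -torch.logsumexp(masked, dim=-1).mean()

logits = torch.randn(4, 5, requires_grad=True)
mask = torch.tensor([[1, 1, 0, 0, 0],
                     [0, 1, 1, 1, 0],
                     [1, 0, 0, 0, 1],
                     [0, 0, 1, 0, 0]])
pll_nll(logits, mask).backward()
```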
-
|
Causal Markov Blanket Representations for Domain Generalization Prediction
(
Poster
)
>
link
Learning generalizable representations is a dynamic field of research in machine learning and computer vision. Typically, current methods aim to secure invariant representations by either harnessing domain expertise or leveraging data from multiple domains. In this paper, we introduce a novel approach that acquires Causal Markov Blanket (CMB) representations to improve prediction performance in the face of distribution shifts. Causal Markov Blanket representations comprise the direct causes and effects of the target variable, rendering them invariant across diverse domains. Our approach begins by introducing a novel structural causal model (SCM) equipped with latent representations, designed to capture the underlying causal mechanisms governing the data-generation process. We then propose a CMB representation learning framework that derives representations conforming to the proposed SCM. In comparison to state-of-the-art domain generalization methods, our approach exhibits robustness and adaptability under distribution shifts. |
Naiyu Yin · Hanjing Wang · Tian Gao · Amit Dhurandhar · Qiang Ji 🔗 |
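As a reminder of the graphical object behind CMB representations, here is a minimal sketch that computes the Markov blanket of a target node in a known DAG (parents, children, and the children's other parents); the abstract's framework learns such structure over latent representations, so this explicit-graph version only makes the definition concrete.

```python
import networkx as nx

def markov_blanket(dag, target):
    """Markov blanket of `target`: parents, children, and spouses.
    Conditioned on this set, `target` is independent of all other nodes."""
    parents = set(dag.predecessors(target))
    children = set(dag.successors(target))
    spouses = {p for c in children for p in dag.predecessors(c)} - {target}
    return parents | children | spouses

# Toy graph: X0 -> X1 -> Y -> X3 <- X2
dag = nx.DiGraph([("X0", "X1"), ("X1", "Y"), ("Y", "X3"), ("X2", "X3")])
print(markov_blanket(dag, "Y"))  # {'X1', 'X3', 'X2'}
```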
-
|
Identifying Representations for Intervention Extrapolation
(
Oral
)
>
link
The premise of identifiable and causal representation learning is to improve the current representation learning paradigm in terms of generalizability or robustness. Despite recent progress on questions of identifiability, more theoretical results demonstrating concrete advantages of these methods for downstream tasks are needed. In this paper, we consider the task of intervention extrapolation: predicting how interventions affect an outcome, even when those interventions are not observed at training time, and show that identifiable representations can provide an effective solution to this task even if the interventions affect the outcome non-linearly. Our setup includes an outcome variable $Y$, observed features $X$, which are generated as a non-linear transformation of latent features $Z$, and exogenous action variables $A$, which influence $Z$. The objective of intervention extrapolation is then to predict how interventions on $A$ that lie outside the training support of $A$ affect $Y$. Here, extrapolation becomes possible if the effect of $A$ on $Z$ is linear and the residual when regressing $Z$ on $A$ has full support. As $Z$ is latent, we combine the task of intervention extrapolation with identifiable representation learning, which we call $\texttt{Rep4Ex}$: we aim to map the observed features $X$ into a subspace that allows for non-linear extrapolation in $A$. We show using Wiener's Tauberian theorem that the hidden representation is identifiable up to an affine transformation in $Z$-space, which, we prove, is sufficient for intervention extrapolation. The identifiability is characterized by a novel constraint describing the linearity of the effect of $A$ on $Z$. Based on this insight, we propose a flexible method that enforces the linear invariance constraint and can be combined with any type of autoencoder. We validate our theoretical findings through a series of synthetic experiments and show that our approach can indeed succeed in predicting the effects of unseen interventions. |
Sorawit Saengkyongam · Elan Rosenfeld · Pradeep Ravikumar · Niklas Pfister · Jonas Peters 🔗 |
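A minimal sketch of the linear invariance idea: alongside a standard autoencoder reconstruction loss, the encoding is penalized for deviating from a learned linear function of the action $A$. Note that the paper enforces linearity of the conditional mean $\mathbb{E}[Z \mid A]$ while preserving the exogenous residual, so the squared-error surrogate below is a deliberately crude stand-in; all dimensions and the lambda weighting are illustrative, not the authors' $\texttt{Rep4Ex}$ implementation.

```python
import torch
import torch.nn as nn

d_x, d_z, d_a = 20, 3, 2

encoder = nn.Sequential(nn.Linear(d_x, 64), nn.ReLU(), nn.Linear(64, d_z))
decoder = nn.Sequential(nn.Linear(d_z, 64), nn.ReLU(), nn.Linear(64, d_x))
lin = nn.Linear(d_a, d_z)  # linear map A -> Z, per the linearity assumption

def rep4ex_like_loss(x, a, lam=1.0):
    """Reconstruction plus a linear-invariance penalty: the encoding should be
    (up to exogenous noise) a linear function of A, which is what licenses
    extrapolating to actions outside the training support."""
    z = encoder(x)
    recon = ((decoder(z) - x) ** 2).mean()
    invariance = ((z - lin(a)) ** 2).mean()
    return recon + lam * invariance

x, a = torch.randn(128, d_x), torch.randn(128, d_a)
rep4ex_like_loss(x, a).backward()
```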
-
|
Learning Causally-Aware Representations of Multi-Agent Interactions
(
Poster
)
>
link
Modeling spatial-temporal interactions between neighboring agents is at the heart of multi-agent problems such as motion forecasting and crowd navigation. Despite notable progress, it remains unclear to what extent modern representations can capture the causal relationships behind agent interactions. In this work, we take an in-depth look at the causal awareness of learned representations, from computational formalism to controlled simulations to real-world practice. First, we cast doubt on the notion of non-causal robustness studied in the recent CausalAgents benchmark. We show that recent representations are already partially resilient to perturbations of non-causal agents, yet modeling indirect causal effects involving mediator agents remains challenging. Further, we introduce a simple but effective regularization approach that leverages causal annotations of varying granularity. Through controlled experiments, we find that incorporating finer-grained causal annotations not only leads to higher degrees of causal awareness but also yields stronger out-of-distribution robustness. Finally, we extend our method to a sim-to-real causal transfer framework by means of cross-domain multi-task learning, which boosts generalization in practical settings even without real-world annotations. We hope our work brings clarity to the challenges and opportunities of learning causally-aware representations in the multi-agent context, while taking a first step towards a practical solution. |
Yuejiang Liu · Ahmad Rahimi · Po-Chien Luan · Frano Rajič · Alexandre Alahi 🔗 |
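Here is a minimal sketch of an annotation-based regularizer in the spirit described above, assuming per-agent causal-effect annotations obtained by counterfactually removing an agent in simulation: the representation's sensitivity to removing an agent is pushed to match the annotated effect size. The encoder and distance surrogate are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def causal_awareness_loss(encode, scene, effects):
    """encode: maps a scene tensor (num_agents, T, 2) to a feature vector.
    effects[i - 1]: annotated causal effect of agent i on the ego agent,
    e.g., the change in its ground-truth future when agent i is removed.
    We align feature-space sensitivity with the annotated effect sizes."""
    z_full = encode(scene)
    sens = []
    for i in range(1, scene.shape[0]):  # agent 0 is the ego agent
        keep = [j for j in range(scene.shape[0]) if j != i]
        sens.append(torch.norm(z_full - encode(scene[keep])))
    return F.mse_loss(torch.stack(sens), effects)

# Toy usage with a permutation-invariant mean-pool "encoder".
encode = lambda s: s.mean(dim=(0, 1))
scene = torch.randn(5, 12, 2, requires_grad=True)  # 5 agents, 12 steps, (x, y)
effects = torch.rand(4)
causal_awareness_loss(encode, scene, effects).backward()
```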
-
|
A Causal Ordering Prior for Unsupervised Representation Learning
(
Poster
)
>
link
Unsupervised representation learning with variational inference relies heavily on independence assumptions over latent variables. Causal representation learning (CRL), however, argues that factors of variation in a dataset are, in fact, causally related. Allowing latent variables to be correlated, as a consequence of causal relationships, is more realistic and generalisable. So far, provably identifiable methods rely on auxiliary information, weak labels, or interventional or even counterfactual data. Inspired by causal discovery with functional causal models, we propose a fully unsupervised representation learning method that considers a data-generation process with a latent additive noise model (ANM). We encourage the latent space to follow a causal ordering via a loss function based on the Hessian of the latent distribution. |
Avinash Kori · Pedro Sanchez · Konstantinos Vilouras · Ben Glocker · Sotirios Tsaftaris 🔗 |
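To illustrate how second derivatives of a latent log-density can reveal an ordering, here is a minimal sketch of the related score-matching leaf criterion for additive Gaussian noise models (à la Rolland et al., 2022): the Hessian diagonal of log p is constant across samples exactly at leaf nodes, so its variance scores candidate leaves. The paper's loss may differ in form; this is an illustrative relative, assuming an analytically known latent log-density.

```python
import torch

def hessian_diag(log_p, z):
    """Per-sample diagonal of the Hessian of log p, via double autograd."""
    z = z.clone().requires_grad_(True)
    (grad,) = torch.autograd.grad(log_p(z).sum(), z, create_graph=True)
    diag = []
    for j in range(z.shape[1]):
        (g2,) = torch.autograd.grad(grad[:, j].sum(), z, retain_graph=True)
        diag.append(g2[:, j])
    return torch.stack(diag, dim=1)  # shape (n, d)

def leaf_scores(log_p, z):
    """Low variance of the Hessian diagonal across samples flags a leaf."""
    return hessian_diag(log_p, z).var(dim=0)

# Toy ANM with ordering z1 -> z2: z1 ~ N(0,1), z2 = z1^2 + N(0,1).
def log_p(z):
    z1, z2 = z[:, 0], z[:, 1]
    return -0.5 * z1 ** 2 - 0.5 * (z2 - z1 ** 2) ** 2

z1 = torch.randn(1000)
z = torch.stack([z1, z1 ** 2 + torch.randn(1000)], dim=1)
print(leaf_scores(log_p, z))  # near-zero variance for z2 -> it is the leaf
```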
-
|
Learning Macro Variables with Auto-encoders
(
Poster
)
>
link
Most causal variables that we reason over, in both science and everyday life, are coarse abstractions of low-level data. Despite their importance, however, the field of causality lacks a precise theory of abstract "macro" variables and their relation to low-level "micro" variables that can account for our intuitions. Here, we define a macro variable as something that (a) is simpler than its micro variable, (b) shares mutual information with its micro variable, and (c) is related to other macro variables via simple mechanisms. From this definition, we propose DeepCFL: a simple self-supervised method that learns macro variables and their relations. We empirically validate DeepCFL on synthetic tasks where the underlying macro variables are known, and find that they can be recovered with high fidelity. Given that the individual components of DeepCFL leverage standard, scalable deep learning techniques, our preliminary results are encouraging signs that the method can be successfully applied to real-world data. |
Dhanya Sridhar · Eric Elmoznino · Maitreyi Swaroop 🔗 |
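A minimal sketch of the three-part definition above, assuming paired micro observations (x, y) whose macro variables are linked by a simple mechanism: two encoders are trained so that (a) the macro variables are low-dimensional (simplicity), (b) each retains information about its micro variable, using an InfoNCE surrogate for mutual information, and (c) one macro variable predicts the other through a linear map (simple mechanism). All module names and dimensions are illustrative; this is not the authors' DeepCFL implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

d_micro, d_macro = 100, 4                   # (a) simplicity: 100 -> 4 dims
enc_x = nn.Sequential(nn.Linear(d_micro, 64), nn.ReLU(), nn.Linear(64, d_macro))
enc_y = nn.Sequential(nn.Linear(d_micro, 64), nn.ReLU(), nn.Linear(64, d_macro))
mech = nn.Linear(d_macro, d_macro)          # (c) simple macro-level mechanism
proj_x = nn.Linear(d_micro, d_macro)        # projections for the MI term
proj_y = nn.Linear(d_micro, d_macro)

def infonce(z, v, temp=0.1):
    """(b) InfoNCE lower bound on I(macro; micro) within a batch: each macro
    variable must identify its own micro sample among the batch."""
    logits = F.normalize(z, dim=-1) @ F.normalize(v, dim=-1).T / temp
    return F.cross_entropy(logits, torch.arange(z.shape[0]))

def deepcfl_like_loss(x, y):
    zx, zy = enc_x(x), enc_y(y)
    mi = infonce(zx, proj_x(x)) + infonce(zy, proj_y(y))
    mechanism = F.mse_loss(mech(zx), zy)    # zy should follow simply from zx
    return mi + mechanism

x, y = torch.randn(256, d_micro), torch.randn(256, d_micro)
deepcfl_like_loss(x, y).backward()
```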