Sébastien Lachapelle

Leveraging Structure Between Environments: Phylogenetic Regularization Incentivizes Disentangled Representations

Jason Hartford

Recently, learning invariant predictors across varying environments has been shown to improve the generalization of supervised learning methods. This line of investigation holds great potential for application to biological problem settings, where data is often naturally heterogeneous. Biological samples often originate from different distributions, or environments. However, in biological contexts, the standard "invariant prediction" setting may not completely fit: the optimal predictor may in fact vary across biological environments. There also exists strong domain knowledge about the relationships between environments, such as the evolutionary history of a set of species, or the differentiation process of cell types. Most work on generic invariant predictors has not assumed the existence of structured relationships between environments. However, this prior knowledge about environments themselves has already been shown to improve prediction through a particular form of regularization applied when learning a set of predictors. In this work, we empirically evaluate whether a regularization strategy that exploits environment-based prior information can be used to learn representations that better disentangle causal factors that generate observed data. We find evidence that these methods do in fact improve the disentanglement of latent embeddings. We also show a setting where these methods can leverage phylogenetic information to estimate the number of latent causal features.

2025-07-25

Transactions on Machine Learning Research (accepted)

doi.org

openreview.net

Causal Representation Learning in Temporal Data via Single-Parent Decoding

Yaniv Gurwicz

Peer Nowack

Jakob Runge

David Rolnick

Scientific research often seeks to understand the causal structure underlying high-level variables in a system. For example, climate scientists study how phenomena, such as El Niño, affect other climate processes at remote locations across the globe. However, scientists typically collect low-level measurements, such as geographically distributed temperature readings. From these, one needs to learn both a mapping to causally-relevant latent variables, such as a high-level representation of the El Niño phenomenon and other processes, as well as the causal model over them. The challenge is that this task, called causal representation learning, is highly underdetermined from observational data alone, requiring other constraints during learning to resolve the indeterminacies. In this work, we consider a temporal model with a sparsity assumption, namely single-parent decoding: each observed low-level variable is only affected by a single latent variable. Such an assumption is reasonable in many scientific applications that require finding groups of low-level variables, such as extracting regions from geographically gridded measurement data in climate research or capturing brain regions from neural activity data. We demonstrate the identifiability of the resulting model and propose a differentiable method, Causal Discovery with Single-parent Decoding (CDSD), that simultaneously learns the underlying latents and a causal graph over them. We assess the validity of our theoretical results using simulated data and showcase the practical validity of our method in an application to real-world data from the climate science field.

2024-10-08

ArXiv (preprint)

doi.org

openreview.net

Nonparametric Partial Disentanglement via Mechanism Sparsity: Sparse Actions, Interventions and Sparse Temporal Dependencies

Sébastien Lachapelle

Pau Rodríguez

Yash Sharma

Katie Everett

Rémi Le Priol

Alexandre Lacoste

Simon Lacoste-Julien

2024-01-09

ArXiv (preprint)

doi.org

arxiv.org

Additive Decoders for Latent Variables Identification and Cartesian-Product Extrapolation

We tackle the problems of latent variables identification and "out-of-support" image generation in representation learning. We show that both are possible for a class of decoders that we call additive, which are reminiscent of decoders used for object-centric representation learning (OCRL) and well suited for images that can be decomposed as a sum of object-specific images. We provide conditions under which exactly solving the reconstruction problem using an additive decoder is guaranteed to identify the blocks of latent variables up to permutation and block-wise invertible transformations. This guarantee relies only on very weak assumptions about the distribution of the latent factors, which might present statistical dependencies and have an almost arbitrarily shaped support. Our result provides a new setting where nonlinear independent component analysis (ICA) is possible and adds to our theoretical understanding of OCRL methods. We also show theoretically that additive decoders can generate novel images by recombining observed factors of variations in novel ways, an ability we refer to as Cartesian-product extrapolation. We show empirically that additivity is crucial for both identifiability and extrapolation on simulated data.

2023-09-20

NeurIPS.cc/2023/Conference (oral presentation)

doi.org

openreview.net

Synergies between Disentanglement and Sparsity: Generalization and Identifiability in Multi-Task Learning

Although disentangled representations are often said to be beneficial for downstream tasks, current empirical and theoretical understanding is limited. In this work, we provide evidence that disentangled representations coupled with sparse base-predictors improve generalization. In the context of multi-task learning, we prove a new identifiability result that provides conditions under which maximally sparse base-predictors yield disentangled representations. Motivated by this theoretical result, we propose a practical approach to learn disentangled representations based on a sparsity-promoting bi-level optimization problem. Finally, we explore a meta-learning version of this algorithm based on group Lasso multiclass SVM base-predictors, for which we derive a tractable dual formulation. It obtains competitive results on standard few-shot classification benchmarks, while each task is using only a fraction of the learned representations.

2022-12-31

ICML (published)

doi.org

proceedings.mlr.press

Partial Disentanglement via Mechanism Sparsity

Sébastien Lachapelle

Simon Lacoste-Julien

2022-07-08

auai.org/UAI/2022/Workshop/CRL (oral presentation)

doi.org

openreview.net

Disentanglement via Mechanism Sparsity Regularization: A New Principle for Nonlinear ICA

Sébastien Lachapelle

Pau Rodríguez

Yash Sharma

Katie E Everett

Rémi Le Priol

Alexandre Lacoste

Simon Lacoste-Julien

This work introduces a novel principle we call disentanglement via mechanism sparsity regularization, which can be applied when the latent factors of interest depend sparsely on past latent factors and/or observed auxiliary variables. We propose a representation learning method that induces disentanglement by simultaneously learning the latent factors and the sparse causal graphical model that relates them. We develop a rigorous identifiability theory, building on recent nonlinear independent component analysis (ICA) results, that formalizes this principle and shows how the latent variables can be recovered up to permutation if one regularizes the latent mechanisms to be sparse and if some graph connectivity criterion is satisfied by the data generating process. As a special case of our framework, we show how one can leverage unknown-target interventions on the latent factors to disentangle them, thereby drawing further connections between ICA and causality. We propose a VAE-based method in which the latent mechanisms are learned and regularized via binary masks, and validate our theory by showing it learns disentangled representations in simulations.

2021-12-31

CLeaR (published)

proceedings.mlr.press

Predicting Tactical Solutions to Operational Planning Problems under Imperfect Information

Eric Larsen

Andrea Lodi

This paper offers a methodological contribution at the intersection of machine learning and operations research. Namely, we propose a methodology to quickly predict tactical solutions to a given operational problem. In this context, the tactical solution is less detailed than the operational one but it has to be computed in very short time and under imperfect information. The problem is of importance in various applications where tactical and operational planning problems are interrelated and information about the operational problem is revealed over time. This is for instance the case in certain capacity planning and demand management systems. We formulate the problem as a two-stage optimal prediction stochastic program whose solution we predict with a supervised machine learning algorithm. The training data set consists of a large number of deterministic (second stage) problems generated by controlled probabilistic sampling. The labels are computed based on solutions to the deterministic problems (solved independently and offline) employing appropriate aggregation and subselection methods to address uncertainty. Results on our motivating application in load planning for rail transportation show that deep learning algorithms produce highly accurate predictions in very short computing time (milliseconds or less). The prediction accuracy is comparable to solutions computed by sample average approximation of the stochastic program.

2021-09-20

INFORMS Journal on Computing (unknown)

doi.org

arxiv.org

Differentiable Causal Discovery from Interventional Data

Alexandre Lacoste

Learning a causal directed acyclic graph from data is a challenging task that involves solving a combinatorial problem for which the solution is not always identifiable. A new line of work reformulates this problem as a continuous constrained optimization one, which is solved via the augmented Lagrangian method. However, most methods based on this idea do not make use of interventional data, which can significantly alleviate identifiability issues. This work constitutes a new step in this direction by proposing a theoretically-grounded method based on neural networks that can leverage interventional data. We illustrate the flexibility of the continuous-constrained framework by taking advantage of expressive neural architectures such as normalizing flows. We show that our approach compares favorably to the state of the art in a variety of settings, including perfect and imperfect interventions for which the targeted nodes may even be unknown.

2019-12-31

NeurIPS (published)

doi.org

arxiv.org

A Meta-Transfer Objective for Learning to Disentangle Causal Mechanisms

Yoshua Bengio

Tristan Deleu

Nasim Rahaman

Nan Rosemary Ke

Sébastien Lachapelle

Olexa Bilaniuk

Anirudh Goyal

Christopher Pal

We propose to meta-learn causal structures based on how fast a learner adapts to new distributions arising from sparse distributional changes, e.g. due to interventions, actions of agents and other sources of non-stationarities. We show that under this assumption, the correct causal structural choices lead to faster adaptation to modified distributions because the changes are concentrated in one or just a few mechanisms when the learned knowledge is modularized appropriately. This leads to sparse expected gradients and a lower effective number of degrees of freedom needing to be relearned while adapting to the change. It motivates using the speed of adaptation to a modified distribution as a meta-learning objective. We demonstrate how this can be used to determine the cause-effect relationship between two observed variables. The distributional changes do not need to correspond to standard interventions (clamping a variable), and the learner has no direct knowledge of these interventions. We show that causal structures can be parameterized via continuous variables and learned end-to-end. We then explore how these ideas could be used to also learn an encoder that would map low-level observed variables to unobserved causal variables leading to faster adaptation out-of-distribution, learning a representation space where one can satisfy the assumptions of independent mechanisms and of small and sparse changes in these mechanisms due to actions and non-stationarities.

2019-12-31

ICLR (published)

doi.org

openreview.net

Gradient-Based Neural DAG Learning with Interventions

Alexandre Lacoste

Decision making based on statistical association alone can be a dangerous endeavor due to non-causal associations. Ideally, one would rely on causal relationships that enable reasoning about the effect of interventions. Several methods have been proposed to discover such relationships from observational and interventional data. Among them, GraN-DAG, a method that relies on the constrained optimization of neural networks, was shown to produce state-of-the-art results among algorithms relying purely on observational data. However, it is limited to observational data and cannot make use of interventions. In this work, we extend GraN-DAG to support interventional data and show that this improves its ability to infer causal structures.

2019-12-31

(published)

www.semanticscholar.org

Unsupervised one-to-many image translation

Samuel Lavoie-Marchildon

R Devon Hjelm

2018-09-26

(published)