Portrait de Dhanya Sridhar

Dhanya Sridhar

Membre académique principal
Chaire en IA Canada-CIFAR
Professeure adjointe, Université de Montréal, Département d'informatique et de recherche opérationnelle
Sujets de recherche
Apprentissage de représentations
Apprentissage profond
Causalité
Modèles probabilistes
Raisonnement

Biographie

Dhanya Sridhar est professeure adjointe au Département d'informatique et de recherche opérationnelle (DIRO) de l'Université de Montréal, membre académique principale de Mila – Institut québécois d'intelligence artificielle et titulaire d'une chaire en IA Canada-CIFAR. Auparavant, elle a été chercheuse postdoctorale à l’Université Columbia. Elle a obtenu un doctorat de l’Université de la Californie à Santa Cruz. Ses recherches portent sur la combinaison de la causalité et de l'apprentissage automatique au service de systèmes d'IA qui sont résistants aux changements de distribution, s'adaptent efficacement à de nouvelles tâches et découvrent de nouvelles connaissances en même temps que nous.

Étudiants actuels

Collaborateur·rice de recherche - Helmholtz AI
Visiteur de recherche indépendant - University of Maryland College Park
Stagiaire de recherche - McGill
Co-superviseur⋅e :
Maîtrise recherche - UdeM
Superviseur⋅e principal⋅e :
Doctorat - UdeM
Superviseur⋅e principal⋅e :
Doctorat - UdeM
Superviseur⋅e principal⋅e :
Doctorat - UdeM
Superviseur⋅e principal⋅e :

Publications

Next-Token Prediction Should be Ambiguity-Sensitive: A Meta-Learning Perspective
Leo Gagnon
Eric Elmoznino
Sarthak Mittal
Tom Marty
Tejas Kasetty
The rapid adaptation ability of auto-regressive foundation models is often attributed to the diversity of their pre-training data. This is b… (voir plus)ecause, from a Bayesian standpoint, minimizing prediction error in such settings requires integrating over all plausible latent hypotheses consistent with observations. While this behavior is desirable in principle, it often proves too ambitious in practice: under high ambiguity, the number of plausible latent alternatives makes Bayes-optimal prediction computationally intractable. Cognitive science has long recognized this limitation, suggesting that under such conditions, heuristics or information-seeking strategies are preferable to exhaustive inference. Translating this insight to next-token prediction, we hypothesize that low- and high-ambiguity predictions pose different computational demands, making ambiguity-agnostic next-token prediction a detrimental inductive bias. To test this, we introduce MetaHMM, a synthetic sequence meta-learning benchmark with rich compositional structure and a tractable Bayesian oracle. We show that Transformers indeed struggle with high-ambiguity predictions across model sizes. Motivated by cognitive theories, we propose a method to convert pre-trained models into Monte Carlo predictors that decouple task inference from token prediction. Preliminary results show substantial gains in ambiguous contexts through improved capacity allocation and test-time scalable inference, though challenges remain.
Next-Token Prediction Should be Ambiguity-Sensitive: A Meta-Learning Perspective
Leo Gagnon
Eric Elmoznino
Sarthak Mittal
Tom Marty
Tejas Kasetty
The rapid adaptation ability of auto-regressive foundation models is often attributed to the diversity of their pre-training data. This is b… (voir plus)ecause, from a Bayesian standpoint, minimizing prediction error in such settings requires integrating over all plausible latent hypotheses consistent with observations. While this behavior is desirable in principle, it often proves too ambitious in practice: under high ambiguity, the number of plausible latent alternatives makes Bayes-optimal prediction computationally intractable. Cognitive science has long recognized this limitation, suggesting that under such conditions, heuristics or information-seeking strategies are preferable to exhaustive inference. Translating this insight to next-token prediction, we hypothesize that low- and high-ambiguity predictions pose different computational demands, making ambiguity-agnostic next-token prediction a detrimental inductive bias. To test this, we introduce MetaHMM, a synthetic sequence meta-learning benchmark with rich compositional structure and a tractable Bayesian oracle. We show that Transformers indeed struggle with high-ambiguity predictions across model sizes. Motivated by cognitive theories, we propose a method to convert pre-trained models into Monte Carlo predictors that decouple task inference from token prediction. Preliminary results show substantial gains in ambiguous contexts through improved capacity allocation and test-time scalable inference, though challenges remain.
Next-Token Prediction Should be Ambiguity-Sensitive : A Meta-Learing Perspective
Leo Gagnon
Eric Elmoznino
Sarthak Mittal
Tom Marty
Tejas Kasetty
A Meta-Learning Approach to Causal Inference
Dragos Cristian Manta
Philippe Brouillard
Predicting the effect of unseen interventions is at the heart of many scientific endeavours. While causal discovery is often used to answer … (voir plus)these causal questions, it involves learning a full causal model, not tailored to the specific goal of predicting unseen interventions, and operates under stringent assumptions. We introduce a novel method based on meta-learning that predicts interventional effects without explicitly assuming a causal model. Our preliminary results on synthetic data show that it can provide good generalization to unseen interventions, and it even compares favorably to a causal discovery method. Our model-agnostic method opens up many avenues for future exploration, particularly for settings where causal discovery cannot be applied.
Does learning the right latent variables necessarily improve in-context learning?
Sarthak Mittal
Eric Elmoznino
Leo Gagnon
Sangnie Bhardwaj
Large autoregressive models like Transformers can solve tasks through in-context learning (ICL) without learning new weights, suggesting ave… (voir plus)nues for efficiently solving new tasks. For many tasks, e.g., linear regression, the data factorizes: examples are independent given a task latent that generates the data, e.g., linear coefficients. While an optimal predictor leverages this factorization by inferring task latents, it is unclear if Transformers implicitly do so or if they instead exploit heuristics and statistical shortcuts enabled by attention layers. Both scenarios have inspired active ongoing work. In this paper, we systematically investigate the effect of explicitly inferring task latents. We minimally modify the Transformer architecture with a bottleneck designed to prevent shortcuts in favor of more structured solutions, and then compare performance against standard Transformers across various ICL tasks. Contrary to intuition and some recent works, we find little discernible difference between the two; biasing towards task-relevant latent variables does not lead to better out-of-distribution performance, in general. Curiously, we find that while the bottleneck effectively learns to extract latent task variables from context, downstream processing struggles to utilize them for robust prediction. Our study highlights the intrinsic limitations of Transformers in achieving structured ICL solutions that generalize, and shows that while inferring the right latents aids interpretability, it is not sufficient to alleviate this problem.
In-context learning and Occam's razor
Eric Elmoznino
Tom Marty
Tejas Kasetty
Leo Gagnon
Sarthak Mittal
Mahan Fathi
A central goal of machine learning is generalization. While the No Free Lunch Theorem states that we cannot obtain theoretical guarantees fo… (voir plus)r generalization without further assumptions, in practice we observe that simple models which explain the training data generalize best: a principle called Occam's razor. Despite the need for simple models, most current approaches in machine learning only minimize the training error, and at best indirectly promote simplicity through regularization or architecture design. Here, we draw a connection between Occam's razor and in-context learning: an emergent ability of certain sequence models like Transformers to learn at inference time from past observations in a sequence. In particular, we show that the next-token prediction loss used to train in-context learners is directly equivalent to a data compression technique called prequential coding, and that minimizing this loss amounts to jointly minimizing both the training error and the complexity of the model that was implicitly learned from context. Our theory and the empirical experiments we use to support it not only provide a normative account of in-context learning, but also elucidate the shortcomings of current in-context learning methods, suggesting ways in which they can be improved. We make our code available at https://github.com/3rdCore/PrequentialCode.
The Landscape of Causal Discovery Data: Grounding Causal Discovery in Real-World Applications
Philippe Brouillard
Chandler Squires
Jonas Wahl
Konrad Paul Kording
Karen Sachs
Causal discovery aims to automatically uncover causal relationships from data, a capability with significant potential across many scientifi… (voir plus)c disciplines. However, its real-world applications remain limited. Current methods often rely on unrealistic assumptions and are evaluated only on simple synthetic toy datasets, often with inadequate evaluation metrics. In this paper, we substantiate these claims by performing a systematic review of the recent causal discovery literature. We present applications in biology, neuroscience, and Earth sciences - fields where causal discovery holds promise for addressing key challenges. We highlight available simulated and real-world datasets from these domains and discuss common assumption violations that have spurred the development of new methods. Our goal is to encourage the community to adopt better evaluation practices by utilizing realistic datasets and more adequate metrics.
The Landscape of Causal Discovery Data: Grounding Causal Discovery in Real-World Applications
Philippe Brouillard
Chandler Squires
Jonas Wahl
Konrad P. Kording
Karen Sachs
Causal discovery aims to automatically uncover causal relationships from data, a capability with significant potential across many scientifi… (voir plus)c disciplines. However, its real-world applications remain limited. Current methods often rely on unrealistic assumptions and are evaluated only on simple synthetic toy datasets, often with inadequate evaluation metrics. In this paper, we substantiate these claims by performing a systematic review of the recent causal discovery literature. We present applications in biology, neuroscience, and Earth sciences - fields where causal discovery holds promise for addressing key challenges. We highlight available simulated and real-world datasets from these domains and discuss common assumption violations that have spurred the development of new methods. Our goal is to encourage the community to adopt better evaluation practices by utilizing realistic datasets and more adequate metrics.
The Landscape of Causal Discovery Data: Grounding Causal Discovery in Real-World Applications
Philippe Brouillard
Chandler Squires
Jonas Wahl
Konrad P. Kording
Karen Sachs
Causal discovery aims to automatically uncover causal relationships from data, a capability with significant potential across many scientifi… (voir plus)c disciplines. However, its real-world applications remain limited. Current methods often rely on unrealistic assumptions and are evaluated only on simple synthetic toy datasets, often with inadequate evaluation metrics. In this paper, we substantiate these claims by performing a systematic review of the recent causal discovery literature. We present applications in biology, neuroscience, and Earth sciences - fields where causal discovery holds promise for addressing key challenges. We highlight available simulated and real-world datasets from these domains and discuss common assumption violations that have spurred the development of new methods. Our goal is to encourage the community to adopt better evaluation practices by utilizing realistic datasets and more adequate metrics.
General Causal Imputation via Synthetic Interventions
Marco Jiralerspong
Thomas Jiralerspong
Vedant Shah
Given two sets of elements (such as cell types and drug compounds), researchers typically only have access to a limited subset of their inte… (voir plus)ractions. The task of causal imputation involves using this subset to predict unobserved interactions. Squires et al. (2022) have proposed two estimators for this task based on the synthetic interventions (SI) estimator: SI-A (for actions) and SI-C (for contexts). We extend their work and introduce a novel causal imputation estimator, generalized synthetic interventions (GSI). We prove the identifiability of this estimator for data generated from a more complex latent factor model. On synthetic and real data we show empirically that it recovers or outperforms their estimators.
General Causal Imputation via Synthetic Interventions
Marco Jiralerspong
Thomas Jiralerspong
Vedant Shah
Given two sets of elements (such as cell types and drug compounds), researchers typically only have access to a limited subset of their inte… (voir plus)ractions. The task of causal imputation involves using this subset to predict unobserved interactions. Squires et al. (2022) have proposed two estimators for this task based on the synthetic interventions (SI) estimator: SI-A (for actions) and SI-C (for contexts). We extend their work and introduce a novel causal imputation estimator, generalized synthetic interventions (GSI). We prove the identifiability of this estimator for data generated from a more complex latent factor model. On synthetic and real data we show empirically that it recovers or outperforms their estimators.
General Causal Imputation via Synthetic Interventions
Marco Jiralerspong
Thomas Jiralerspong
Vedant Shah