Portrait of Dhanya Sridhar

Dhanya Sridhar

Core Academic Member
Canada CIFAR AI Chair
Assistant Professor, Université de Montréal, Department of Computer Science and Operations Research


Dhanya Sridhar is an assistant professor in the Department of Computer Science and Operations Research (DIRO) at Université de Montréal, a core academic member of Mila – Quebec Artificial Intelligence Institute, and a Canada CIFAR AI Chair.

She was a postdoctoral researcher at Columbia University and received her doctorate from the University of California, Santa Cruz.

In brief, Sridhar’s research focuses on combining causality and machine learning in service of AI systems that are robust to distribution shifts, adapt to new tasks efficiently and discover new knowledge alongside us.

Current Students

Collaborating researcher - Helmholtz AI
PhD - Université de Montréal
Principal supervisor :
Research Intern - Université de Montréal
Co-supervisor :
PhD - Université de Montréal


Evaluating Interventional Reasoning Capabilities of Large Language Models
Numerous decision-making tasks require estimating causal effects under interventions on different parts of a system. As practitioners consid… (see more)er using large language models (LLMs) to automate decisions, studying their causal reasoning capabilities becomes crucial. A recent line of work evaluates LLMs ability to retrieve commonsense causal facts, but these evaluations do not sufficiently assess how LLMs reason about interventions. Motivated by the role that interventions play in causal inference, in this paper, we conduct empirical analyses to evaluate whether LLMs can accurately update their knowledge of a data-generating process in response to an intervention. We create benchmarks that span diverse causal graphs (e.g., confounding, mediation) and variable types, and enable a study of intervention-based reasoning. These benchmarks allow us to isolate the ability of LLMs to accurately predict changes resulting from their ability to memorize facts or find other shortcuts. Our analysis on four LLMs highlights that while GPT- 4 models show promising accuracy at predicting the intervention effects, they remain sensitive to distracting factors in the prompts.
In-Context Learning Can Re-learn Forbidden Tasks
Sophie Xhonneux
David Dobre
Despite significant investment into safety training, large language models (LLMs) deployed in the real world still suffer from numerous vuln… (see more)erabilities. One perspective on LLM safety training is that it algorithmically forbids the model from answering toxic or harmful queries. To assess the effectiveness of safety training, in this work, we study forbidden tasks, i.e., tasks the model is designed to refuse to answer. Specifically, we investigate whether in-context learning (ICL) can be used to re-learn forbidden tasks despite the explicit fine-tuning of the model to refuse them. We first examine a toy example of refusing sentiment classification to demonstrate the problem. Then, we use ICL on a model fine-tuned to refuse to summarise made-up news articles. Finally, we investigate whether ICL can undo safety training, which could represent a major security risk. For the safety task, we look at Vicuna-7B, Starling-7B, and Llama2-7B. We show that the attack works out-of-the-box on Starling-7B and Vicuna-7B but fails on Llama2-7B. Finally, we propose an ICL attack that uses the chat template tokens like a prompt injection attack to achieve a better attack success rate on Vicuna-7B and Starling-7B. Trigger Warning: the appendix contains LLM-generated text with violence, suicide, and misinformation.
Learning Macro Variables with Auto-encoders
Eric Elmoznino
Maitreyi Swaroop
Adjusting Machine Learning Decisions for Equal Opportunity and Counterfactual Fairness
Yixin Wang
David Blei
Machine learning ( ml ) methods have the potential to automate high-stakes decisions, such as bail admissions or credit lending, by analyzin… (see more)g and learning from historical data. But these algorithmic decisions may be unfair: in learning from historical data, they may replicate discriminatory practices from the past. In this paper, we propose two algorithms that adjust fitted ML predictors to produce decisions that are fair. Our methods provide post-hoc adjustments to the predictors, without requiring that they be retrained. We consider a causal model of the ML decisions, define fairness through counterfactual decisions within the model, and then form algorithmic decisions that capture the historical data as well as possible, but are provably fair. In particular, we consider two definitions of fairness. The first is “equal counterfactual opportunity,” where the counterfactual distribution of the decision is the same regardless of the protected attribute; the second is counterfactual fairness. We evaluate the algorithms, and the trade-o � between accuracy and fairness, on datasets about admissions, income, credit, and recidivism.
Identifiable Deep Generative Models via Sparse Decoding
Gemma Elyse Moran
Yixin Wang
David Blei
We develop the sparse VAE for unsupervised representation learning on high-dimensional data. The sparse VAE learns a set of latent factors … (see more)(representations) which summarize the associations in the observed data features. The underlying model is sparse in that each observed feature (i.e. each dimension of the data) depends on a small subset of the latent factors. As examples, in ratings data each movie is only described by a few genres; in text data each word is only applicable to a few topics; in genomics, each gene is active in only a few biological processes. We prove such sparse deep generative models are identifiable: with infinite data, the true model parameters can be learned. (In contrast, most deep generative models are not identifiable.) We empirically study the sparse VAE with both simulated and real data. We find that it recovers meaningful latent factors and has smaller heldout reconstruction error than related methods.
Causal inference from text: A commentary
David Blei
Leveraging Structure Between Environments: Phylogenetic Regularization Incentivizes Disentangled Representations
Elliot Layne
Jason Hartford
Recently, learning invariant predictors across varying environments has been shown to improve the generalization of supervised learning meth… (see more)ods. This line of investigation holds great potential for application to biological problem settings, where data is often naturally heterogeneous. Biological samples often originate from different distributions, or environments. However, in biological contexts, the standard "invariant prediction" setting may not completely fit: the optimal predictor may in fact vary across biological environments. There also exists strong domain knowledge about the relationships between environments, such as the evolutionary history of a set of species, or the differentiation process of cell types. Most work on generic invariant predictors have not assumed the existence of structured relationships between environments. However, this prior knowledge about environments themselves has already been shown to improve prediction through a particular form of regularization applied when learning a set of predictors. In this work, we empirically evaluate whether a regularization strategy that exploits environment-based prior information can be used to learn representations that better disentangle causal factors that generate observed data. We find evidence that these methods do in fact improve the disentanglement of latent embeddings. We also show a setting where these methods can leverage phylogenetic information to estimate the number of latent causal features.
Estimating Social Influence from Observational Data
Caterina De Bacco
David Blei
We consider the problem of estimating social influence, the effect that a person's behavior has on the future behavior of their peers. The k… (see more)ey challenge is that shared behavior between friends could be equally explained by influence or by two other confounding factors: 1) latent traits that caused people to both become friends and engage in the behavior, and 2) latent preferences for the behavior. This paper addresses the challenges of estimating social influence with three contributions. First, we formalize social influence as a causal effect, one which requires inferences about hypothetical interventions. Second, we develop Poisson Influence Factorization (PIF), a method for estimating social influence from observational data. PIF fits probabilistic factor models to networks and behavior data to infer variables that serve as substitutes for the confounding latent traits. Third, we develop assumptions under which PIF recovers estimates of social influence. We empirically study PIF with semi-synthetic and real data from Last.fm, and conduct a sensitivity analysis. We find that PIF estimates social influence most accurately compared to related methods and remains robust under some violations of its assumptions.
Heterogeneous Supervised Topic Models
Hal Daumé III
David Blei