Siamak Ravanbakhsh

Core Academic Member
Canada CIFAR AI Chair
Assistant Professor, McGill University, School of Computer Science
Research Topics
Causality
Deep Learning
Dynamical Systems
Generative Models
Graph Neural Networks
Information Theory
Learning on Graphs
Machine Learning Theory
Molecular Modeling
Probabilistic Models
Reasoning
Reinforcement Learning
Representation Learning

Biography

Siamak Ravanbakhsh is an assistant professor at McGill University’s School of Computer Science and a core academic member of Mila – Quebec Artificial Intelligence Institute.

Before joining McGill and Mila, he held a similar position at the University of British Columbia. Prior to that, he was a postdoctoral fellow at the Machine Learning Department and Robotics Institute of Carnegie Mellon University. He completed his PhD at the University of Alberta.

Ravanbakhsh’s research is centred around problems of representation learning, in particular the principled use of geometry, probabilistic inference and symmetry.

Current Students

PhD - McGill University
Professional Master's - McGill University
Master's Research - McGill University
Principal supervisor:
Master's Research - McGill University
PhD - McGill University
Professional Master's - McGill University
PhD - McGill University
PhD - McGill University
Co-supervisor:
PhD - McGill University
Master's Research - McGill University
Master's Research - McGill University
Professional Master's - McGill University
Master's Research - McGill University
Postdoctorate - McGill University
Master's Research - McGill University
PhD - McGill University
Collaborating Alumni - McGill University

Publications

Parity Requires Unified Input Dependence and Negative Eigenvalues in SSMs
Behnoush Khavari
Jayesh Khullar
François Rivest
Recent work has shown that LRNN models such as S4D, Mamba, and DeltaNet lack state-tracking capability due to either time-invariant transition matrices or restricted eigenvalue ranges. To address this, input-dependent transition matrices, particularly those that are complex or non-triangular, have been proposed to enhance SSM performance on such tasks. While existing theorems demonstrate that both input-independent and non-negative SSMs are incapable of solving simple state-tracking tasks, such as parity, regardless of depth, they do not explore whether combining these two types in a multilayer SSM could help. We investigate this question for efficient SSMs with diagonal transition matrices and show that such combinations still fail to solve parity. This implies that a recurrence layer must both be input-dependent and include negative eigenvalues. Our experiments support this conclusion by analyzing an SSM model that combines S4D and Mamba layers.
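As a toy illustration of the conclusion above (not the paper's construction or experiments), the sketch below tracks parity with a single scalar state whose transition is both input-dependent and allowed to be negative. The function names and the specific transition values are assumptions chosen for illustration. A second function shows why a non-negative transition cannot do the same: the state never changes sign, so no sign-based readout can flip with each additional 1.

```python
def parity_via_signed_recurrence(bits):
    """Parity via the scalar recurrence h_t = a(x_t) * h_{t-1}.

    Assumed toy transition: a(1) = -1 and a(0) = +1, so it depends on the
    input and takes a negative value.
    """
    h = 1.0  # initial state encodes "even number of ones so far"
    for x in bits:
        a = -1.0 if x == 1 else 1.0  # input-dependent, negative for x = 1
        h = a * h                    # the sign flips on every 1
    return int(h < 0)                # 1 if the count of ones is odd


def nonnegative_recurrence_state(bits, a_one=0.5, a_zero=1.0):
    """Same recurrence, restricted to non-negative transition values.

    The values a_one and a_zero are hypothetical. The state stays positive,
    so its sign carries no parity information.
    """
    h = 1.0
    for x in bits:
        h = (a_one if x == 1 else a_zero) * h
    return h


if __name__ == "__main__":
    for seq in ([], [1], [1, 1], [1, 0, 1, 1]):
        print(seq, parity_via_signed_recurrence(seq), nonnegative_recurrence_state(seq))
```

In the signed version the state alternates between +1 and -1, which is exactly a two-state parity automaton. In the non-negative version the state only shrinks with each 1 and never changes sign.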
Multi-Armed Sampling Problem and the End of Exploration
This paper introduces the framework of multi-armed sampling as the sampling counterpart to the optimization problem of multi-armed bandits. Our primary motivation is to rigorously examine the exploration-exploitation trade-off in the context of sampling. We systematically define plausible notions of regret for this framework and establish corresponding lower bounds. We then propose a simple algorithm that achieves these optimal regret bounds. Our theoretical results demonstrate that, in contrast to optimization, sampling does not require exploration. To further connect our findings with those of multi-armed bandits, we define a continuous family of problems and associated regret measures that smoothly interpolates and unifies multi-armed sampling and multi-armed bandit problems using a temperature parameter. We believe the multi-armed sampling framework and our findings in this setting can have a foundational role in the study of sampling, including recent neural samplers, akin to the role of multi-armed bandits in reinforcement learning. In particular, our work sheds light on the need for exploration and the convergence properties of algorithms for entropy-regularized reinforcement learning, fine-tuning of pretrained models, and reinforcement learning with human feedback (RLHF).
Progressive Inference-Time Annealing of Diffusion Models for Sampling from Boltzmann Densities
Valentin De Bortoli
Arnaud Doucet
Michael M. Bronstein
Alexander Tong
Sampling efficiently from a target unnormalized probability density remains a core challenge, with relevance across countless high-impact scientific applications. A promising approach towards this challenge is the design of amortized samplers that borrow key ideas, such as probability path design, from state-of-the-art generative diffusion models. However, all existing diffusion-based samplers remain unable to draw samples from distributions at the scale of even simple molecular systems. In this paper, we propose Progressive Inference-Time Annealing (PITA), a novel framework to learn diffusion-based samplers that combines two complementary interpolation techniques: (I) annealing of the Boltzmann distribution and (II) diffusion smoothing. PITA trains a sequence of diffusion models from high to low temperatures by sequentially training each model at progressively higher temperatures, leveraging engineered easy access to samples of the temperature-annealed target density. In the subsequent step, PITA enables simulating the trained diffusion model to procure training samples at a lower temperature for the next diffusion model through inference-time annealing using a novel Feynman-Kac PDE combined with Sequential Monte Carlo. Empirically, PITA enables, for the first time, equilibrium sampling of N-body particle systems, Alanine Dipeptide, and tripeptides in Cartesian coordinates with dramatically lower energy function evaluations. Code available at: https://github.com/taraak/pita
Diffusion Tree Sampling: Scalable inference‑time alignment of diffusion models
Adapting a pretrained diffusion model to new objectives at inference time remains an open problem in generative modeling. Existing steering methods suffer from inaccurate value estimation, especially at high noise levels, which biases guidance. Moreover, information from past runs is not reused to improve sample quality, leading to inefficient use of compute. Inspired by the success of Monte Carlo Tree Search, we address these limitations by casting inference-time alignment as a search problem that reuses past computations. We introduce a tree-based approach that samples from the reward-aligned target density by propagating terminal rewards back through the diffusion chain and iteratively refining value estimates with each additional generation. Our proposed method, Diffusion Tree Sampling (DTS), produces asymptotically exact samples from the target distribution in the limit of infinite rollouts, and its greedy variant Diffusion Tree Search (DTS*) performs a robust search for high reward samples. On MNIST and CIFAR-10 class-conditional generation, DTS matches the FID of the best-performing baseline with up to
Beyond Scalar Rewards: An Axiomatic Framework for Lexicographic MDPs
On the Identifiability of Causal Abstractions
Sékou-Oumar Kaba
Causal representation learning (CRL) enhances machine learning models' robustness and generalizability by learning structural causal models associated with data-generating processes. We focus on a family of CRL methods that uses contrastive data pairs in the observable space, generated before and after a random, unknown intervention, to identify the latent causal model. Brehmer et al. (2022) showed that this is indeed possible, given that all latent variables can be intervened on individually. However, this is a highly restrictive assumption in many systems. In this work, we instead assume interventions on arbitrary subsets of latent variables, which is more realistic. We introduce a theoretical framework that calculates the degree to which we can identify a causal model, given a set of possible interventions, up to an abstraction that describes the system at a higher level of granularity.
SymmCD: Symmetry-Preserving Crystal Generation with Diffusion Models
Siba Smarak Panigrahi
Sékou-Oumar Kaba
Qiang Zhu
Kin Long Kelvin Lee
Mikhail Galkin
Santiago Miret
On the Identifiability of Causal Abstractions
Sékou-Oumar Kaba
Causal representation learning methods seek to enhance machine learning models' robustness and generalization capabilities by learning latent representations and causal graphs aligned with the data generating process. In many systems, fully recovering the true causal structure is challenging because we cannot intervene on all latent variables individually. We introduce a theoretical framework that calculates the degree to which we can identify a causal structure in the more realistic setting of interventions on arbitrary subsets of latent variables. We find that in that case, we can only identify a causal model up to a causal abstraction. These causal abstractions are still meaningful in that they describe the system at a higher level of granularity. Conversely, given a causal abstraction, our framework provides sufficient conditions for its identifiability. Our findings extend existing identifiability results in two areas: those that address abstractions of latent variables without considering graphical structures and those that focus on graphical structures without incorporating their abstractions.