Siamak Ravanbakhsh

Laurence Perreault-Levasseur

Biography

Siamak Ravanbakhsh is an assistant professor at McGill University’s School of Computer Science and a core academic member of Mila – Quebec Artificial Intelligence Institute.

Before joining McGill and Mila, he held a similar position at the University of British Columbia. Prior to that, he was a postdoctoral fellow at the Machine Learning Department and Robotics Institute of Carnegie Mellon University. He completed his PhD at the University of Alberta.

Ravanbakhsh’s research is centred around problems of representation learning, in particular the principled use of geometry, probabilistic inference and symmetry.

Current Students

Tara Akhound-Sadegh

PhD - McGill University

Co-supervisor :

Han Du Han

Professional Master's - McGill University

Noah El Rimawi-Fine

Master's Research - McGill University

Co-supervisor :

Mathieu Blanchette

Vineet Jain

PhD - McGill University

Oumar Kaba

PhD - McGill University

PhD - McGill University

Co-supervisor :

Daniel Levy

PhD - McGill University

Xiusi Li

PhD - McGill University

Research Intern - McGill University

Mahsa Massoud

Master's Research - McGill University

Khang Ngo

Master's Research - McGill University

Mohammad Pedramfar

Postdoctorate - McGill University

Kusha Sareen

Master's Research - McGill University

Mehran Shakerinava

PhD - McGill University

Cléo Sonnery

Collaborating Alumni - McGill University

Alan Yang

Professional Master's - McGill University

Publications

Progressive Inference-Time Annealing of Diffusion Models for Sampling from Boltzmann Densities

Tara Akhound-Sadegh

Jungyoon Lee

Joey Bose

Valentin De Bortoli

Arnaud Doucet

Michael M. Bronstein

Dominique Beaini

Kirill Neklyudov

Alexander Tong

Sampling efficiently from a target unnormalized probability density remains a core challenge, with relevance across countless high-impact sc… (see more)ientific applications. A promising approach towards this challenge is the design of amortized samplers that borrow key ideas, such as probability path design, from state-of-the-art generative diffusion models. However, all existing diffusion-based samplers remain unable to draw samples from distributions at the scale of even simple molecular systems. In this paper, we propose Progressive Inference-Time Annealing (PITA), a novel framework to learn diffusion-based samplers that combines two complementary interpolation techniques: I.) Annealing of the Boltzmann distribution and II.) Diffusion smoothing. PITA trains a sequence of diffusion models from high to low temperatures by sequentially training each model at progressively higher temperatures, leveraging engineered easy access to samples of the temperature-annealed target density. In the subsequent step, PITA enables simulating the trained diffusion model to procure training samples at a lower temperature for the next diffusion model through inference-time annealing using a novel Feynman-Kac PDE combined with Sequential Monte Carlo. Empirically, PITA enables, for the first time, equilibrium sampling of N-body particle systems, Alanine Dipeptide, and tripeptides in Cartesian coordinates with dramatically lower energy function evaluations. Code available at: https://github.com/taraak/pita

2025-06-19

ArXiv (preprint)

Beyond Scalar Rewards: An Axiomatic Framework for Lexicographic MDPs

Mehran Shakerinava

Adam M. Oberman

2025-05-17

ArXiv (preprint)

Beyond Scalar Rewards: An Axiomatic Framework for Lexicographic MDPs

Mehran Shakerinava

Adam M. Oberman

2025-05-01

arXiv (published)

SymmCD: Symmetry-Preserving Crystal Generation with Diffusion Models

Daniel Levy

Siba Smarak Panigrahi

Sékou-Oumar Kaba

Qiang Zhu

Kin Long Kelvin Lee

Mikhail Galkin

Santiago Miret

2025-01-22

ICLR.cc/2025/Conference (poster)

On the Identifiability of Causal Abstractions

Xiusi Li

Sékou-Oumar Kaba

Causal representation learning methods seek to enhance machine learning models' robustness and generalization capabilities by learning laten… (see more)t representations and causal graphs aligned with the data generating process. In many systems, fully recovering the true causal structure is challenging because we cannot intervene on all latent variables individually. We introduce a theoretical framework that calculates the degree to which we can identify a causal structure in the more realistic setting of interventions on arbitrary subsets of latent variables. We find that in that case, we can only identify a causal model up to a \emph{causal abstraction}. These causal abstractions are still meaningful in that they describe the system at a higher level of granularity. Conversely, given a causal abstraction, our framework provides sufficient conditions for its identifiability. Our findings extend existing identifiability results in two areas: those that address abstractions of latent variables without considering graphical structures and those that focus on graphical structures without incorporating their abstractions.

2025-01-22

aistats.org/AISTATS/2025/Conference (poster)

On the Identifiability of Causal Abstractions

Xiusi Li

Sékou-Oumar Kaba

Causal representation learning (CRL) enhances machine learning models' robustness and generalizability by learning structural causal models … (see more)associated with data-generating processes. We focus on a family of CRL methods that uses contrastive data pairs in the observable space, generated before and after a random, unknown intervention, to identify the latent causal model. (Brehmer et al., 2022) showed that this is indeed possible, given that all latent variables can be intervened on individually. However, this is a highly restrictive assumption in many systems. In this work, we instead assume interventions on arbitrary subsets of latent variables, which is more realistic. We introduce a theoretical framework that calculates the degree to which we can identify a causal model, given a set of possible interventions, up to an abstraction that describes the system at a higher level of granularity.

2025-01-22

aistats.org/AISTATS/2025/Conference (poster)

Symmetry-Aware Generative Modeling through Learned Canonicalization

Kusha Sareen

Daniel Levy

Arnab Kumar Mondal

Sékou-Oumar Kaba

Tara Akhound-Sadegh

Generative modeling of symmetric densities has a range of applications in AI for science, from drug discovery to physics simulations. The ex… (see more)isting generative modeling paradigm for invariant densities combines an invariant prior with an equivariant generative process. However, we observe that this technique is not necessary and has several drawbacks resulting from the limitations of equivariant networks. Instead, we propose to model a learned slice of the density so that only one representative element per orbit is learned. To accomplish this, we learn a group-equivariant canonicalization network that maps training samples to a canonical pose and train a non-equivariant generative model over these canonicalized samples. We implement this idea in the context of diffusion models. Our preliminary experimental results on molecular modeling are promising, demonstrating improved sample quality and faster inference time.

2025-01-14

ArXiv (preprint)

Symmetry-Aware Generative Modeling through Learned Canonicalization

Kusha Sareen

Daniel Levy

Arnab Kumar Mondal

Sékou-Oumar Kaba

Tara Akhound-Sadegh

2024-10-23

NeurIPS.cc/2024/Workshop/NeurReps (poster)

Sampling from Energy-based Policies using Diffusion

Vineet Jain

Tara Akhound-Sadegh

2024-10-02

ArXiv (preprint)

Iterated Denoising Energy Matching for Sampling from Boltzmann Densities

Tara Akhound-Sadegh

Jarrid Rector-Brooks

Joey Bose

Sarthak Mittal

Pablo Lemos

Cheng-Hao Liu

Marcin Sendera

Gauthier Gidel

Yoshua Bengio

Nikolay Malkin

Alexander Tong

Efficiently generating statistically independent samples from an unnormalized probability distribution, such as equilibrium samples of many-… (see more)body systems, is a foundational problem in science. In this paper, we propose Iterated Denoising Energy Matching (iDEM), an iterative algorithm that uses a novel stochastic score matching objective leveraging solely the energy function and its gradient---and no data samples---to train a diffusion-based sampler. Specifically, iDEM alternates between (I) sampling regions of high model density from a diffusion-based sampler and (II) using these samples in our stochastic matching objective to further improve the sampler. iDEM is scalable to high dimensions as the inner matching objective, is *simulation-free*, and requires no MCMC samples. Moreover, by leveraging the fast mode mixing behavior of diffusion, iDEM smooths out the energy landscape enabling efficient exploration and learning of an amortized sampler. We evaluate iDEM on a suite of tasks ranging from standard synthetic energy functions to invariant

2024-05-01

ICML.cc/2024/Conference (poster)

Weight-Sharing Regularization

Mehran Shakerinava

Motahareh Sohrabi

Simon Lacoste-Julien

2024-04-18

Proceedings of The 27th International Conference on Artificial Intelligence and Statistics (published)

Scalable Hierarchical Self-Attention with Learnable Hierarchy for Long-Range Interactions

Thuan Nguyen Anh Trang

Khang Nhat Ngo

Hugo Sonnery

Thieu Vo

Truong Son Hy

Self-attention models have made great strides toward accurately modeling a wide array of data modalities, including, more recently, graph-st… (see more)ructured data. This paper demonstrates that adaptive hierarchical attention can go a long way toward successfully applying transformers to graphs. Our proposed model Sequoia provides a powerful inductive bias towards long-range interaction modeling, leading to better generalization. We propose an end-to-end mechanism for a data-dependent construction of a hierarchy which in turn guides the self-attention mechanism. Using adaptive hierarchy provides a natural pathway toward sparse attention by constraining node-to-node interactions with the immediate family of each node in the hierarchy (e.g., parent, children, and siblings). This in turn dramatically reduces the computational complexity of a self-attention layer from quadratic to log-linear in terms of the input size while maintaining or sometimes even surpassing the standard transformer's ability to model long-range dependencies across the entire input. Experimentally, we report state-of-the-art performance on long-range graph benchmarks while remaining computationally efficient. Moving beyond graphs, we also display competitive performance on long-range sequence modeling, point-clouds classification, and segmentation when using a fixed hierarchy. Our source code is publicly available at https://github.com/HySonLab/HierAttention

2024-04-11

TMLR (accepted)