Portrait of Siamak Ravanbakhsh

Siamak Ravanbakhsh

Core Academic Member
Canada CIFAR AI Chair
Associate Professor, McGill University, School of Computer Science
Research Topics
Active Learning
AI Alignment
AI for Science
Bayesian Inference
Deep Learning
Generalization
Generative Models
Learning on Graphs
Probabilistic Models
Reasoning
Reinforcement Learning
Representation Learning
Symmetry

Biography

Siamak Ravanbakhsh is an assistant professor at McGill University’s School of Computer Science and a core academic member of Mila – Quebec Artificial Intelligence Institute.

Before joining McGill and Mila, he held a similar position at the University of British Columbia. Prior to that, he was a postdoctoral fellow at the Machine Learning Department and Robotics Institute of Carnegie Mellon University. He completed his PhD at the University of Alberta.

Ravanbakhsh’s research is centred around problems of representation learning, in particular the principled use of geometry, probabilistic inference and symmetry.

Current Students

PhD - McGill University
Master's Research - McGill University
Principal supervisor :
Master's Research - McGill University
Professional Master's - McGill University
PhD - McGill University
PhD - McGill University
PhD - McGill University
PhD - McGill University
Master's Research - McGill University
Research Intern - McGill University
PhD - McGill University
Collaborating researcher - McGill University
Postdoctorate - McGill University
Master's Research - McGill University
Collaborating Alumni - McGill University

Publications

On the Identifiability of Causal Abstractions
Sékou-Oumar Kaba
Causal representation learning (CRL) enhances machine learning models' robustness and generalizability by learning structural causal models … (see more)associated with data-generating processes. We focus on a family of CRL methods that uses contrastive data pairs in the observable space, generated before and after a random, unknown intervention, to identify the latent causal model. (Brehmer et al., 2022) showed that this is indeed possible, given that all latent variables can be intervened on individually. However, this is a highly restrictive assumption in many systems. In this work, we instead assume interventions on arbitrary subsets of latent variables, which is more realistic. We introduce a theoretical framework that calculates the degree to which we can identify a causal model, given a set of possible interventions, up to an abstraction that describes the system at a higher level of granularity.
SymmCD: Symmetry-Preserving Crystal Generation with Diffusion Models
Sékou-Oumar Kaba
Qiang Zhu
Kin Long Kelvin Lee
Santiago Miret
Generating novel crystalline materials has the potential to lead to advancements in fields such as electronics, energy storage, and catalysi… (see more)s. The defining characteristic of crystals is their symmetry, which plays a central role in determining their physical properties. However, existing crystal generation methods either fail to generate materials that display the symmetries of real-world crystals, or simply replicate the symmetry information from examples in a database. To address this limitation, we propose SymmCD, a novel diffusion-based generative model that explicitly incorporates crystallographic symmetry into the generative process. We decompose crystals into two components and learn their joint distribution through diffusion: 1) the asymmetric unit, the smallest subset of the crystal which can generate the whole crystal through symmetry transformations, and; 2) the symmetry transformations needed to be applied to each atom in the asymmetric unit. We also use a novel and interpretable representation for these transformations, enabling generalization across different crystallographic symmetry groups. We showcase the competitive performance of SymmCD on a subset of the Materials Project, obtaining diverse and valid crystals with realistic symmetries and predicted properties.
Symmetry-Aware Generative Modeling through Learned Canonicalization
Arnab Kumar Mondal
Sékou-Oumar Kaba
Generative modeling of symmetric densities has a range of applications in AI for science, from drug discovery to physics simulations. The ex… (see more)isting generative modeling paradigm for invariant densities combines an invariant prior with an equivariant generative process. However, we observe that this technique is not necessary and has several drawbacks resulting from the limitations of equivariant networks. Instead, we propose to model a learned slice of the density so that only one representative element per orbit is learned. To accomplish this, we learn a group-equivariant canonicalization network that maps training samples to a canonical pose and train a non-equivariant generative model over these canonicalized samples. We implement this idea in the context of diffusion models. Our preliminary experimental results on molecular modeling are promising, demonstrating improved sample quality and faster inference time.
Learning to Reach Goals via Diffusion
We present a novel perspective on goal-conditioned reinforcement learning by framing it within the context of denoising diffusion models. An… (see more)alogous to the diffusion process, where Gaussian noise is used to create random trajectories that walk away from the data manifold, we construct trajectories that move away from potential goal states. We then learn a goal-conditioned policy to reverse these deviations, analogous to the score function. This approach, which we call Merlin, can reach specified goals from arbitrary initial states without learning a separate value function. In contrast to recent works utilizing diffusion models in offline RL, Merlin stands out as the first method to perform diffusion in the state space, requiring only one ``denoising" iteration per environment step. We experimentally validate our approach in various offline goal-reaching tasks, demonstrating substantial performance enhancements compared to state-of-the-art methods while improving computational efficiency over other diffusion-based RL methods by an order of magnitude. Our results suggest that this perspective on diffusion for RL is a simple and scalable approach for sequential decision making.
Iterated Denoising Energy Matching for Sampling from Boltzmann Densities
Efficiently generating statistically independent samples from an unnormalized probability distribution, such as equilibrium samples of many-… (see more)body systems, is a foundational problem in science. In this paper, we propose Iterated Denoising Energy Matching (iDEM), an iterative algorithm that uses a novel stochastic score matching objective leveraging solely the energy function and its gradient -- and no data samples -- to train a diffusion-based sampler. Specifically, iDEM alternates between (I) sampling regions of high model density from a diffusion-based sampler and (II) using these samples in our stochastic matching objective to further improve the sampler. iDEM is scalable to high dimensions as the inner matching objective, is simulation-free, and requires no MCMC samples. Moreover, by leveraging the fast mode mixing behavior of diffusion, iDEM smooths out the energy landscape enabling efficient exploration and learning of an amortized sampler. We evaluate iDEM on a suite of tasks ranging from standard synthetic energy functions to invariant
Weight-Sharing Regularization
Weight-sharing is ubiquitous in deep learning. Motivated by this, we propose a "weight-sharing regularization" penalty on the weights …
Scalable Hierarchical Self-Attention with Learnable Hierarchy for Long-Range Interactions
Thuan Nguyen Anh Trang
Khang Nhat Ngo
Hugo Sonnery
Thieu Vo
Truong Son Hy
Self-attention models have made great strides toward accurately modeling a wide array of data modalities, including, more recently, graph-st… (see more)ructured data. This paper demonstrates that adaptive hierarchical attention can go a long way toward successfully applying transformers to graphs. Our proposed model Sequoia provides a powerful inductive bias towards long-range interaction modeling, leading to better generalization. We propose an end-to-end mechanism for a data-dependent construction of a hierarchy which in turn guides the self-attention mechanism. Using adaptive hierarchy provides a natural pathway toward sparse attention by constraining node-to-node interactions with the immediate family of each node in the hierarchy (e.g., parent, children, and siblings). This in turn dramatically reduces the computational complexity of a self-attention layer from quadratic to log-linear in terms of the input size while maintaining or sometimes even surpassing the standard transformer's ability to model long-range dependencies across the entire input. Experimentally, we report state-of-the-art performance on long-range graph benchmarks while remaining computationally efficient. Moving beyond graphs, we also display competitive performance on long-range sequence modeling, point-clouds classification, and segmentation when using a fixed hierarchy. Our source code is publicly available at https://github.com/HySonLab/HierAttention
On Diffusion Modeling for Anomaly Detection
Known for their impressive performance in generative modeling, diffusion models are attractive candidates for density-based anomaly detectio… (see more)n. This paper investigates different variations of diffusion modeling for unsupervised and semi-supervised anomaly detection. In particular, we find that Denoising Diffusion Probability Models (DDPM) are performant on anomaly detection benchmarks yet computationally expensive. By simplifying DDPM in application to anomaly detection, we are naturally led to an alternative approach called Diffusion Time Estimation (DTE). DTE estimates the distribution over diffusion time for a given input and uses the mode or mean of this distribution as the anomaly score. We derive an analytical form for this density and leverage a deep neural network to improve inference efficiency. Through empirical evaluations on the ADBench benchmark, we demonstrate that all diffusion-based anomaly detection methods perform competitively for both semi-supervised and unsupervised settings. Notably, DTE achieves orders of magnitude faster inference time than DDPM, while outperforming it on this benchmark. These results establish diffusion-based anomaly detection as a scalable alternative to traditional methods and recent deep-learning techniques for standard unsupervised and semi-supervised anomaly detection settings.
Efficient Dynamics Modeling in Interactive Environments with Koopman Theory
Arnab Kumar Mondal
Sai Rajeswar
The accurate modeling of dynamics in interactive environments is critical for successful long-range prediction. Such a capability could adva… (see more)nce Reinforcement Learning (RL) and Planning algorithms, but achieving it is challenging. Inaccuracies in model estimates can compound, resulting in increased errors over long horizons. We approach this problem from the lens of Koopman theory, where the nonlinear dynamics of the environment can be linearized in a high-dimensional latent space. This allows us to efficiently parallelize the sequential problem of long-range prediction using convolution while accounting for the agent's action at every time step. Our approach also enables stability analysis and better control over gradients through time. Taken together, these advantages result in significant improvement over the existing approaches, both in the efficiency and the accuracy of modeling dynamics over extended horizons. We also show that this model can be easily incorporated into dynamics modeling for model-based planning and model-free RL and report promising experimental results.
E(3)-Equivariant Mesh Neural Networks
Thuan Trang
Nhat Khang Ngo
Thieu N. Vo
Truong Son Hy
Triangular meshes are widely used to represent three-dimensional objects. As a result, many recent works have address the need for geometric… (see more) deep learning on 3D mesh. However, we observe that the complexities in many of these architectures does not translate to practical performance, and simple deep models for geometric graphs are competitive in practice. Motivated by this observation, we minimally extend the update equations of E(n)-Equivariant Graph Neural Networks (EGNNs) (Satorras et al., 2021) to incorporate mesh face information, and further improve it to account for long-range interactions through hierarchy. The resulting architecture, Equivariant Mesh Neural Network (EMNN), outperforms other, more complicated equivariant methods on mesh tasks, with a fast run-time and no expensive pre-processing. Our implementation is available at https://github.com/HySonLab/EquiMesh
Symmetry Breaking and Equivariant Neural Networks
Sékou-Oumar Kaba
Using symmetry as an inductive bias in deep learning has been proven to be a principled approach for sample-efficient model design. However,… (see more) the relationship between symmetry and the imperative for equivariance in neural networks is not always obvious. Here, we analyze a key limitation that arises in equivariant functions: their incapacity to break symmetry at the level of individual data samples. In response, we introduce a novel notion of 'relaxed equivariance' that circumvents this limitation. We further demonstrate how to incorporate this relaxation into equivariant multilayer perceptrons (E-MLPs), offering an alternative to the noise-injection method. The relevance of symmetry breaking is then discussed in various application domains: physics, graph representation learning, combinatorial optimization and equivariant decoding.
Physics-Informed Transformer Networks
Physics-informed neural networks (PINNs) have been recognized as a viable alternative to conventional numerical solvers for Partial Differen… (see more)tial Equations (PDEs). The main appeal of PINNs is that since they directly enforce the PDE equation, one does not require access to costly ground truth solutions for training the model. However, a key challenge is their limited generalization across varied initial conditions. Addressing this, our study presents a novel Physics-Informed Transformer (PIT) model for learning the solution operator for PDEs. Using the attention mechanism, PIT learns to leverage the relationships between its initial condition and query points, resulting in a significant improvement in generalization. Moreover, in contrast to existing physics-informed networks, our model is invariant to the discretization of the input domain, providing great flexibility in problem specification and training. We validated our proposed method on the 1D Burgers’ and the 2D Heat equations, demonstrating notable improvement over standard PINN models for operator learning with negligible computational overhead.