Portrait de Siamak Ravanbakhsh

Siamak Ravanbakhsh

Membre académique principal
Chaire en IA Canada-CIFAR
Professeur adjoint, McGill University, École d'informatique

Biographie

Siamak Ravanbakhsh est professeur adjoint à l’École d’informatique de l’Université McGill depuis août 2019. Avant de se joindre à McGill et à Mila – Institut québécois d’intelligence artificielle, il a occupé un poste similaire à l’Université de la Colombie-Britannique. De 2015 à 2017, il a été stagiaire postdoctoral au Département d’apprentissage automatique et à l’Institut de robotique de l’Université Carnegie Mellon, et il a obtenu un doctorat de l’Université de l’Alberta. Il s’intéresse aux problèmes de l’apprentissage de la représentation et de l’inférence dans l’IA.

Ses recherches actuelles portent sur le rôle de la symétrie et de l’invariance dans l’apprentissage profond des représentations.

Étudiants actuels

Doctorat - McGill University
Co-superviseur⋅e :
Maîtrise professionnelle - McGill University
Stagiaire de recherche - McGill University
Stagiaire de recherche - McGill University
Visiteur de recherche indépendant
Maîtrise professionnelle - McGill University
Maîtrise recherche - McGill University
Postdoctorat - McGill University
Doctorat - McGill University
Co-superviseur⋅e :
Maîtrise recherche - McGill University
Doctorat - McGill University
Maîtrise recherche - McGill University

Publications

Scalable Hierarchical Self-Attention with Learnable Hierarchy for Long-Range Interactions
Thuan Nguyen Anh Trang
Khang Nhat Ngo
Hugo Sonnery
Thieu Vo
Truong Son Hy
Self-attention models have made great strides toward accurately modeling a wide array of data modalities, including, more recently, graph-st… (voir plus)ructured data. This paper demonstrates that adaptive hierarchical attention can go a long way toward successfully applying transformers to graphs. Our proposed model Sequoia provides a powerful inductive bias towards long-range interaction modeling, leading to better generalization. We propose an end-to-end mechanism for a data-dependent construction of a hierarchy which in turn guides the self-attention mechanism. Using adaptive hierarchy provides a natural pathway toward sparse attention by constraining node-to-node interactions with the immediate family of each node in the hierarchy (e.g., parent, children, and siblings). This in turn dramatically reduces the computational complexity of a self-attention layer from quadratic to log-linear in terms of the input size while maintaining or sometimes even surpassing the standard transformer's ability to model long-range dependencies across the entire input. Experimentally, we report state-of-the-art performance on long-range graph benchmarks while remaining computationally efficient. Moving beyond graphs, we also display competitive performance on long-range sequence modeling, point-clouds classification, and segmentation when using a fixed hierarchy. Our source code is publicly available at https://github.com/HySonLab/HierAttention
Iterated Denoising Energy Matching for Sampling from Boltzmann Densities
Tara Akhound-Sadegh
Jarrid Rector-Brooks
Joey Bose
Sarthak Mittal
Pablo Lemos
Cheng-Hao Liu
Marcin Sendera
Nikolay Malkin
Alexander Tong
Efficiently generating statistically independent samples from an unnormalized probability distribution, such as equilibrium samples of many-… (voir plus)body systems, is a foundational problem in science. In this paper, we propose Iterated Denoising Energy Matching (iDEM), an iterative algorithm that uses a novel stochastic score matching objective leveraging solely the energy function and its gradient -- and no data samples -- to train a diffusion-based sampler. Specifically, iDEM alternates between (I) sampling regions of high model density from a diffusion-based sampler and (II) using these samples in our stochastic matching objective to further improve the sampler. iDEM is scalable to high dimensions as the inner matching objective, is simulation-free, and requires no MCMC samples. Moreover, by leveraging the fast mode mixing behavior of diffusion, iDEM smooths out the energy landscape enabling efficient exploration and learning of an amortized sampler. We evaluate iDEM on a suite of tasks ranging from standard synthetic energy functions to invariant
E(3)-Equivariant Mesh Neural Networks
Thuan N.a. Trang
Nhat-Khang Ngô
Daniel Levy
Thieu N. Vo
Truong Son Hy
Triangular meshes are widely used to represent three-dimensional objects. As a result, many recent works have address the need for geometric… (voir plus) deep learning on 3D mesh. However, we observe that the complexities in many of these architectures does not translate to practical performance, and simple deep models for geometric graphs are competitive in practice. Motivated by this observation, we minimally extend the update equations of E(n)-Equivariant Graph Neural Networks (EGNNs) (Satorras et al., 2021) to incorporate mesh face information, and further improve it to account for long-range interactions through hierarchy. The resulting architecture, Equivariant Mesh Neural Network (EMNN), outperforms other, more complicated equivariant methods on mesh tasks, with a fast run-time and no expensive pre-processing. Our implementation is available at https://github.com/HySonLab/EquiMesh
On Diffusion Modeling for Anomaly Detection
Victor Livernoche
Vineet Jain
Known for their impressive performance in generative modeling, diffusion models are attractive candidates for density-based anomaly detectio… (voir plus)n. This paper investigates different variations of diffusion modeling for unsupervised and semi-supervised anomaly detection. In particular, we find that Denoising Diffusion Probability Models (DDPM) are performant on anomaly detection benchmarks yet computationally expensive. By simplifying DDPM in application to anomaly detection, we are naturally led to an alternative approach called Diffusion Time Estimation (DTE). DTE estimates the distribution over diffusion time for a given input and uses the mode or mean of this distribution as the anomaly score. We derive an analytical form for this density and leverage a deep neural network to improve inference efficiency. Through empirical evaluations on the ADBench benchmark, we demonstrate that all diffusion-based anomaly detection methods perform competitively for both semi-supervised and unsupervised settings. Notably, DTE achieves orders of magnitude faster inference time than DDPM, while outperforming it on this benchmark. These results establish diffusion-based anomaly detection as a scalable alternative to traditional methods and recent deep-learning techniques for standard unsupervised and semi-supervised anomaly detection settings.
Efficient Dynamics Modeling in Interactive Environments with Koopman Theory
Arnab Kumar Mondal
Siba Smarak Panigrahi
Sai Rajeswar
The accurate modeling of dynamics in interactive environments is critical for successful long-range prediction. Such a capability could adva… (voir plus)nce Reinforcement Learning (RL) and Planning algorithms, but achieving it is challenging. Inaccuracies in model estimates can compound, resulting in increased errors over long horizons. We approach this problem from the lens of Koopman theory, where the nonlinear dynamics of the environment can be linearized in a high-dimensional latent space. This allows us to efficiently parallelize the sequential problem of long-range prediction using convolution while accounting for the agent’s action at every time step. Our approach also enables stability analysis and better control over gradients through time. Taken together, these advantages result in significant improvement over the existing approaches, both in the efficiency and the accuracy of modeling dynamics over extended horizons. We also show that this model can be easily incorporated into dynamics modeling for model-based planning and model-free RL and report promising experimental results.
Symmetry Breaking and Equivariant Neural Networks
Sékou-Oumar Kaba
Using symmetry as an inductive bias in deep learning has been proven to be a principled approach for sample-efficient model design. However,… (voir plus) the relationship between symmetry and the imperative for equivariance in neural networks is not always obvious. Here, we analyze a key limitation that arises in equivariant functions: their incapacity to break symmetry at the level of individual data samples. In response, we introduce a novel notion of 'relaxed equivariance' that circumvents this limitation. We further demonstrate how to incorporate this relaxation into equivariant multilayer perceptrons (E-MLPs), offering an alternative to the noise-injection method. The relevance of symmetry breaking is then discussed in various application domains: physics, graph representation learning, combinatorial optimization and equivariant decoding.
Weight-Sharing Regularization
Mehran Shakerinava
Motahareh Sohrabi
Physics-Informed Transformer Networks
Fabricio Dos Santos
F. Dos
Tara Akhound-Sadegh
Physics-informed neural networks (PINNs) have been recognized as a viable alternative to conventional numerical solvers for Partial Differen… (voir plus)tial Equations (PDEs). The main appeal of PINNs is that since they directly enforce the PDE equation, one does not require access to costly ground truth solutions for training the model. However, a key challenge is their limited generalization across varied initial conditions. Addressing this, our study presents a novel Physics-Informed Transformer (PIT) model for learning the solution operator for PDEs. Using the attention mechanism, PIT learns to leverage the relationships between its initial condition and query points, resulting in a significant improvement in generalization. Moreover, in contrast to existing physics-informed networks, our model is invariant to the discretization of the input domain, providing great flexibility in problem specification and training. We validated our proposed method on the 1D Burgers’ and the 2D Heat equations, demonstrating notable improvement over standard PINN models for operator learning with negligible computational overhead.
Learning to Reach Goals via Diffusion
Vineet Jain
We present a novel perspective on goal-conditioned reinforcement learning by framing it within the context of denoising diffusion models. An… (voir plus)alogous to the diffusion process, where Gaussian noise is used to create random trajectories that walk away from the data manifold, we construct trajectories that move away from potential goal states. We then learn a goal-conditioned policy to reverse these deviations, analogously to the score function. This approach, which we call Merlin, can reach specified goals from an arbitrary initial state without learning a separate value function. In contrast to recent works utilizing diffusion models in offline RL, Merlin stands out as the first method to perform diffusion in the state space, requiring only one ``denoising"iteration per environment step. We experimentally validate our approach in various offline goal-reaching tasks, demonstrating substantial performance enhancements compared to state-of-the-art methods while improving computational efficiency over other diffusion-based RL methods by an order of magnitude. Our results suggest that this perspective on diffusion for RL is a simple, scalable, and practical direction for sequential decision making.
Equivariant Adaptation of Large Pretrained Models
Arnab Kumar Mondal
Siba Smarak Panigrahi
Sékou-Oumar Kaba
Sai Rajeswar
Equivariant networks are specifically designed to ensure consistent behavior with respect to a set of input transformations, leading to high… (voir plus)er sample efficiency and more accurate and robust predictions. However, redesigning each component of prevalent deep neural network architectures to achieve chosen equivariance is a difficult problem and can result in a computationally expensive network during both training and inference. A recently proposed alternative towards equivariance that removes the architectural constraints is to use a simple canonicalization network that transforms the input to a canonical form before feeding it to an unconstrained prediction network. We show here that this approach can effectively be used to make a large pretrained network equivariant. However, we observe that the produced canonical orientations can be misaligned with those of the training distribution, hindering performance. Using dataset-dependent priors to inform the canonicalization function, we are able to make large pretrained models equivariant while maintaining their performance. This significantly improves the robustness of these models to deterministic transformations of the data, such as rotations. We believe this equivariant adaptation of large pretrained models can help their domain-specific applications with known symmetry priors.
Lie Point Symmetry and Physics-Informed Networks
Tara Akhound-Sadegh
Johannes Brandstetter
Max Welling
Symmetries have been leveraged to improve the generalization of neural networks through different mechanisms from data augmentation to equiv… (voir plus)ariant architectures. However, despite their potential, their integration into neural solvers for partial differential equations (PDEs) remains largely unexplored. We explore the integration of PDE symmetries, known as Lie point symmetries, in a major family of neural solvers known as physics-informed neural networks (PINNs). We propose a loss function that informs the network about Lie point symmetries in the same way that PINN models try to enforce the underlying PDE through a loss function. Intuitively, our symmetry loss ensures that the infinitesimal generators of the Lie group conserve the PDE solutions.. Effectively, this means that once the network learns a solution, it also learns the neighbouring solutions generated by Lie point symmetries. Empirical evaluations indicate that the inductive bias introduced by the Lie point symmetries of the PDEs greatly boosts the sample efficiency of PINNs.
Using Multiple Vector Channels Improves E(n)-Equivariant Graph Neural Networks
Daniel Levy
Sékou-Oumar Kaba
Carmelo Gonzales
Santiago Miret