
Majdi Hassan

PhD - Université de Montréal
Supervisor
Co-supervisor
Research Topics
AI for Science
Computational Biology
Deep Learning
Density Functional Theory
Drug Discovery
Generative Models
Molecular Modeling

Publications

Amortized Sampling with Transferable Normalizing Flows
Charlie B. Tan
Leon Klein
Saifuddin Syed
Michael M. Bronstein
Alexander Tong
Efficient equilibrium sampling of molecular conformations remains a core challenge in computational chemistry and statistical inference. Classical approaches such as molecular dynamics or Markov chain Monte Carlo inherently lack amortization: the computational cost of sampling must be paid in full for each system of interest. The widespread success of generative models has inspired interest in overcoming this limitation by learning sampling algorithms. Although learned samplers perform on par with conventional methods when trained on a single system, they have so far shown limited ability to transfer across systems. We prove that deep learning enables the design of scalable and transferable samplers by introducing Prose, a 280-million-parameter all-atom transferable normalizing flow trained on a corpus of peptide molecular dynamics trajectories of up to 8 residues in length. Prose draws zero-shot, uncorrelated proposal samples for arbitrary peptide systems, achieving the previously intractable transferability across sequence length whilst retaining the efficient likelihood evaluation of normalizing flows. Through extensive empirical evaluation we demonstrate the efficacy of Prose as a proposal for a variety of sampling algorithms, and find that a simple importance sampling-based finetuning procedure outperforms established methods such as sequential Monte Carlo on unseen tetrapeptides. We open-source the Prose codebase, model weights, and training dataset to further stimulate research into amortized sampling methods and finetuning objectives.
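Because a normalizing flow gives an exact log-density for every sample it draws, its samples can be reweighted toward an unnormalized Boltzmann target. The sketch below illustrates generic self-normalized importance sampling under stated assumptions: a fixed 1D Gaussian stands in for the flow proposal, and the harmonic energy `E(x) = x^2 / 2` with `kT = 1` is a hypothetical toy target. It is not the Prose model or its finetuning procedure.

```python
import math
import random


def log_normal_pdf(x, mu, sigma):
    # Exact log-density of the proposal (a flow would provide this via change of variables)
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))


def log_boltzmann_unnormalized(x, kT=1.0):
    # Hypothetical toy energy E(x) = x^2 / 2; target is exp(-E / kT) up to a constant
    energy = 0.5 * x * x
    return -energy / kT


def self_normalized_is(n_samples, proposal_mu=0.0, proposal_sigma=2.0, seed=0):
    rng = random.Random(seed)
    xs = [rng.gauss(proposal_mu, proposal_sigma) for _ in range(n_samples)]
    # Log importance weights: log p~(x) - log q(x)
    log_w = [
        log_boltzmann_unnormalized(x) - log_normal_pdf(x, proposal_mu, proposal_sigma)
        for x in xs
    ]
    # Normalize in a numerically stable way (subtract the max log-weight)
    m = max(log_w)
    w = [math.exp(lw - m) for lw in log_w]
    total = sum(w)
    w = [wi / total for wi in w]
    # Effective sample size: how many "equivalent" independent target samples we have
    ess = 1.0 / sum(wi * wi for wi in w)
    return xs, w, ess


xs, w, ess = self_normalized_is(20000)
second_moment = sum(wi * x * x for wi, x in zip(w, xs))
print(f"E[x^2] ~ {second_moment:.3f} (exact 1.0), ESS = {ess:.0f} of {len(xs)}")
```

The effective sample size is a common diagnostic for proposal quality: the closer the proposal is to the target, the closer the ESS gets to the number of drawn samples.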
Self-Refining Training for Amortized Density Functional Theory
Cristian Gabellini
Hatem Helal
Density Functional Theory (DFT) allows for predicting all the chemical and physical properties of molecular systems from first principles by finding an approximate solution to the many-body Schrödinger equation. However, the cost of these predictions becomes prohibitive as the number of energy evaluations grows, e.g., when calculating the ground-state energy for simulating molecular dynamics. Recent works have demonstrated that, given sufficiently large datasets of molecular conformations, Deep Learning-based models can predict the outputs of classical DFT solvers by amortizing the corresponding optimization problems. In this paper, we propose a novel method that reduces the dependency of amortized DFT solvers on large pre-collected datasets by introducing a self-refining training strategy. Namely, we propose an efficient method that simultaneously trains a deep-learning model to predict the DFT outputs and samples molecular conformations that are used as training data for the model. We derive our method as the minimization of a variational upper bound on the KL divergence measuring the discrepancy between the generated samples and the target Boltzmann distribution defined by the ground-state energy. To demonstrate the utility of the proposed scheme, we perform an extensive empirical study comparing it with models trained on pre-collected datasets. Finally, we open-source our implementation of the proposed algorithm, optimized with asynchronous training and sampling stages, which enables simultaneous sampling and training. Code is available at https://github.com/majhas/self-refining-dft.
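The core idea of interleaving sampling and training can be illustrated with a toy loop: a surrogate energy model drives a sampler, the new samples are labeled by an expensive oracle, and the labeled samples become training data for the next round. This is a minimal synchronous sketch under loudly stated assumptions: `oracle_energy` is a hypothetical 1D double-well standing in for a DFT solver, the surrogate is a simple Gaussian-kernel regressor rather than a deep network, and the sampler is random-walk Metropolis. It does not implement the paper's KL-bound objective or its asynchronous stages; see the linked repository for the authors' implementation.

```python
import math
import random


def oracle_energy(x):
    # Hypothetical stand-in for an expensive DFT call: a 1D double-well energy
    return (x * x - 1.0) ** 2


class KernelSurrogate:
    """Toy surrogate energy model: Gaussian-kernel (Nadaraya-Watson) regression."""

    def __init__(self, bandwidth=0.3):
        self.bandwidth = bandwidth
        self.xs, self.es = [], []

    def fit(self, xs, es):
        # "Training" here just stores the labeled data used by the kernel regressor
        self.xs, self.es = list(xs), list(es)

    def predict(self, x):
        ws = [math.exp(-0.5 * ((x - xi) / self.bandwidth) ** 2) for xi in self.xs]
        total = sum(ws)
        if total < 1e-300:
            # Far from all labeled data: fall back to the mean label
            return sum(self.es) / len(self.es)
        return sum(w * e for w, e in zip(ws, self.es)) / total


def metropolis(energy_fn, x0, n_steps, step, kT, rng):
    # Random-walk Metropolis targeting exp(-energy_fn(x) / kT)
    x, e = x0, energy_fn(x0)
    chain = []
    for _ in range(n_steps):
        x_new = x + rng.gauss(0.0, step)
        e_new = energy_fn(x_new)
        if rng.random() < math.exp(min(0.0, -(e_new - e) / kT)):
            x, e = x_new, e_new
        chain.append(x)
    return chain


def self_refining_loop(n_rounds=5, samples_per_round=20, kT=0.5, seed=0):
    rng = random.Random(seed)
    # Small seed dataset instead of a large pre-collected one
    buffer_x = [rng.uniform(-2.0, 2.0) for _ in range(5)]
    buffer_e = [oracle_energy(x) for x in buffer_x]
    model = KernelSurrogate()
    thin = 10
    for _ in range(n_rounds):
        model.fit(buffer_x, buffer_e)  # train on the current data
        # Sample conformations from the model's Boltzmann distribution
        chain = metropolis(model.predict, 0.0, thin * samples_per_round, 0.5, kT, rng)
        new_x = chain[thin - 1::thin]  # thin the chain to reduce correlation
        # Label the new samples with the (expensive) oracle and grow the dataset
        buffer_x.extend(new_x)
        buffer_e.extend(oracle_energy(x) for x in new_x)
    model.fit(buffer_x, buffer_e)  # final refit on all collected data
    return model, buffer_x, buffer_e


model, buffer_x, buffer_e = self_refining_loop()
print(f"{len(buffer_x)} labeled points; surrogate E(1.0) = {model.predict(1.0):.3f}, "
      f"oracle E(1.0) = {oracle_energy(1.0):.3f}")
```

Samples that wander into poorly modeled regions get labeled by the oracle, so the training set concentrates where the model's own Boltzmann distribution puts mass.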
ET-Flow: Equivariant Flow-Matching for Molecular Conformer Generation
Nikhil Shenoy
Hannes Stärk
Stephan Thaler
Predicting low-energy molecular conformations given a molecular graph is an important but challenging task in computational drug discovery. Existing state-of-the-art approaches either resort to large-scale transformer-based models that diffuse over conformer fields, or use computationally expensive methods to generate initial structures and diffuse over torsion angles. In this work, we introduce Equivariant Transformer Flow (ET-Flow). We show that a well-designed flow matching approach with equivariance and a harmonic prior removes the need for complex internal geometry calculations and large architectures, contrary to the prevailing methods in the field. Our approach results in a straightforward and scalable method that operates directly on all-atom coordinates with minimal assumptions. With the advantages of equivariance and flow matching, ET-Flow significantly increases the precision and physical validity of the generated conformers, while being a lighter model that is faster at inference. Code is available at https://github.com/shenoynikhil/ETFlow.
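The two ingredients named above, a harmonic prior and flow matching on all-atom coordinates, can be sketched generically. In this sketch, the prior assumes a hypothetical linear-chain molecular graph, where a Gaussian with a harmonic bond potential can be sampled as a centered 3D random walk. Training pairs use the standard linear interpolation path `x_t = (1 - t) x0 + t x1` with target velocity `u_t = x1 - x0`. The conformer coordinates are made up, and `velocity_fn` is a placeholder for an equivariant network. This is not the ET-Flow implementation.

```python
import math
import random


def center(coords):
    # Remove the center of mass so samples are translation-invariant
    n = len(coords)
    com = [sum(c[k] for c in coords) / n for k in range(3)]
    return [[c[k] - com[k] for k in range(3)] for c in coords]


def harmonic_prior_chain(n_atoms, alpha=1.0, rng=None):
    # Assumes a linear chain: density ~ exp(-alpha/2 * sum ||x_{i+1} - x_i||^2),
    # so bond vectors are i.i.d. Gaussian with std 1/sqrt(alpha)
    rng = rng or random.Random()
    sigma = 1.0 / math.sqrt(alpha)
    coords = [[0.0, 0.0, 0.0]]
    for _ in range(n_atoms - 1):
        prev = coords[-1]
        coords.append([prev[k] + rng.gauss(0.0, sigma) for k in range(3)])
    return center(coords)


def interpolate(x0, x1, t):
    # Linear path between prior sample x0 and data conformer x1, and its velocity
    xt = [[(1 - t) * a[k] + t * b[k] for k in range(3)] for a, b in zip(x0, x1)]
    ut = [[b[k] - a[k] for k in range(3)] for a, b in zip(x0, x1)]
    return xt, ut


def flow_matching_loss(v_pred, ut):
    # Mean (over atoms) squared error between predicted and target velocities
    n = len(ut)
    return sum((v[k] - u[k]) ** 2 for v, u in zip(v_pred, ut) for k in range(3)) / n


def euler_sample(velocity_fn, x0, n_steps=10):
    # Integrate dx/dt = velocity_fn(x, t) from t = 0 to t = 1
    x, dt = [row[:] for row in x0], 1.0 / n_steps
    for i in range(n_steps):
        v = velocity_fn(x, i * dt)
        x = [[xi[k] + dt * vi[k] for k in range(3)] for xi, vi in zip(x, v)]
    return x


rng = random.Random(0)
# Hypothetical 4-atom conformer (coordinates made up for illustration)
x1 = center([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [2.0, 1.4, 0.0], [3.5, 1.4, 0.0]])
x0 = harmonic_prior_chain(4, rng=rng)
xt, ut = interpolate(x0, x1, t=rng.random())
print("loss of a perfect predictor:", flow_matching_loss(ut, ut))
# With the exact conditional velocity, Euler integration carries x0 onto x1
x1_hat = euler_sample(lambda x, t: ut, x0, n_steps=10)
print("max reconstruction error:", max(abs(a[k] - b[k]) for a, b in zip(x1_hat, x1) for k in range(3)))
```

A learned model would replace the lambda with a network trained to minimize `flow_matching_loss` over random `(x0, x1, t)` triples; a structured prior places the starting point closer to plausible bonded geometries than isotropic noise does.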
Equivariant Flow Matching for Molecular Conformer Generation
Nikhil Shenoy
Hannes Stärk
Stephan Thaler