Dominique Beaini

Associate Industry Member

Adjunct Professor, Université de Montréal, Department of Computer Science and Operations Research

Head of Graph Research, Valence Discovery

Research Topics

Multimodal Learning

Graph Learning

Molecular Modeling

Graph Neural Networks

Google Scholar

Biography

I am currently the team lead of the research unit at Valence Discovery, one of the leading companies in machine learning applied to drug discovery, and an adjunct professor in the Department of Computer Science and Operations Research (DIRO) at Université de Montréal. My goal is to advance machine learning toward a better understanding of molecules and their interactions with human biology. I hold a PhD from Polytechnique Montréal; my earlier research focused on robotics and computer vision.

My research interests are graph neural networks, self-supervised learning, quantum mechanics, drug discovery, computer vision and robotics.

Current Students

Majdi Hassan

PhD - UdeM

Co-supervisor:

Research Master's - UdeM

Jungyoon Lee

Research Master's - UdeM

Publications

Towards Foundational Models for Molecular Learning on Large-Scale Multi-Task Datasets

Dominique Beaini

Shenyang Huang

Joao Alex Cunha

Zhiyi Li

Gabriela Moisescu-Pareja

Oleksandr Dymov

Samuel Maddrell-Mander

Callum McLean

Jama Hussein Mohamud

Michael Craig

Cristian Gabellini

Kerstin Klaser

Josef Dean

Cas Wognum …

Maciej Sypetkowski

Ioannis Koutis

Hadrien Mary

Therence Bois

Andrew William Fitzgibbon

Blazej Banaszewski

Chad Martin

Dominic Masters

Recently, pre-trained foundation models have enabled significant advancements in multiple fields. In molecular machine learning, however, where datasets are often hand-curated, and hence typically small, the lack of datasets with labeled features, and codebases to manage those datasets, has hindered the development of foundation models. In this work, we present seven novel datasets categorized by size into three distinct categories: ToyMix, LargeMix and UltraLarge. These datasets push the boundaries in both the scale and the diversity of supervised labels for molecular learning. They cover nearly 100 million molecules and over 3000 sparsely defined tasks, totaling more than 13 billion individual labels of both quantum and biological nature. In comparison, our datasets contain 300 times more data points than the widely used OGB-LSC PCQM4Mv2 dataset, and 13 times more than the quantum-only QM1B dataset. In addition, to support the development of foundational models based on our proposed datasets, we present the Graphium graph machine learning library, which simplifies the process of building and training molecular machine learning models for multi-task and multi-level molecular datasets. Finally, we present a range of baseline results as a starting point for multi-task and multi-level training on these datasets. Empirically, we observe that performance on low-resource biological datasets improves when the model is also trained on large amounts of quantum data. This indicates that there may be potential in multi-task and multi-level training of a foundation model and fine-tuning it to resource-constrained downstream tasks. The Graphium library is publicly available on GitHub and the dataset links are available in Part 1 and Part 2.
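With over 3000 sparsely defined tasks, each molecule carries labels for only a small subset of tasks. One common way to train a multi-task model on such data is to mask missing labels out of the loss. The minimal sketch below assumes missing labels are encoded as `None` and averages a squared error over observed labels only; the helper name `masked_multitask_mse` is hypothetical and is not part of the Graphium API.

```python
from typing import Optional, Sequence


def masked_multitask_mse(
    predictions: Sequence[Sequence[float]],
    labels: Sequence[Sequence[Optional[float]]],
) -> float:
    """Mean squared error over observed labels only.

    predictions[i][t] is the model output for molecule i on task t;
    labels[i][t] is the target, or None when task t is unlabeled for molecule i.
    Hypothetical helper for illustration; not the Graphium API.
    """
    total = 0.0
    count = 0
    for pred_row, label_row in zip(predictions, labels):
        for pred, label in zip(pred_row, label_row):
            if label is None:  # missing label: skipped, contributes nothing
                continue
            total += (pred - label) ** 2
            count += 1
    if count == 0:
        raise ValueError("no observed labels in batch")
    return total / count


# Two molecules, three tasks; each molecule is labeled for a different subset.
preds = [[0.5, 1.0, 2.0], [1.5, 0.0, 3.0]]
targets = [[1.0, None, 2.0], [None, 1.0, None]]
print(masked_multitask_mse(preds, targets))
```

Averaging over the observed entries (rather than over every molecule-task pair) keeps the loss scale independent of how sparse a given batch happens to be.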

2024-01-16

ICLR.cc/2024/Conference (poster)

doi.org

openreview.net

Latent Space Simulator for Unveiling Molecular Free Energy Landscapes and Predicting Transition Dynamics

Simon Dobers

Hannes Stärk

Xiang Fu

Dominique Beaini

Stephan Günnemann

Free Energy Surfaces (FES) and metastable transition rates are key elements in understanding the behavior of molecules within a system. However, typical approaches require computing force fields across billions of time steps in a molecular dynamics (MD) simulation, which is often considered intractable when dealing with large systems or databases. In this work, we propose LaMoDy, a latent-space MD simulator, which addresses this intractability with roughly a 20-fold speedup over classical MD. The model leverages a chirality-aware SE(3)-invariant encoder-decoder architecture to generate a latent space, coupled with a recurrent neural network that runs the time-wise dynamics. We show that LaMoDy recovers realistic trajectories and FES more accurately and faster than existing methods while capturing their major dynamical and conformational properties. Furthermore, the proposed approach can generalize to molecules outside the training distribution.

2023-10-27

NeurIPS.cc/2023/Workshop/AI4Science (poster)

openreview.net

Role of Structural and Conformational Diversity for Machine Learning Potentials

Nikhil Shenoy

Prudencio Tossou

Emmanuel Noutahi

Hadrien Mary

Dominique Beaini

Jiarui Ding

In the field of Machine Learning Interatomic Potentials (MLIPs), understanding the intricate relationship between data biases, specifically conformational and structural diversity, and model generalization is critical to improving the quality of Quantum Mechanics (QM) data generation efforts. We investigate these dynamics through two distinct experiments: a fixed-budget experiment, where the dataset size remains constant, and a fixed-molecular-set experiment, which holds structural diversity fixed while varying conformational diversity. Our results reveal nuanced patterns in generalization metrics. Notably, optimal structural and conformational generalization requires a careful balance between structural and conformational diversity, but existing QM datasets do not meet that trade-off. Additionally, our results highlight the limitations of MLIP models in generalizing beyond their training distribution, emphasizing the importance of defining the applicability domain during model deployment. These findings provide valuable insights and guidelines for QM data generation efforts.

2023-10-27

NeurIPS.cc/2023/Workshop/AI4Science (oral presentation)

doi.org

openreview.net

GPS++: Reviving the Art of Message Passing for Molecular Property Prediction

Dominic Masters

Josef Dean

Kerstin Klaser

Zhiyi Li

Samuel Maddrell-Mander

Adam Sanders

Hatem Helal

Deniz Beker

Andrew William Fitzgibbon

Shenyang Huang

Ladislav Rampasek

Dominique Beaini

2023-07-30

TMLR (accepted)

doi.org

openreview.net

Repurposing Density Functional Theory to Suit Deep Learning

Alexander Mathiasen

Hatem Helal

Paul Balanca

Kerstin Klaser

Josef Dean

Carlo Luschi

Dominique Beaini

Andrew William Fitzgibbon

Dominic Masters

Density Functional Theory (DFT) accurately predicts the properties of molecules given their atom types and positions, and often serves as ground truth for molecular property prediction tasks. Neural Networks (NN) are popular tools for such tasks and are trained on DFT datasets, with the aim of approximating DFT at a fraction of the computational cost. Research in other areas of machine learning has shown that the generalisation performance of NNs tends to improve with increased dataset size; however, the computational cost of DFT limits the size of DFT datasets. We present PySCFIPU, a DFT library that allows us to iterate on both dataset generation and NN training. We create QM10X, a dataset with 100M conformers, in 13 hours, on which we subsequently train SchNet in 12 hours. We show that the predictions of SchNet improve solely by increasing training data without incorporating further inductive biases.

2023-07-28

ICML.cc/2023/Workshop/SynS_and_ML (published)

openreview.net

Task-Agnostic Graph Neural Network Evaluation via Adversarial Collaboration

Xiangyu Zhao

Hannes Stärk

Dominique Beaini

Pietro Lio

Yiren Zhao

2023-03-06

ICLR.cc/2023/Workshop/MLDD (poster)

doi.org

openreview.net

Generating QM1B with PySCFIPU

Alexander Mathiasen

Hatem Helal

Kerstin Klaser

Paul Balanca

Josef Dean

Carlo Luschi

Dominique Beaini

Andrew William Fitzgibbon

Dominic Masters

openreview.net

Generating QM1B with PySCFIPU

Alexander Mathiasen

Hatem Helal

Kerstin Klaser

Paul Balanca

Josef Dean

Carlo Luschi

Dominique Beaini

Andrew William Fitzgibbon

Dominic Masters

2023-01-01

NeurIPS (published)

doi.org

arxiv.org

GPS++: An Optimised Hybrid MPNN/Transformer for Molecular Property Prediction

Dominic Masters

Josef Dean

Kerstin Klaser

Zhiyi Li

Samuel Maddrell-Mander

Adam Sanders

Hatem Helal

Deniz Beker

Ladislav Rampasek

Dominique Beaini

2022-11-18

arXiv (preprint)

doi.org

arxiv.org

3D Infomax improves GNNs for Molecular Property Prediction

Hannes Stärk

Dominique Beaini

Gabriele Corso

Prudencio Tossou

Christian Dallago

Stephan Günnemann

Pietro Lio

Molecular property prediction is one of the fastest-growing applications of deep learning with critical real-world impacts. Including 3D molecular structure as input to learned models improves their predictions for many molecular properties. However, this information is infeasible to compute at the scale required by most real-world applications. We propose pre-training a model to understand the geometry of molecules given only their 2D molecular graph. Using methods from self-supervised learning, we maximize the mutual information between a 3D summary vector and the representations of a Graph Neural Network (GNN) such that they contain latent 3D information. During fine-tuning on molecules with unknown geometry, the GNN still generates implicit 3D information and can use it to inform downstream tasks. We show that 3D pre-training provides significant improvements for a wide range of molecular properties, such as a 22% average MAE reduction on eight quantum mechanical properties. Crucially, the learned representations can be effectively transferred between datasets with vastly different molecules.
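Maximizing mutual information between the 2D GNN representation and the 3D summary vector is commonly done with a contrastive estimator. The minimal sketch below assumes an InfoNCE-style loss (the paper's exact objective may differ): for each molecule, its own 3D embedding is the positive and the other molecules in the batch serve as negatives. The names `info_nce` and `temperature`, and the toy embeddings, are illustrative assumptions.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(
    z2d: List[Sequence[float]], z3d: List[Sequence[float]], temperature: float = 0.1
) -> float:
    """InfoNCE loss where z2d[i] and z3d[i] embed the same molecule.

    For each 2D embedding, the matching 3D embedding is the positive and the
    other 3D embeddings in the batch are negatives. Lower is better.
    """
    n = len(z2d)
    loss = 0.0
    for i in range(n):
        logits = [cosine(z2d[i], z3d[j]) / temperature for j in range(n)]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        loss += log_denominator - logits[i]  # -log softmax of the positive pair
    return loss / n


# Aligned 2D/3D pairs give a low loss; swapping the 3D side raises it.
z2d = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]
swapped = [[0.1, 0.9], [0.9, 0.1]]
print(info_nce(z2d, aligned) < info_nce(z2d, swapped))  # True
```

In a real training loop the two embedding lists would come from the 2D GNN and the 3D network, and the loss would be minimized with respect to both.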

2022-06-28

Proceedings of the 39th International Conference on Machine Learning (published)

proceedings.mlr.press

openreview.net

Long Range Graph Benchmark

Vijay Prakash Dwivedi

Anh Tuan Luu

Graph Neural Networks (GNNs) that are based on the message passing (MP) paradigm generally exchange information between 1-hop neighbors to build node representations at each layer. In principle, such networks are not able to capture long-range interactions (LRI) that may be desired or necessary for learning a given task on graphs. Recently, there has been increasing interest in the development of Transformer-based methods for graphs that can consider full node connectivity beyond the original sparse structure, thus enabling the modeling of LRI. However, MP-GNNs that simply rely on 1-hop message passing often fare better on several existing graph benchmarks when combined with positional feature representations, among other innovations, hence limiting the perceived utility and ranking of Transformer-like architectures. Here, we present the Long Range Graph Benchmark (LRGB) with 5 graph learning datasets: PascalVOC-SP, COCO-SP, PCQM-Contact, Peptides-func and Peptides-struct, which arguably require LRI reasoning to achieve strong performance in a given task. We benchmark both baseline GNNs and Graph Transformer networks to verify that the models which capture long-range dependencies perform significantly better on these tasks. Therefore, these datasets are suitable for benchmarking and exploration of MP-GNNs and Graph Transformer architectures that are intended to capture LRI.
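The limitation of 1-hop message passing can be made concrete by tracking which nodes are able to influence a given node: after k layers, a node's representation depends only on its k-hop neighborhood. The sketch below propagates sets of node IDs instead of learned features to show this receptive-field growth on a small path graph; the graph and the function name `receptive_fields` are hypothetical illustrations, not part of LRGB.

```python
from typing import Dict, List, Set


def receptive_fields(
    adjacency: Dict[int, List[int]], num_layers: int
) -> Dict[int, Set[int]]:
    """For each node, the set of nodes whose input features can reach it after
    `num_layers` rounds of 1-hop message passing (each node keeps its own state)."""
    # Layer 0: each node only "sees" its own features.
    fields = {v: {v} for v in adjacency}
    for _ in range(num_layers):
        # One message-passing step: merge the previous-layer fields of 1-hop neighbors.
        fields = {
            v: fields[v].union(*(fields[u] for u in adjacency[v]))
            for v in adjacency
        }
    return fields


# Path graph 0 - 1 - 2 - 3 - 4 - 5: nodes 0 and 5 are 5 hops apart.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 5] for i in range(6)}
for k in (2, 5):
    print(k, 5 in receptive_fields(path, k)[0])
```

With 2 layers, node 0 cannot receive any information from node 5; only at 5 layers does node 5 enter its receptive field. Transformer-style global attention removes this dependence on depth by letting every node attend to every other node in a single layer.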

openreview.net

Recipe for a General, Powerful, Scalable Graph Transformer

Ladislav Rampasek

Mikhail Galkin

Vijay Prakash Dwivedi

Anh Tuan Luu

Guy Wolf

Dominique Beaini

We propose a recipe on how to build a general, powerful, scalable (GPS) graph Transformer with linear complexity and state-of-the-art results on a diverse set of benchmarks. Graph Transformers (GTs) have gained popularity in the field of graph representation learning with a variety of recent publications, but they lack a common foundation about what constitutes a good positional or structural encoding, and what differentiates them. In this paper, we summarize the different types of encodings with a clearer definition and categorize them as being

openreview.net

Science Flash

At the Forefront of a New Era

Supervision Requests
