Prudencio Tossou

Recently, pre-trained foundation models have shown significant advancements in multiple fields. However, the lack of datasets with labeled features and codebases has hindered the development of a supervised foundation model for molecular tasks. Here, we have carefully curated seven datasets specifically tailored for node- and graph-level prediction tasks to facilitate supervised learning on molecules. Moreover, to support the development of multi-task learning on our proposed datasets, we created the Graphium graph machine learning library. Our dataset collection encompasses two distinct categories. First, the TOYMIX category modifies three small existing datasets with additional data for multi-task learning. Second, the LARGEMIX category includes four large-scale datasets with 344M graph-level data points and 409M node-level data points from ∼5M unique molecules. Beyond these two categories, a separate ultra-large dataset contains 2,210M graph-level data points and 2,031M node-level data points coming from 86M molecules. Hence, our datasets represent an order of magnitude increase in data volume compared to other 2D-GNN datasets. In addition, recognizing that molecule-related tasks often span multiple levels, we have designed our library to explicitly support multi-tasking, offering a diverse range of multi-level representations, i.e., representations at the graph, node, edge, and node-pair level. We equipped the library with an extensive collection of models and features to cover different levels of molecule analysis. By combining our curated datasets with this versatile library, we aim to accelerate the development of molecule foundation models. Datasets and code are available at https://github.com/datamol-io/graphium.
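
The multi-level representations mentioned in the Graphium abstract (graph, node, edge, and node-pair level) can be illustrated with a toy readout over a single set of per-atom embeddings. This is a minimal sketch under stated assumptions, not Graphium's API: the function name `multi_level_readouts` is hypothetical, embeddings are plain Python lists, and mean pooling and endpoint concatenation are only one possible choice of readout.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def multi_level_readouts(
    node_h: Sequence[Vector], edges: Sequence[Tuple[int, int]]
) -> Tuple[Vector, List[Vector], List[Vector], List[List[Vector]]]:
    """Hypothetical helper: derive four levels of features from node embeddings.

    node_h: one embedding per atom, assumed to come from some upstream GNN.
    edges: (i, j) index pairs for bonded atoms.
    """
    n = len(node_h)
    # Node level: the per-atom embeddings themselves.
    node_level = [list(h) for h in node_h]
    # Graph level: mean-pool each feature dimension over all atoms.
    graph_level = [sum(col) / n for col in zip(*node_h)]
    # Edge level: concatenate the embeddings of each bond's two endpoints.
    edge_level = [list(node_h[i]) + list(node_h[j]) for i, j in edges]
    # Node-pair level: one concatenated vector for every ordered pair (i, j),
    # including atoms that are not directly bonded.
    pair_level = [
        [list(node_h[i]) + list(node_h[j]) for j in range(n)] for i in range(n)
    ]
    return graph_level, node_level, edge_level, pair_level


# Toy 3-atom chain with 2-dimensional embeddings.
g, nodes, bonds, pairs = multi_level_readouts(
    [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]], [(0, 1), (1, 2)]
)
print(g)            # [2.0, 3.0]
print(bonds[1])     # [2.0, 3.0, 4.0, 5.0]
print(pairs[0][2])  # [0.0, 1.0, 4.0, 5.0]
```

Unlike the other three levels, the node-pair output grows quadratically with the number of atoms.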

2024-05-06

International Conference on Learning Representations (accepted, poster)

Role of Structural and Conformational Diversity for Machine Learning Potentials

Nikhil Shenoy

Prudencio Tossou

Emmanuel Noutahi

Hadrien Mary

Dominique Beaini

Jiarui Ding

In the field of Machine Learning Interatomic Potentials (MLIPs), understanding the intricate relationship between data biases, specifically conformational and structural diversity, and model generalization is critical to improving the quality of Quantum Mechanics (QM) data generation efforts. We investigate these dynamics through two distinct experiments: a fixed-budget experiment, where the dataset size remains constant, and a fixed-molecular-set experiment, which holds structural diversity fixed while varying conformational diversity. Our results reveal nuanced patterns in generalization metrics. Notably, optimal structural and conformational generalization requires a careful balance between structural and conformational diversity, but existing QM datasets do not meet that trade-off. Additionally, our results highlight the limited ability of MLIP models to generalize beyond their training distribution, emphasizing the importance of defining an applicability domain during model deployment. These findings provide valuable insights and guidelines for QM data generation efforts.
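
The two experimental designs in this abstract differ in what is held constant. The following is a minimal sketch, assuming a pool that maps each molecule identifier to its number of available conformers and uniform random sampling; the function names and sampling scheme are hypothetical illustrations, not the authors' protocol. In the fixed-budget design the total number of conformers stays constant, so adding molecules (structural diversity) means fewer conformers per molecule; in the fixed-molecular-set design the molecules stay the same and only the number of conformers per molecule (conformational diversity) changes.

```python
import random
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[str, int]  # (molecule id, conformer index)


def _draw(pool: Dict[str, int], molecules: Sequence[str], k: int,
          rng: random.Random) -> List[Sample]:
    """Draw up to k distinct conformers for each selected molecule."""
    samples: List[Sample] = []
    for mol in molecules:
        take = min(k, pool[mol])  # cannot take more conformers than exist
        samples += [(mol, c) for c in rng.sample(range(pool[mol]), take)]
    return samples


def fixed_budget_split(pool: Dict[str, int], n_molecules: int, budget: int,
                       seed: int = 0) -> List[Sample]:
    """Hypothetical: total size ~budget; trades structures for conformers."""
    rng = random.Random(seed)
    molecules = rng.sample(sorted(pool), n_molecules)
    # Integer division: any remainder of the budget is left unused.
    return _draw(pool, molecules, budget // n_molecules, rng)


def fixed_molecule_split(pool: Dict[str, int], molecules: Sequence[str],
                         conformers_per_mol: int, seed: int = 0) -> List[Sample]:
    """Hypothetical: same molecules every time; only the conformer count varies."""
    return _draw(pool, molecules, conformers_per_mol, random.Random(seed))


pool = {f"mol{i}": 10 for i in range(20)}  # toy pool: 20 molecules x 10 conformers
few_structures = fixed_budget_split(pool, n_molecules=5, budget=40)
many_structures = fixed_budget_split(pool, n_molecules=20, budget=40)
print(len(few_structures), len({m for m, _ in few_structures}))    # 40 5
print(len(many_structures), len({m for m, _ in many_structures}))  # 40 20
```

Both fixed-budget splits contain 40 conformers, but one spans 5 structures and the other 20, which is the trade-off the fixed-budget experiment probes.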

2023-10-26

NeurIPS 2023 AI for Science Workshop (poster)

Towards Foundational Models for Molecular Learning on Large-Scale Multi-Task Datasets (Ultra Large Dataset)

Dominique Beaini

Shenyang Huang

Joao Alex Cunha

Zhiyi Li

Gabriela Moisescu-Pareja

Oleksandr Dymov

Samuel Maddrell-Mander

Callum McLean

Ali Parviz

Luis T. Díaz Müller

Jama Hussein Mohamud

Michael Craig

Cristian Gabellini

Jian Tang … (8 more)

Christopher G. Morris

Mirco Ravanelli

Guy Wolf

Prudencio Tossou

Hadrien Mary

Błażej Banaszewski

Chad Martin

Dominic Masters

2023-09-21

Zenodo (status unknown)

MOT: A Multi-Omics Transformer for Multiclass Classification Tumour Types Predictions

Mazid Osseni

Prudencio Tossou

François Laviolette

J. Corbeil

2023-02-15

Proceedings of the 16th International Joint Conference on Biomedical Engineering Systems and Technologies (published)

3D Infomax improves GNNs for Molecular Property Prediction

Hannes Stärk

Dominique Beaini

Gabriele Corso

Prudencio Tossou

Christian Dallago

Stephan Günnemann

Pietro Liò

Molecular property prediction is one of the fastest-growing applications of deep learning with critical real-world impacts. Including 3D molecular structure as input to learned models improves their predictions for many molecular properties. However, this information is infeasible to compute at the scale required by most real-world applications. We propose pre-training a model to understand the geometry of molecules given only their 2D molecular graph. Using methods from self-supervised learning, we maximize the mutual information between a 3D summary vector and the representations of a Graph Neural Network (GNN) such that they contain latent 3D information. During fine-tuning on molecules with unknown geometry, the GNN still generates implicit 3D information and can use it to inform downstream tasks. We show that 3D pre-training provides significant improvements for a wide range of molecular properties, such as a 22% average MAE reduction on eight quantum mechanical properties. Crucially, the learned representations can be effectively transferred between datasets with vastly different molecules.
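
The 3D Infomax abstract describes maximizing the mutual information between a 3D summary vector and the 2D GNN representation of the same molecule. A common way to do this is a contrastive lower bound such as InfoNCE, where matching (2D, 3D) pairs in a batch are pulled together and mismatched pairs are pushed apart. The sketch below assumes that objective with cosine similarity and a temperature `tau`; the function names and default temperature are illustrative assumptions rather than the paper's code, and in practice the embeddings would come from a 2D GNN and a 3D network instead of hand-written lists.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _unit(v: Vector) -> List[float]:
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce(z2d: Sequence[Vector], z3d: Sequence[Vector], tau: float = 0.1) -> float:
    """Mean InfoNCE loss where row i of z2d and row i of z3d describe the same molecule.

    Minimizing this loss maximizes a lower bound on the mutual information
    between the two views; the other molecules in the batch act as negatives.
    """
    a = [_unit(v) for v in z2d]
    b = [_unit(v) for v in z3d]
    n = len(a)
    total = 0.0
    for i in range(n):
        # Similarity of molecule i's 2D embedding to every 3D embedding in the batch.
        logits = [sum(x * y for x, y in zip(a[i], b[j])) / tau for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_norm = m + math.log(sum(math.exp(s - m) for s in logits))
        total += log_norm - logits[i]  # cross-entropy with the positive at index i
    return total / n


z = [[1.0, 0.0], [0.0, 1.0]]
print(info_nce(z, z) < info_nce(z, [[0.0, 1.0], [1.0, 0.0]]))  # True
```

Only the 2D-to-3D direction is shown; a symmetric variant would average it with the same loss computed with the roles of `z2d` and `z3d` swapped.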

2022-06-27

Proceedings of the 39th International Conference on Machine Learning (published)

Rethinking Graph Transformers with Spectral Attention

William L. Hamilton

In recent years, the Transformer architecture has proven to be very successful in sequence processing, but its application to other data structures, such as graphs, has remained limited due to the difficulty of properly defining positions. Here, we present the …

2020-12-31

Advances in Neural Information Processing Systems 34 (NeurIPS 2021) (published)
