Almer Van Der Sloot

RGFN: Synthesizable Molecular Generation Using GFlowNets

Michał Koziarski

Andrei Rekesh

Dmytro Shevchuk

Almer Van Der Sloot

Piotr Gainski

Yoshua Bengio

Cheng-Hao Liu

Mike Tyers

Robert A. Batey

Generative models hold great promise for small molecule discovery, significantly increasing the size of the search space compared to traditional in silico screening libraries. However, most existing machine learning methods for small molecule generation suffer from poor synthesizability of candidate compounds, making experimental validation difficult. In this paper we propose Reaction-GFlowNet (RGFN), an extension of the GFlowNet framework that operates directly in the space of chemical reactions, thereby allowing out-of-the-box synthesizability while maintaining comparable quality of generated candidates. We demonstrate that with the proposed set of reactions and building blocks, it is possible to obtain a search space of molecules orders of magnitude larger than existing screening libraries, with a low cost of synthesis. We also show that the approach scales to very large fragment libraries, further increasing the number of potential molecules. We demonstrate the effectiveness of the proposed approach across a range of oracle models, including pretrained proxy models and GPU-accelerated docking.

2024-09-24

NeurIPS.cc/2024/Conference (poster)

doi.org

openreview.net

Protein Language Models: Is Scaling Necessary?

Quentin Fournier

Robert M. Vernon

Almer Van Der Sloot

Benjamin Schulz

Sarath Chandar

Christopher James Langmead

Public protein sequence databases contain samples from the fitness landscape explored by nature. Protein language models (pLMs) pre-trained on these sequences aim to capture this landscape for tasks like property prediction and protein design. Following the same trend as in natural language processing, pLMs have continuously been scaled up. However, the premise that scale leads to better performance assumes that source databases provide an accurate representation of the underlying fitness landscape, which is likely false. By developing an efficient codebase, designing a modern architecture, and addressing data quality concerns such as sample bias, we introduce AMPLIFY, a best-in-class pLM that is orders of magnitude less expensive to train and deploy than previous models. Furthermore, to support the scientific community and democratize the training of pLMs, we have open-sourced AMPLIFY’s pre-training codebase, data, and model checkpoints.

2024-09-22

bioRxiv (preprint)

doi.org

Towards DNA-Encoded Library Generation with GFlowNets

Michał Koziarski

Mohammed Abukalam

Vedant Shah

Louis Vaillancourt

Doris Alexandra Schuetz

Moksh Jain

Almer Van Der Sloot

Mathieu Bourgey

Anne Marinier

Yoshua Bengio

2024-03-03

GEM @ International Conference on Learning Representations (poster)

doi.org

openreview.net

Generative Active Learning for the Search of Small-Molecule Protein Binders

Maksym Korablyov

Cheng-Hao Liu

Moksh Jain

Almer Van Der Sloot

Éric Jolicoeur

Edward Ruediger

Andrei Nica

Emmanuel Bengio

Kostiantyn Lapchevskyi

Daniel St-Cyr

Doris Alexandra Schuetz

Victor Ion Butoi

Saikrishna Gottipati

Prateek Gupta

Ladislav Rampasek

Sasikanth Avancha

Pierre-Luc Bacon

William Hamilton

Brooks Paige

Sanchit Misra

Stanislaw Jastrzebski

Bharat Kaul

Doina Precup

José Miguel Hernández-Lobato

Marwin Segler

Michael Bronstein

Anne Marinier

Mike Tyers

Yoshua Bengio

Despite substantial progress in machine learning for scientific discovery in recent years, truly de novo design of small molecules which exhibit a property of interest remains a significant challenge. We introduce LambdaZero, a generative active learning approach to search for synthesizable molecules. Powered by deep reinforcement learning, LambdaZero learns to search over the vast space of molecules to discover candidates with a desired property. We apply LambdaZero with molecular docking to design novel small molecules that inhibit the enzyme soluble Epoxide Hydrolase 2 (sEH), while enforcing constraints on synthesizability and drug-likeness. LambdaZero provides an exponential speedup in terms of the number of calls to the expensive molecular docking oracle, and molecules designed de novo by LambdaZero reach docking scores that would otherwise require the virtual screening of a hundred billion molecules. Importantly, LambdaZero discovers novel scaffolds of synthesizable, drug-like inhibitors for sEH. In in vitro experimental validation, a series of ligands from a generated quinazoline-based scaffold were synthesized, and the lead inhibitor N-(4,6-di(pyrrolidin-1-yl)quinazolin-2-yl)-N-methylbenzamide (UM0152893) displayed sub-micromolar enzyme inhibition of sEH.

2023-12-31

arXiv (preprint)

doi.org

arxiv.org

RECOVER identifies synergistic drug combinations in vitro through sequential model optimization

Paul Bertin

Jarrid Rector-Brooks

Deepak Sharma

Thomas Gaudelet

Andrew Anighoro

Torsten Gross

Francisco Martínez-Peña