Maksym Korablyov

Generative Active Learning for the Search of Small-molecule Protein Binders

Maksym Korablyov

Cheng-Hao Liu

Moksh J. Jain

Almer M. van der Sloot

Eric Jolicoeur

Edward Ruediger

Andrei Cristian Nica

Emmanuel Bengio

Kostiantyn Lapchevskyi

Daniel St-Cyr

Doris Alexandra Schuetz

Victor I Butoi

Jarrid Rector-Brooks

Simon R. Blackburn

Leo Feng

Hadi Nekoei

Sai Krishna Gottipati

Priyesh Vijayan

Prateek Gupta

Ladislav Rampasek

Sasikanth Avancha

Pierre-Luc Bacon

William L. Hamilton

Brooks Paige

Sanchit Misra

Stanisław Jastrzębski

Bharat Kaul

Doina Precup

José Miguel Hernández-Lobato

Marwin Segler

Michael M. Bronstein

Anne Marinier

Mike Tyers

Yoshua Bengio

Despite substantial progress in machine learning for scientific discovery in recent years, truly de novo design of small molecules which exhibit a property of interest remains a significant challenge. We introduce LambdaZero, a generative active learning approach to search for synthesizable molecules. Powered by deep reinforcement learning, LambdaZero learns to search over the vast space of molecules to discover candidates with a desired property. We apply LambdaZero with molecular docking to design novel small molecules that inhibit the enzyme soluble Epoxide Hydrolase 2 (sEH), while enforcing constraints on synthesizability and drug-likeliness. LambdaZero provides an exponential speedup in terms of the number of calls to the expensive molecular docking oracle, and LambdaZero de novo designed molecules reach docking scores that would otherwise require the virtual screening of a hundred billion molecules. Importantly, LambdaZero discovers novel scaffolds of synthesizable, drug-like inhibitors for sEH. In in vitro experimental validation, a series of ligands from a generated quinazoline-based scaffold were synthesized, and the lead inhibitor N-(4,6-di(pyrrolidin-1-yl)quinazolin-2-yl)-N-methylbenzamide (UM0152893) displayed sub-micromolar enzyme inhibition of sEH.

2024-05-02

ArXiv (preprint)

doi.org

arxiv.org

A community effort in SARS-CoV-2 drug discovery.

Johannes Schimunek

Philipp Seidl

Katarina Elez

Tim Hempel

Tuan Le

Frank Noé

Simon Olsson

Lluís Raich

Robin Winter

Hatice Gokcan

Filipp Gusev

Evgeny M. Gutkin

Olexandr Isayev

Maria G. Kurnikova

Chamali H. Narangoda

Roman Zubatyuk

Ivan P. Bosko

Konstantin V. Furs

Anna D. Karpenko

Yury V. Kornoushenko

Mikita Shuldau

Artsemi Yushkevich

Mohammed B. Benabderrahmane

Patrick Bousquet‐Melou

Ronan Bureau

Beatrice Charton

Bertrand C. Cirou

Gérard Gil

William J. Allen

Suman Sirimulla

Stanley Watowich

Nick Antonopoulos

Nikolaos Epitropakis

Agamemnon Krasoulis

Vassilis Pitsikalis

Stavros Theodorakis

Igor Kozlovskii

Anton Maliutin

Alexander Medvedev

Petr Popov

Mark Zaretckii

Hamid Eghbal‐Zadeh

Christina Halmich

Sepp Hochreiter

Andreas Mayr

Peter Ruch

Michael Widrich

Francois Berenger

Ashutosh Kumar

Yoshihiro Yamanishi

Kam Y. J. Zhang

Emmanuel Bengio

Yoshua Bengio

Moksh J. Jain

Maksym Korablyov

Cheng-Hao Liu

Gilles Marcou

M. Gilles

Enrico Glaab

Kelly Barnsley

Suhasini M. Iyengar

Mary Jo Ondrechen

V. Joachim Haupt

Florian Kaiser

Michael Schroeder

Luisa Pugliese

Simone Albani

Christina Athanasiou

Andrea Beccari

Paolo Carloni

Giulia D'Arrigo

Eleonora Gianquinto

Jonas Goßen

Anton Hanke

Benjamin P. Joseph

Daria B. Kokh

Sandra Kovachka

Candida Manelfi

Goutam Mukherjee

Abraham Muñiz‐Chicharro

Francesco Musiani

Ariane Nunes‐Alves

Giulia Paiardi

Giulia Rossetti

S. Kashif Sadiq

Francesca Spyrakis

Carmine Talarico

Alexandros Tsengenes

Rebecca C. Wade

Conner Copeland

Jeremiah Gaiser

Daniel R. Olson

Amitava Roy

Vishwesh Venkatraman

Travis J. Wheeler

Haribabu Arthanari

Klara Blaschitz

Marco Cespugli

Vedat Durmaz

Konstantin Fackeldey

Patrick D. Fischer

Christoph Gorgulla

Christian Gruber

Karl Gruber

Michael Hetmann

Jamie E. Kinney

Krishna M. Padmanabha Das

Shreya Pandita

Amit Singh

Georg Steinkellner

Guilhem Tesseyre

Gerhard Wagner

Zi‐Fu Wang

Ryan J. Yust

Dmitry S. Druzhilovskiy

Dmitry A. Filimonov

Pavel V. Pogodin

Vladimir Poroikov

Anastassia V. Rudik

Leonid A. Stolbov

Alexander V. Veselovsky

Maria De Rosa

Giada De Simone

Maria R. Gulotta

Jessica Lombino

Nedra Mekni

Ugo Perricone

Arturo Casini

Amanda Embree

D. Benjamin Gordon

David Lei

Katelin Pratt

Christopher A. Voigt

Kuang‐Yu Chen

Yves Jacob

Tim Krischuns

Pierre Lafaye

Agnès Zettor

M. Luis Rodríguez

Kris M. White

Daren Fearon

Frank Von Delft

Martin A. Walsh

Dragos Horvath

Charles L. Brooks

Babak Falsafi

Bryan Ford

Adolfo García‐Sastre

Sang Yup Lee

Nadia Naffakh

Alexandre Varnek

Günter Klambauer

Thomas M. Hermans

The COVID-19 pandemic continues to pose a substantial threat to human lives and is likely to do so for years to come. Despite the availability of vaccines, searching for efficient small-molecule drugs that are widely available, including in low- and middle-income countries, is an ongoing challenge. In this work, we report the results of an open science community effort, the "Billion molecules against Covid-19 challenge", to identify small-molecule inhibitors against SARS-CoV-2 or relevant human receptors. Participating teams used a wide variety of computational methods to screen a minimum of 1 billion virtual molecules against 6 protein targets. Overall, 31 teams participated, and they suggested a total of 639,024 molecules, which were subsequently ranked to find 'consensus compounds'. The organizing team coordinated with various contract research organizations (CROs) and collaborating institutions to synthesize and test 878 compounds for biological activity against proteases (Nsp5, Nsp3, TMPRSS2), nucleocapsid N, RdRP (only the Nsp12 domain), and (alpha) spike protein S. Overall, 27 compounds with weak inhibition/binding were experimentally identified by binding-, cleavage-, and/or viral suppression assays and are presented here. Open science approaches such as the one presented here contribute to the knowledge base of future drug discovery efforts in finding better SARS-CoV-2 treatments.

2023-11-14

Molecular informatics (published)

doi.org

RECOVER identifies synergistic drug combinations in vitro through sequential model optimization

Paul Bertin

Jarrid Rector-Brooks

Deepak Sharma

Thomas Gaudelet

Andrew Anighoro

Torsten Gross

Francisco Martínez-Peña

Eileen L. Tang

M.S. Suraj

Cristian Regep

Jeremy B.R. Hayter

Maksym Korablyov

Nicholas Valiante

Almer Van Der Sloot

Mike Tyers

Charles E.S. Roberts

Michael M. Bronstein

Luke L. Lairson

Jake P. Taylor-King

Yoshua Bengio

2023-09-27

Cell Reports Methods (published)

doi.org

Thompson Sampling for Improved Exploration in GFlowNets

Moksh J. Jain

Cheng-Hao Liu

Generative flow networks (GFlowNets) are amortized variational inference algorithms that treat sampling from a distribution over compositional objects as a sequential decision-making problem with a learnable action policy. Unlike other algorithms for hierarchical sampling that optimize a variational bound, GFlowNet algorithms can stably run off-policy, which can be advantageous for discovering modes of the target distribution. Despite this flexibility in the choice of behaviour policy, the optimal way of efficiently selecting trajectories for training has not yet been systematically explored. In this paper, we view the choice of trajectories for training as an active learning problem and approach it using Bayesian techniques inspired by methods for multi-armed bandits. The proposed algorithm, Thompson sampling GFlowNets (TS-GFN), maintains an approximate posterior distribution over policies and samples trajectories from this posterior for training. We show in two domains that TS-GFN yields improved exploration and thus faster convergence to the target distribution than the off-policy exploration strategies used in past work.

2023-06-19

ICML.cc/2023/Workshop/SPIGM (poster)

doi.org

openreview.net

DEUP: Direct Epistemic Uncertainty Prediction

Moksh J. Jain

Victor I Butoi

Epistemic Uncertainty is a measure of the lack of knowledge of a learner which diminishes with more evidence. While existing work focuses on using the variance of the Bayesian posterior due to parameter uncertainty as a measure of epistemic uncertainty, we argue that this does not capture the part of lack of knowledge induced by model misspecification. We discuss how the excess risk, which is the gap between the generalization error of a predictor and the Bayes predictor, is a sound measure of epistemic uncertainty which captures the effect of model misspecification. We thus propose a principled framework for directly estimating the excess risk by learning a secondary predictor for the generalization error and subtracting an estimate of aleatoric uncertainty, i.e., intrinsic unpredictability. We discuss the merits of this novel measure of epistemic uncertainty, and highlight how it differs from variance-based measures of epistemic uncertainty and addresses its major pitfall. Our framework, Direct Epistemic Uncertainty Prediction (DEUP) is particularly interesting in interactive learning environments, where the learner is allowed to acquire novel examples in each round. Through a wide set of experiments, we illustrate how existing methods in sequential model optimization can be improved with epistemic uncertainty estimates from DEUP, and how DEUP can be used to drive exploration in reinforcement learning. We also evaluate the quality of uncertainty estimates from DEUP for probabilistic image classification and predicting synergies of drug combinations.

2023-02-13

TMLR (accepted)

openreview.net

Learning GFlowNets from partial episodes for improved convergence and stability

Moksh J. Jain

Tom Bosc

Generative flow networks (GFlowNets) are a family of algorithms for training a sequential sampler of discrete objects under an unnormalized target density and have been successfully used for various probabilistic modeling tasks. Existing training objectives for GFlowNets are either local to states or transitions, or propagate a reward signal over an entire sampling trajectory. We argue that these alternatives represent opposite ends of a gradient bias-variance tradeoff and propose a way to exploit this tradeoff to mitigate its harmful effects. Inspired by the TD(

2023-01-01

ICML (published)

doi.org

openreview.net

RetroGNN: Fast Estimation of Synthesizability for Virtual Screening and De Novo Design by Learning from Slow Retrosynthesis Software

Cheng-Hao Liu

Maksym Korablyov

Stanisław Jastrzębski

Paweł Włodarczyk-Pruszyński

Yoshua Bengio

Marwin Segler

2022-04-22

Journal of Chemical Information and Modeling (published)

doi.org

Evaluating Generalization in GFlowNets for Molecule Design

Moksh J. Jain

Cheng-Hao Liu

Michael M. Bronstein

Deep learning bears promise for drug discovery problems such as de novo molecular design. Generating data to train such models is a costly and time-consuming process, given the need for wet-lab experiments or expensive simulations. This problem is compounded by the notorious data-hungriness of machine learning algorithms. In small molecule generation the recently proposed GFlowNet method has shown good performance in generating diverse high-scoring candidates, and has the interesting advantage of being an off-policy offline method. Finding an appropriate generalization evaluation metric for such models, one predictive of the desired search performance (i.e. finding high-scoring diverse candidates), will help guide online data collection for such an algorithm. In this work, we develop techniques for evaluating GFlowNet performance on a test set, and identify the most promising metric for predicting generalization. We present empirical results on several small-molecule design tasks in drug discovery, for several GFlowNet training setups, and we find a metric strongly correlated with diverse high-scoring batch generation. This metric should be used to identify the best generative model from which to sample batches of molecules to be evaluated.

2022-04-05

ICLR.cc/2022/Workshop/MLDD (poster)

openreview.net

RECOVER: sequential model optimization platform for combination drug repurposing identifies novel synergistic compounds in vitro

Paul Bertin

Jarrid Rector-Brooks

Deepak Sharma

Thomas Gaudelet

Andrew Anighoro

Torsten Gross

Francisco Martínez-Peña

Eileen L. Tang

S. SurajM

Cristian Regep

Jeremy B.R. Hayter

Maksym Korablyov

N. Valiante

Almer M. van der Sloot

Mike Tyers

Charles E.S. Roberts

Michael M. Bronstein

Luke Lee Lairson

Jake P. Taylor-King

Yoshua Bengio

2022-02-07

ArXiv (preprint)

arxiv.org

Flow Network based Generative Models for Non-Iterative Diverse Candidate Generation

Moksh J. Jain

This paper is about the problem of learning a stochastic policy for generating an object (like a molecular graph) from a sequence of actions, such that the probability of generating an object is proportional to a given positive reward for that object. Whereas standard return maximization tends to converge to a single return-maximizing sequence, there are cases where we would like to sample a diverse set of high-return solutions. These arise, for example, in black-box function optimization when few rounds are possible, each with large batches of queries, where the batches should be diverse, e.g., in the design of new molecules. One can also see this as a problem of approximately converting an energy function to a generative distribution. While MCMC methods can achieve that, they are expensive and generally only perform local exploration. Instead, training a generative policy amortizes the cost of search during training and yields fast generation. Using insights from Temporal Difference learning, we propose GFlowNet, based on a view of the generative process as a flow network, making it possible to handle the tricky case where different trajectories can yield the same final state, e.g., there are many ways to sequentially add atoms to generate some molecular graph. We cast the set of trajectories as a flow and convert the flow consistency equations into a learning objective, akin to the casting of the Bellman equations into Temporal Difference methods. We prove that any global minimum of the proposed objectives yields a policy which samples from the desired distribution, and demonstrate the improved performance and diversity of GFlowNet on a simple domain where there are many modes to the reward function, and on a molecule synthesis task.

openreview.net

RetroGNN: Approximating Retrosynthesis by Graph Neural Networks for De Novo Drug Design

Cheng-Hao Liu

Maksym Korablyov

Stanisław Jastrzębski

Paweł Włodarczyk-Pruszyński

Yoshua Bengio

Marwin Segler

De novo molecule generation often results in chemically unfeasible molecules. A natural idea to mitigate this problem is to bias the search process towards more easily synthesizable molecules using a proxy for synthetic accessibility. However, using currently available proxies still results in highly unrealistic compounds. We investigate the feasibility of training deep graph neural networks to approximate the outputs of a retrosynthesis planning software, and their use to bias the search process. We evaluate our method on a benchmark involving searching for drug-like molecules with antibiotic properties. Compared to enumerating over five million existing molecules from the ZINC database, our approach finds molecules predicted to be more likely to be antibiotics while maintaining good drug-like properties and being easily synthesizable. Importantly, our deep neural network can successfully filter out hard to synthesize molecules while achieving a

2020-11-25

ArXiv (preprint)

arxiv.org
