Chenghao Liu

Sequence-Augmented SE(3)-Flow Matching For Conditional Protein Backbone Generation

Guillaume Huguet

James Vuckovic

Kilian Fatras

Éric Thibodeau-Laufer

Cheng-Hao Liu

Michael Bronstein

Avishek Joey Bose

2023-12-31

Advances in Neural Information Processing Systems 37 (published)

doi.org

openreview.net

A community effort in SARS-CoV-2 drug discovery.

Johannes Schimunek

Philipp Seidl

Katarina Elez

Tim Hempel

Tuan Le

Frank Noé

Simon Olsson

Lluís Raich

Robin Winter

Hatice Gokcan

Filipp Gusev

Evgeny M. Gutkin

Olexandr Isayev

Maria G. Kurnikova

Chamali H. Narangoda

Roman Zubatyuk

Ivan P. Bosko

Konstantin V. Furs

Anna D. Karpenko

Yury V. Kornoushenko

Mikita Shuldau

Artsemi Yushkevich

Mohammed B. Benabderrahmane

Patrick Bousquet‐Melou

Ronan Bureau

Beatrice Charton

Bertrand C. Cirou

Gérard Gil

William J. Allen

Suman Sirimulla

Stanley Watowich

Nick Antonopoulos

Nikolaos Epitropakis

Agamemnon Krasoulis

Vassilis Pitsikalis

Stavros Theodorakis

Igor Kozlovskii

Anton Maliutin

Alexander Medvedev

Petr Popov

Mark Zaretckii

Hamid Eghbal‐Zadeh

Christina Halmich

Sepp Hochreiter

Andreas Mayr

Peter Ruch

Michael Widrich

Francois Berenger

Ashutosh Kumar

Yoshihiro Yamanishi

Kam Y. J. Zhang

Emmanuel Bengio

Yoshua Bengio

Moksh J. Jain

Maksym Korablyov

Cheng-Hao Liu

Gilles Marcou

M. Gilles

Enrico Glaab

Kelly Barnsley

Suhasini M. Iyengar

Mary Jo Ondrechen

V. Joachim Haupt

Florian Kaiser

Michael Schroeder

Luisa Pugliese

Simone Albani

Christina Athanasiou

Andrea Beccari

Paolo Carloni

Giulia D'Arrigo

Eleonora Gianquinto

Jonas Goßen

Anton Hanke

Benjamin P. Joseph

Daria B. Kokh

Sandra Kovachka

Candida Manelfi

Goutam Mukherjee

Abraham Muñiz‐Chicharro

Francesco Musiani

Ariane Nunes‐Alves

Giulia Paiardi

Giulia Rossetti

S. Kashif Sadiq

Francesca Spyrakis

Carmine Talarico

Alexandros Tsengenes

Rebecca C. Wade

Conner Copeland

Jeremiah Gaiser

Daniel R. Olson

Amitava Roy

Vishwesh Venkatraman

Travis J. Wheeler

Haribabu Arthanari

Klara Blaschitz

Marco Cespugli

Vedat Durmaz

Konstantin Fackeldey

Patrick D. Fischer

Christoph Gorgulla

Christian Gruber

Karl Gruber

Michael Hetmann

Jamie E. Kinney

Krishna M. Padmanabha Das

Shreya Pandita

Amit Singh

Georg Steinkellner

Guilhem Tesseyre

Gerhard Wagner

Zi‐Fu Wang

Ryan J. Yust

Dmitry S. Druzhilovskiy

Dmitry A. Filimonov

Pavel V. Pogodin

Vladimir Poroikov

Anastassia V. Rudik

Leonid A. Stolbov

Alexander V. Veselovsky

Maria De Rosa

Giada De Simone

Maria R. Gulotta

Jessica Lombino

Nedra Mekni

Ugo Perricone

Arturo Casini

Amanda Embree

D. Benjamin Gordon

David Lei

Katelin Pratt

Christopher A. Voigt

Kuang‐Yu Chen

Yves Jacob

Tim Krischuns

Pierre Lafaye

Agnès Zettor

M. Luis Rodríguez

Kris M. White

Daren Fearon

Frank Von Delft

Martin A. Walsh

Dragos Horvath

Charles L. Brooks

Babak Falsafi

Bryan Ford

Adolfo García‐Sastre

Sang Yup Lee

Nadia Naffakh

Alexandre Varnek

Günter Klambauer

Thomas M. Hermans

The COVID-19 pandemic continues to pose a substantial threat to human lives and is likely to do so for years to come. Despite the availability of vaccines, searching for efficient small-molecule drugs that are widely available, including in low- and middle-income countries, is an ongoing challenge. In this work, we report the results of an open science community effort, the "Billion molecules against Covid-19 challenge", to identify small-molecule inhibitors against SARS-CoV-2 or relevant human receptors. Participating teams used a wide variety of computational methods to screen a minimum of 1 billion virtual molecules against 6 protein targets. Overall, 31 teams participated, and they suggested a total of 639,024 molecules, which were subsequently ranked to find 'consensus compounds'. The organizing team coordinated with various contract research organizations (CROs) and collaborating institutions to synthesize and test 878 compounds for biological activity against proteases (Nsp5, Nsp3, TMPRSS2), nucleocapsid N, RdRP (only the Nsp12 domain), and (alpha) spike protein S. Overall, 27 compounds with weak inhibition/binding were experimentally identified by binding-, cleavage-, and/or viral suppression assays and are presented here. Open science approaches such as the one presented here contribute to the knowledge base of future drug discovery efforts in finding better SARS-CoV-2 treatments.

2023-11-13

Molecular Informatics (published)

doi.org

Towards equilibrium molecular conformation generation with GFlowNets

Alexandra Volokhova

Michał Koziarski

Alex Hernández-García

Cheng-Hao Liu

Santiago Miret

Pablo Lemos

Luca Thiede

Zichao Yan

Alán Aspuru-Guzik

Yoshua Bengio

Sampling diverse, thermodynamically feasible molecular conformations plays a crucial role in predicting properties of a molecule. In this paper we propose to use GFlowNet for sampling conformations of small molecules from the Boltzmann distribution, as determined by the molecule's energy. The proposed approach can be used in combination with energy estimation methods of different fidelity and discovers a diverse set of low-energy conformations for highly flexible drug-like molecules. We demonstrate that GFlowNet can reproduce molecular potential energy surfaces by sampling proportionally to the Boltzmann distribution.

2023-10-26

NeurIPS.cc/2023/Workshop/AI4Mat (poster)

doi.org

openreview.net

Thompson Sampling for Improved Exploration in GFlowNets

Jarrid Rector-Brooks

Kanika Madan

Moksh J. Jain

Maksym Korablyov

Cheng-Hao Liu

A. Chandar

Nikolay Malkin

Yoshua Bengio

Generative flow networks (GFlowNets) are amortized variational inference algorithms that treat sampling from a distribution over compositional objects as a sequential decision-making problem with a learnable action policy. Unlike other algorithms for hierarchical sampling that optimize a variational bound, GFlowNet algorithms can stably run off-policy, which can be advantageous for discovering modes of the target distribution. Despite this flexibility in the choice of behaviour policy, the optimal way of efficiently selecting trajectories for training has not yet been systematically explored. In this paper, we view the choice of trajectories for training as an active learning problem and approach it using Bayesian techniques inspired by methods for multi-armed bandits. The proposed algorithm, Thompson sampling GFlowNets (TS-GFN), maintains an approximate posterior distribution over policies and samples trajectories from this posterior for training. We show in two domains that TS-GFN yields improved exploration and thus faster convergence to the target distribution than the off-policy exploration strategies used in past work.

2023-06-18

ICML.cc/2023/Workshop/SPIGM (poster)

doi.org

openreview.net

GFlowNets for AI-Driven Scientific Discovery

Moksh Jain

Tristan Deleu

Jason Hartford

Cheng-Hao Liu

Alex Hernández-García

Yoshua Bengio

Tackling the most pressing problems for humanity, such as the climate crisis and the threat of global pandemics, requires accelerating the pace of scientific discovery. While science has traditionally relied on trial and error and even serendipity to a large extent, the last few decades have seen a surge of data-driven scientific discoveries. However, in order to truly leverage large-scale data sets and high-throughput experimental setups, machine learning methods will need to be further improved and better integrated in the scientific discovery pipeline. A key challenge for current machine learning methods in this context is the efficient exploration of very large search spaces, which requires techniques for estimating reducible (epistemic) uncertainty and generating sets of diverse and informative experiments to perform. This motivated a new probabilistic machine learning framework called GFlowNets, which can be applied in the modeling, hypotheses generation and experimental design stages of the experimental science loop. GFlowNets learn to sample from a distribution given indirectly by a reward function corresponding to an unnormalized probability, which enables sampling diverse, high-reward candidates. GFlowNets can also be used to form efficient and amortized Bayesian posterior estimators for causal models conditioned on the already acquired experimental data. Having such posterior models can then provide estimators of epistemic uncertainty and information gain that can drive an experimental design policy. Altogether, here we will argue that GFlowNets can become a valuable tool for AI-driven scientific discovery, especially in scenarios of very large candidate spaces where we have access to cheap but inaccurate measurements or to expensive but accurate measurements. This is a common setting in the context of drug and material discovery, which we use as examples throughout the paper.

2022-12-31

Digital Discovery (published)

doi.org

arxiv.org

RetroGNN: Fast Estimation of Synthesizability for Virtual Screening and De Novo Design by Learning from Slow Retrosynthesis Software

Cheng-Hao Liu

Maksym Korablyov

Stanisław Jastrzębski

Paweł Włodarczyk-Pruszyński

Yoshua Bengio

Marwin Segler

2022-04-21

Journal of Chemical Information and Modeling (published)

doi.org

Evaluating Generalization in GFlowNets for Molecule Design

Moksh J. Jain

Cheng-Hao Liu

Michael M. Bronstein

Deep learning bears promise for drug discovery problems such as de novo molecular design. Generating data to train such models is a costly and time-consuming process, given the need for wet-lab experiments or expensive simulations. This problem is compounded by the notorious data-hungriness of machine learning algorithms. In small molecule generation the recently proposed GFlowNet method has shown good performance in generating diverse high-scoring candidates, and has the interesting advantage of being an off-policy offline method. Finding an appropriate generalization evaluation metric for such models, one predictive of the desired search performance (i.e. finding high-scoring diverse candidates), will help guide online data collection for such an algorithm. In this work, we develop techniques for evaluating GFlowNet performance on a test set, and identify the most promising metric for predicting generalization. We present empirical results on several small-molecule design tasks in drug discovery, for several GFlowNet training setups, and we find a metric strongly correlated with diverse high-scoring batch generation. This metric should be used to identify the best generative model from which to sample batches of molecules to be evaluated.

2022-04-04

ICLR.cc/2022/Workshop/MLDD (poster)

openreview.net

RetroGNN: Approximating Retrosynthesis by Graph Neural Networks for De Novo Drug Design

Cheng-Hao Liu

Maksym Korablyov

Stanisław Jastrzębski

Paweł Włodarczyk-Pruszyński

Yoshua Bengio

Marwin Segler

De novo molecule generation often results in chemically unfeasible molecules. A natural idea to mitigate this problem is to bias the search process towards more easily synthesizable molecules using a proxy for synthetic accessibility. However, using currently available proxies still results in highly unrealistic compounds. We investigate the feasibility of training deep graph neural networks to approximate the outputs of a retrosynthesis planning software, and their use to bias the search process. We evaluate our method on a benchmark involving searching for drug-like molecules with antibiotic properties. Compared to enumerating over five million existing molecules from the ZINC database, our approach finds molecules predicted to be more likely to be antibiotics while maintaining good drug-like properties and being easily synthesizable. Importantly, our deep neural network can successfully filter out hard to synthesize molecules while achieving a

2020-11-24

arXiv (preprint)

arxiv.org
