Portrait of Chenghao Liu is unavailable

Chenghao Liu

Collaborating Alumni
Supervisor
Research Topics
Generative Models
Molecular Modeling

Publications

Sequence-Augmented SE(3)-Flow Matching For Conditional Protein Backbone Generation
A community effort in SARS-CoV-2 drug discovery.
Johannes Schimunek
Philipp Seidl
Katarina Elez
Tim Hempel
Tuan Le
Frank Noé
Simon Olsson
Lluís Raich
Robin Winter
Hatice Gokcan
Filipp Gusev
Evgeny M. Gutkin
Olexandr Isayev
Maria G. Kurnikova
Chamali H. Narangoda
Roman Zubatyuk
Ivan P. Bosko
Konstantin V. Furs
Anna D. Karpenko
Yury V. Kornoushenko … (see 133 more)
Mikita Shuldau
Artsemi Yushkevich
Mohammed B. Benabderrahmane
Patrick Bousquet‐Melou
Ronan Bureau
Beatrice Charton
Bertrand C. Cirou
Gérard Gil
William J. Allen
Suman Sirimulla
Stanley Watowich
Nick Antonopoulos
Nikolaos Epitropakis
Agamemnon Krasoulis
Vassilis Pitsikalis
Stavros Theodorakis
Igor Kozlovskii
Anton Maliutin
Alexander Medvedev
Petr Popov
Mark Zaretckii
Hamid Eghbal‐Zadeh
Christina Halmich
Sepp Hochreiter
Andreas Mayr
Peter Ruch
Michael Widrich
Francois Berenger
Ashutosh Kumar
Yoshihiro Yamanishi
Kam Y. J. Zhang
Moksh J. Jain
Cheng-Hao Liu
Gilles Marcou
M. Gilles
Enrico Glaab
Kelly Barnsley
Suhasini M. Iyengar
Mary Jo Ondrechen
V. Joachim Haupt
Florian Kaiser
Michael Schroeder
Luisa Pugliese
Simone Albani
Christina Athanasiou
Andrea Beccari
Paolo Carloni
Giulia D'Arrigo
Eleonora Gianquinto
Jonas Goßen
Anton Hanke
Benjamin P. Joseph
Daria B. Kokh
Sandra Kovachka
Candida Manelfi
Goutam Mukherjee
Abraham Muñiz‐Chicharro
Francesco Musiani
Ariane Nunes‐Alves
Giulia Paiardi
Giulia Rossetti
S. Kashif Sadiq
Francesca Spyrakis
Carmine Talarico
Alexandros Tsengenes
Rebecca C. Wade
Conner Copeland
Jeremiah Gaiser
Daniel R. Olson
Amitava Roy
Vishwesh Venkatraman
Travis J. Wheeler
Haribabu Arthanari
Klara Blaschitz
Marco Cespugli
Vedat Durmaz
Konstantin Fackeldey
Patrick D. Fischer
Christoph Gorgulla
Christian Gruber
Karl Gruber
Michael Hetmann
Jamie E. Kinney
Krishna M. Padmanabha Das
Shreya Pandita
Amit Singh
Georg Steinkellner
Guilhem Tesseyre
Gerhard Wagner
Zi‐Fu Wang
Ryan J. Yust
Dmitry S. Druzhilovskiy
Dmitry A. Filimonov
Pavel V. Pogodin
Vladimir Poroikov
Anastassia V. Rudik
Leonid A. Stolbov
Alexander V. Veselovsky
Maria De Rosa
Giada De Simone
Maria R. Gulotta
Jessica Lombino
Nedra Mekni
Ugo Perricone
Arturo Casini
Amanda Embree
D. Benjamin Gordon
David Lei
Katelin Pratt
Christopher A. Voigt
Kuang‐Yu Chen
Yves Jacob
Tim Krischuns
Pierre Lafaye
Agnès Zettor
M. Luis Rodríguez
Kris M. White
Daren Fearon
Frank Von Delft
Martin A. Walsh
Dragos Horvath
Charles L. Brooks
Babak Falsafi
Bryan Ford
Adolfo García‐Sastre
Sang Yup Lee
Nadia Naffakh
Alexandre Varnek
Günter Klambauer
Thomas M. Hermans
The COVID-19 pandemic continues to pose a substantial threat to human lives and is likely to do so for years to come. Despite the availabili… (see more)ty of vaccines, searching for efficient small-molecule drugs that are widely available, including in low- and middle-income countries, is an ongoing challenge. In this work, we report the results of an open science community effort, the "Billion molecules against Covid-19 challenge", to identify small-molecule inhibitors against SARS-CoV-2 or relevant human receptors. Participating teams used a wide variety of computational methods to screen a minimum of 1 billion virtual molecules against 6 protein targets. Overall, 31 teams participated, and they suggested a total of 639,024 molecules, which were subsequently ranked to find 'consensus compounds'. The organizing team coordinated with various contract research organizations (CROs) and collaborating institutions to synthesize and test 878 compounds for biological activity against proteases (Nsp5, Nsp3, TMPRSS2), nucleocapsid N, RdRP (only the Nsp12 domain), and (alpha) spike protein S. Overall, 27 compounds with weak inhibition/binding were experimentally identified by binding-, cleavage-, and/or viral suppression assays and are presented here. Open science approaches such as the one presented here contribute to the knowledge base of future drug discovery efforts in finding better SARS-CoV-2 treatments.
Towards equilibrium molecular conformation generation with GFlowNets
Cheng-Hao Liu
Santiago Miret
Luca Thiede
Alán Aspuru-Guzik
Sampling diverse, thermodynamically feasible molecular conformations plays a crucial role in predicting properties of a molecule. In this pa… (see more)per we propose to use GFlowNet for sampling conformations of small molecules from the Boltzmann distribution, as determined by the molecule's energy. The proposed approach can be used in combination with energy estimation methods of different fidelity and discovers a diverse set of low-energy conformations for highly flexible drug-like molecules. We demonstrate that GFlowNet can reproduce molecular potential energy surfaces by sampling proportionally to the Boltzmann distribution.
Thompson Sampling for Improved Exploration in GFlowNets
Generative flow networks (GFlowNets) are amortized variational inference algorithms that treat sampling from a distribution over composition… (see more)al objects as a sequential decision-making problem with a learnable action policy. Unlike other algorithms for hierarchical sampling that optimize a variational bound, GFlowNet algorithms can stably run off-policy, which can be advantageous for discovering modes of the target distribution. Despite this flexibility in the choice of behaviour policy, the optimal way of efficiently selecting trajectories for training has not yet been systematically explored. In this paper, we view the choice of trajectories for training as an active learning problem and approach it using Bayesian techniques inspired by methods for multi-armed bandits. The proposed algorithm, Thompson sampling GFlowNets (TS-GFN), maintains an approximate posterior distribution over policies and samples trajectories from this posterior for training. We show in two domains that TS-GFN yields improved exploration and thus faster convergence to the target distribution than the off-policy exploration strategies used in past work.
GFlowNets for AI-Driven Scientific Discovery
Tackling the most pressing problems for humanity, such as the climate crisis and the threat of global pandemics, requires accelerating the p… (see more)ace of scientific discovery. While science has traditionally relied on trial and error and even serendipity to a large extent, the last few decades have seen a surge of data-driven scientific discoveries. However, in order to truly leverage large-scale data sets and high-throughput experimental setups, machine learning methods will need to be further improved and better integrated in the scientific discovery pipeline. A key challenge for current machine learning methods in this context is the efficient exploration of very large search spaces, which requires techniques for estimating reducible (epistemic) uncertainty and generating sets of diverse and informative experiments to perform. This motivated a new probabilistic machine learning framework called GFlowNets, which can be applied in the modeling, hypotheses generation and experimental design stages of the experimental science loop. GFlowNets learn to sample from a distribution given indirectly by a reward function corresponding to an unnormalized probability, which enables sampling diverse, high-reward candidates. GFlowNets can also be used to form efficient and amortized Bayesian posterior estimators for causal models conditioned on the already acquired experimental data. Having such posterior models can then provide estimators of epistemic uncertainty and information gain that can drive an experimental design policy. Altogether, here we will argue that GFlowNets can become a valuable tool for AI-driven scientific discovery, especially in scenarios of very large candidate spaces where we have access to cheap but inaccurate measurements or to expensive but accurate measurements. This is a common setting in the context of drug and material discovery, which we use as examples throughout the paper.
RetroGNN: Fast Estimation of Synthesizability for Virtual Screening and De Novo Design by Learning from Slow Retrosynthesis Software
Cheng-Hao Liu
Stanisław Jastrzębski
Paweł Włodarczyk-Pruszyński
Marwin Segler
E VALUATING G ENERALIZATION IN GF LOW N ETS FOR M OLECULE D ESIGN
Moksh J. Jain
Cheng-Hao Liu
Michael M. Bronstein
Deep learning bears promise for drug discovery problems such as de novo molecular design. Generating data to train such models is a costly a… (see more)nd time-consuming process, given the need for wet-lab experiments or expensive simulations. This problem is compounded by the notorious data-hungriness of machine learning algorithms. In small molecule generation the recently proposed GFlowNet method has shown good performance in generating diverse high-scoring candidates, and has the interesting advantage of being an off-policy offline method. Finding an appropriate generalization evaluation metric for such models, one predictive of the desired search performance (i.e. finding high-scoring diverse candidates), will help guide online data collection for such an algorithm. In this work, we develop techniques for evaluating GFlowNet performance on a test set, and identify the most promising metric for predicting generalization. We present empirical results on several small-molecule design tasks in drug discovery, for several GFlowNet training setups, and we find a metric strongly correlated with diverse high-scoring batch generation. This metric should be used to identify the best generative model from which to sample batches of molecules to be evaluated.
RetroGNN: Approximating Retrosynthesis by Graph Neural Networks for De Novo Drug Design
Cheng-Hao Liu
Stanisław Jastrzębski
Paweł Włodarczyk-Pruszyński
Marwin Segler
De novo molecule generation often results in chemically unfeasible molecules. A natural idea to mitigate this problem is to bias the search … (see more)process towards more easily synthesizable molecules using a proxy for synthetic accessibility. However, using currently available proxies still results in highly unrealistic compounds. We investigate the feasibility of training deep graph neural networks to approximate the outputs of a retrosynthesis planning software, and their use to bias the search process. We evaluate our method on a benchmark involving searching for drug-like molecules with antibiotic properties. Compared to enumerating over five million existing molecules from the ZINC database, our approach finds molecules predicted to be more likely to be antibiotics while maintaining good drug-like properties and being easily synthesizable. Importantly, our deep neural network can successfully filter out hard to synthesize molecules while achieving a