Chenghao Liu

Alumni collaborator
Principal supervisor
Research topics
Generative models
Molecular modeling

Publications

Integrating Generative and Experimental Platforms for Biomolecular Design
Cheng-Hao Liu
Soojung Yang
Sidney L Lisanza
Francesca-Zhoufan Li
Hannes Stärk
Jacob Gershon
Lauren Hong
Pranam Chatterjee
Tommi Jaakkola
Regina Barzilay
David Baker
Frances H. Arnold
Biomolecular design, through artificial engineering of proteins, ligands, and nucleic acids, holds immense promise in addressing pressing medical, industrial, and environmental challenges. While generative machine learning has shown significant potential in this area, a palpable disconnect exists with experimental biology: many ML research efforts prioritize static benchmark performance, potentially sidelining impactful biological applications. This workshop seeks to bridge this gap by bringing computationalists and experimentalists together, catalyzing a deeper interdisciplinary discourse. Together, we will explore the strengths and challenges of generative ML in biology, experimental integration of generative ML, and biological problems ready for ML. To attract high-quality and diverse research, we partnered with Nature Biotechnology for a special collection, and we created dedicated tracks for in-silico ML research and hybrid ML-experimental biology research. Our lineup features emerging leaders as speakers and renowned scientists as panelists, encapsulating a spectrum from high-throughput experimentation and computational biology to generative ML. With a diverse organizing team and backed by industry sponsors, we dedicate the workshop to pushing the boundaries of ML's role in biology.
Steering Masked Discrete Diffusion Models via Discrete Denoising Posterior Prediction
Zhangzhi Peng
Zachary Quinn
Cheng-Hao Liu
Nouha Dziri
Michael M. Bronstein
Pranam Chatterjee
Alexander Tong
Generative modeling of discrete data underlies important applications spanning text-based agents like ChatGPT to the design of the very building blocks of life in protein sequences. However, application domains need to exert control over the generated data by steering the generative process (typically via RLHF) to satisfy a specified property, reward, or affinity metric. In this paper, we study the problem of steering Masked Diffusion Models (MDMs), a recent class of discrete diffusion models that offer a compelling alternative to traditional autoregressive models. We introduce Discrete Denoising Posterior Prediction (DDPP), a novel framework that casts the task of steering pre-trained MDMs as a problem of probabilistic inference by learning to sample from a target Bayesian posterior. Our DDPP framework leads to a family of three novel objectives that are all simulation-free, and thus scalable while applying to general non-differentiable reward functions. Empirically, we instantiate DDPP by steering MDMs to perform class-conditional pixel-level image modeling, RLHF-based alignment of MDMs using text-based rewards, and finetuning protein language models to generate more diverse secondary structures and shorter proteins. We substantiate our designs via wet-lab validation, where we observe transient expression of reward-optimized protein sequences.
RGFN: Synthesizable Molecular Generation Using GFlowNets
Michał Koziarski
Andrei Rekesh
Dmytro Shevchuk
Almer M. van der Sloot
Piotr Gaiński
Cheng-Hao Liu
Mike Tyers
Robert A. Batey
Generative Active Learning for the Search of Small-molecule Protein Binders
Maksym Korablyov
Cheng-Hao Liu
Moksh J. Jain
Almer M. van der Sloot
Eric Jolicoeur
Edward Ruediger
Andrei Cristian Nica
Kostiantyn Lapchevskyi
Daniel St-Cyr
Doris Alexandra Schuetz
Victor I Butoi
Simon R. Blackburn
Hadi Nekoei
Sai Krishna Gottipati
Prateek Gupta
Ladislav Rampášek
Sasikanth Avancha
William L. Hamilton
Brooks Paige
Sanchit Misra
Stanisław Jastrzębski
Bharat Kaul
José Miguel Hernández-Lobato
Marwin Segler
Michael M. Bronstein
Anne Marinier
Mike Tyers
Despite substantial progress in machine learning for scientific discovery in recent years, truly de novo design of small molecules which exhibit a property of interest remains a significant challenge. We introduce LambdaZero, a generative active learning approach to search for synthesizable molecules. Powered by deep reinforcement learning, LambdaZero learns to search over the vast space of molecules to discover candidates with a desired property. We apply LambdaZero with molecular docking to design novel small molecules that inhibit the enzyme soluble Epoxide Hydrolase 2 (sEH), while enforcing constraints on synthesizability and drug-likeliness. LambdaZero provides an exponential speedup in terms of the number of calls to the expensive molecular docking oracle, and LambdaZero de novo designed molecules reach docking scores that would otherwise require the virtual screening of a hundred billion molecules. Importantly, LambdaZero discovers novel scaffolds of synthesizable, drug-like inhibitors for sEH. In in vitro experimental validation, a series of ligands from a generated quinazoline-based scaffold were synthesized, and the lead inhibitor N-(4,6-di(pyrrolidin-1-yl)quinazolin-2-yl)-N-methylbenzamide (UM0152893) displayed sub-micromolar enzyme inhibition of sEH.
Iterated Denoising Energy Matching for Sampling from Boltzmann Densities
Efficiently generating statistically independent samples from an unnormalized probability distribution, such as equilibrium samples of many-body systems, is a foundational problem in science. In this paper, we propose Iterated Denoising Energy Matching (iDEM), an iterative algorithm that uses a novel stochastic score matching objective leveraging solely the energy function and its gradient (and no data samples) to train a diffusion-based sampler. Specifically, iDEM alternates between (I) sampling regions of high model density from a diffusion-based sampler and (II) using these samples in our stochastic matching objective to further improve the sampler. iDEM is scalable to high dimensions because the inner matching objective is simulation-free and requires no MCMC samples. Moreover, by leveraging the fast mode mixing behavior of diffusion, iDEM smooths out the energy landscape, enabling efficient exploration and learning of an amortized sampler. We evaluate iDEM on a suite of tasks ranging from standard synthetic energy functions to invariant
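As a rough illustration of how a score estimate can rely only on an energy function and its gradient, the sketch below computes a self-normalized Monte Carlo estimate of the score of a Gaussian-noised Boltzmann density. It is not the authors' implementation: the variance-exploding Gaussian noising, the toy quadratic energy, and all function names are assumptions chosen so that the exact answer is known in closed form.

```python
import numpy as np

def energy(x):
    # Toy energy (assumption): E(x) = x^2 / 2, i.e. a standard normal target.
    return 0.5 * x**2

def grad_energy(x):
    # Gradient of the toy energy above.
    return x

def mc_score_estimate(x, sigma, num_samples, rng):
    """Estimate d/dx log E_{y ~ N(x, sigma^2)}[exp(-E(y))] by Monte Carlo.

    Uses only energy evaluations and energy gradients, no data samples.
    """
    ys = x + sigma * rng.standard_normal(num_samples)
    log_w = -energy(ys)
    log_w -= log_w.max()          # stabilize the exponentials
    w = np.exp(log_w)
    w /= w.sum()                  # self-normalized weights (softmax of -E)
    return float(np.sum(w * -grad_energy(ys)))

rng = np.random.default_rng(0)
x, sigma = 1.0, 0.5
estimate = mc_score_estimate(x, sigma, 100_000, rng)

# For a standard normal target, the noised density is N(0, 1 + sigma^2),
# so the exact score at x is -x / (1 + sigma^2).
exact = -x / (1.0 + sigma**2)
print(estimate, exact)
```

With this toy energy the estimate can be compared directly against the closed-form score. For realistic energies no such reference exists, and the estimate would instead serve as a regression target for a learned model.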
Integrating Generative and Experimental Platforms for Biomolecular Design
Cheng-Hao Liu
Jason Yim
Soojung Yang
Sidney Lisanza
Francesca-Zhoufan Li
Pranam Chatterjee
Tommi Jaakkola
Regina Barzilay
David Baker
Frances H. Arnold
Diffusion Generative Flow Samplers: Improving learning signals through partial trajectory optimization
Ricky T. Q. Chen
Cheng-Hao Liu
Sequence-Augmented SE(3)-Flow Matching For Conditional Protein Generation
James Vuckovic
Kilian Fatras
Eric Laufer
Riashat Islam
Cheng-Hao Liu
Michael M. Bronstein
Alexander Tong
A community effort in SARS-CoV-2 drug discovery
Johannes Schimunek
Philipp Seidl
Katarina Elez
Tim Hempel
Tuan Le
Frank Noé
Simon Olsson
Lluís Raich
Robin Winter
Hatice Gokcan
Filipp Gusev
Evgeny M. Gutkin
Olexandr Isayev
Maria G. Kurnikova
Chamali H. Narangoda
Roman Zubatyuk
Ivan P. Bosko
Konstantin V. Furs
Anna D. Karpenko
Yury V. Kornoushenko
Mikita Shuldau
Artsemi Yushkevich
Mohammed B. Benabderrahmane
Patrick Bousquet‐Melou
Ronan Bureau
Beatrice Charton
Bertrand C. Cirou
Gérard Gil
William J. Allen
Suman Sirimulla
Stanley Watowich
Nick Antonopoulos
Nikolaos Epitropakis
Agamemnon Krasoulis
Vassilis Pitsikalis
Stavros Theodorakis
Igor Kozlovskii
Anton Maliutin
Alexander Medvedev
Petr Popov
Mark Zaretckii
Hamid Eghbal‐Zadeh
Christina Halmich
Sepp Hochreiter
Andreas Mayr
Peter Ruch
Michael Widrich
Francois Berenger
Ashutosh Kumar
Yoshihiro Yamanishi
Kam Y. J. Zhang
Moksh J. Jain
Maksym Korablyov
Cheng-Hao Liu
Gilles Marcou
M. Gilles
Enrico Glaab
Kelly Barnsley
Suhasini M. Iyengar
Mary Jo Ondrechen
V. Joachim Haupt
Florian Kaiser
Michael Schroeder
Luisa Pugliese
Simone Albani
Christina Athanasiou
Andrea Beccari
Paolo Carloni
Giulia D'Arrigo
Eleonora Gianquinto
Jonas Goßen
Anton Hanke
Benjamin P. Joseph
Daria B. Kokh
Sandra Kovachka
Candida Manelfi
Goutam Mukherjee
Abraham Muñiz‐Chicharro
Francesco Musiani
Ariane Nunes‐Alves
Giulia Paiardi
Giulia Rossetti
S. Kashif Sadiq
Francesca Spyrakis
Carmine Talarico
Alexandros Tsengenes
Rebecca C. Wade
Conner Copeland
Jeremiah Gaiser
Daniel R. Olson
Amitava Roy
Vishwesh Venkatraman
Travis J. Wheeler
Haribabu Arthanari
Klara Blaschitz
Marco Cespugli
Vedat Durmaz
Konstantin Fackeldey
Patrick D. Fischer
Christoph Gorgulla
Christian Gruber
Karl Gruber
Michael Hetmann
Jamie E. Kinney
Krishna M. Padmanabha Das
Shreya Pandita
Amit Singh
Georg Steinkellner
Guilhem Tesseyre
Gerhard Wagner
Zi‐Fu Wang
Ryan J. Yust
Dmitry S. Druzhilovskiy
Dmitry A. Filimonov
Pavel V. Pogodin
Vladimir Poroikov
Anastassia V. Rudik
Leonid A. Stolbov
Alexander V. Veselovsky
Maria De Rosa
Giada De Simone
Maria R. Gulotta
Jessica Lombino
Nedra Mekni
Ugo Perricone
Arturo Casini
Amanda Embree
D. Benjamin Gordon
David Lei
Katelin Pratt
Christopher A. Voigt
Kuang‐Yu Chen
Yves Jacob
Tim Krischuns
Pierre Lafaye
Agnès Zettor
M. Luis Rodríguez
Kris M. White
Daren Fearon
Frank Von Delft
Martin A. Walsh
Dragos Horvath
Charles L. Brooks
Babak Falsafi
Bryan Ford
Adolfo García‐Sastre
Sang Yup Lee
Nadia Naffakh
Alexandre Varnek
Günter Klambauer
Thomas M. Hermans
The COVID-19 pandemic continues to pose a substantial threat to human lives and is likely to do so for years to come. Despite the availability of vaccines, searching for efficient small-molecule drugs that are widely available, including in low- and middle-income countries, is an ongoing challenge. In this work, we report the results of an open science community effort, the "Billion molecules against Covid-19 challenge", to identify small-molecule inhibitors against SARS-CoV-2 or relevant human receptors. Participating teams used a wide variety of computational methods to screen a minimum of 1 billion virtual molecules against 6 protein targets. Overall, 31 teams participated, and they suggested a total of 639,024 molecules, which were subsequently ranked to find 'consensus compounds'. The organizing team coordinated with various contract research organizations (CROs) and collaborating institutions to synthesize and test 878 compounds for biological activity against proteases (Nsp5, Nsp3, TMPRSS2), nucleocapsid N, RdRP (only the Nsp12 domain), and (alpha) spike protein S. Overall, 27 compounds with weak inhibition/binding were experimentally identified by binding-, cleavage-, and/or viral suppression assays and are presented here. Open science approaches such as the one presented here contribute to the knowledge base of future drug discovery efforts in finding better SARS-CoV-2 treatments.