Chenghao Liu

2024-03-08

ICLR.cc/2024/Workshop_Proposals (published)

Iterated Denoising Energy Matching for Sampling from Boltzmann Densities

Cheng-Hao Liu

Efficiently generating statistically independent samples from an unnormalized probability distribution, such as equilibrium samples of many-body systems, is a foundational problem in science. In this paper, we propose Iterated Denoising Energy Matching (iDEM), an iterative algorithm that uses a novel stochastic score matching objective leveraging solely the energy function and its gradient -- and no data samples -- to train a diffusion-based sampler. Specifically, iDEM alternates between (I) sampling regions of high model density from a diffusion-based sampler and (II) using these samples in our stochastic matching objective to further improve the sampler. iDEM is scalable to high dimensions because the inner matching objective is simulation-free and requires no MCMC samples. Moreover, by leveraging the fast mode mixing behavior of diffusion, iDEM smooths out the energy landscape, enabling efficient exploration and learning of an amortized sampler. We evaluate iDEM on a suite of tasks ranging from standard synthetic energy functions to invariant

2024-02-09

ArXiv (preprint)
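
The abstract above describes a score matching objective that uses only the energy function and its gradient, with no data samples. As a rough illustration of how such a target can be formed, the sketch below estimates the score of a Gaussian-noised Boltzmann density by self-normalised Monte Carlo. The toy energy, the function names (`energy`, `grad_energy`, `mc_score_estimate`), and the noise level are all assumptions made for this example; it is not claimed to be the exact objective used in the paper.

```python
import numpy as np

def energy(x):
    # Toy energy (assumed for illustration): standard Gaussian, E(x) = x^2 / 2.
    return 0.5 * x ** 2

def grad_energy(x):
    # Gradient of the toy energy.
    return x

def mc_score_estimate(x_t, sigma_t, num_samples, rng):
    """Monte Carlo estimate of the score of the noised density at x_t.

    The noised density is p_t(x) proportional to E_eps[exp(-E(x + sigma_t * eps))],
    so its score is a softmax-weighted average of -grad E at perturbed points.
    Only energy and gradient evaluations are used; no target samples are needed.
    """
    x0 = x_t + sigma_t * rng.standard_normal(num_samples)
    log_w = -energy(x0)
    log_w -= log_w.max()  # stabilise the softmax numerically
    w = np.exp(log_w)
    w /= w.sum()
    return float(np.sum(w * (-grad_energy(x0))))

rng = np.random.default_rng(0)
estimate = mc_score_estimate(x_t=1.0, sigma_t=1.0, num_samples=100_000, rng=rng)

# For this toy target the noised density is N(0, 1 + sigma_t^2), so the exact
# score at x_t = 1 with sigma_t = 1 is -x_t / (1 + sigma_t^2).
exact = -1.0 / (1.0 + 1.0 ** 2)
```

In a sampler-training loop, an estimate of this kind could serve as a regression target for a score network at points proposed by the current sampler, which mirrors the two alternating steps in the abstract.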

Diffusion Generative Flow Samplers: Improving learning signals through partial trajectory optimization

Dinghuai Zhang

Ricky T. Q. Chen

Cheng-Hao Liu

Aaron Courville

2024-01-16

ICLR.cc/2024/Conference (poster)

Sequence-Augmented SE(3)-Flow Matching For Conditional Protein Generation.

James Vuckovic

Eric Laufer

Cheng-Hao Liu

Michael M. Bronstein

2024-01-01

Neural Information Processing Systems (published)

dblp.uni-trier.de

A community effort in SARS-CoV-2 drug discovery.

Johannes Schimunek

Philipp Seidl

Katarina Elez

Tim Hempel

Tuan Le

Frank Noé

Simon Olsson

Lluís Raich

Robin Winter

Hatice Gokcan

Filipp Gusev

Evgeny M. Gutkin

Olexandr Isayev

Maria G. Kurnikova

Chamali H. Narangoda

Roman Zubatyuk

Ivan P. Bosko

Konstantin V. Furs

Anna D. Karpenko

Yury V. Kornoushenko

Mikita Shuldau

Artsemi Yushkevich

Mohammed B. Benabderrahmane

Patrick Bousquet‐Melou

Ronan Bureau

Beatrice Charton

Bertrand C. Cirou

Gérard Gil

William J. Allen

Suman Sirimulla

Stanley Watowich

Nick Antonopoulos

Nikolaos Epitropakis

Agamemnon Krasoulis

Vassilis Pitsikalis

Stavros Theodorakis

Igor Kozlovskii

Anton Maliutin

Alexander Medvedev

Petr Popov

Mark Zaretckii

Hamid Eghbal‐Zadeh

Christina Halmich

Sepp Hochreiter

Andreas Mayr

Peter Ruch

Michael Widrich

Francois Berenger

Ashutosh Kumar

Yoshihiro Yamanishi

Kam Y. J. Zhang

Emmanuel Bengio

Moksh J. Jain

Maksym Korablyov

Cheng-Hao Liu

Gilles Marcou

M. Gilles

Enrico Glaab

Kelly Barnsley

Suhasini M. Iyengar

Mary Jo Ondrechen

V. Joachim Haupt

Florian Kaiser

Michael Schroeder

Luisa Pugliese

Simone Albani

Christina Athanasiou

Andrea Beccari

Paolo Carloni

Giulia D'Arrigo

Eleonora Gianquinto

Jonas Goßen

Anton Hanke

Benjamin P. Joseph

Daria B. Kokh

Sandra Kovachka

Candida Manelfi

Goutam Mukherjee

Abraham Muñiz‐Chicharro

Francesco Musiani

Ariane Nunes‐Alves

Giulia Paiardi

Giulia Rossetti

S. Kashif Sadiq

Francesca Spyrakis

Carmine Talarico

Alexandros Tsengenes

Rebecca C. Wade

Conner Copeland

Jeremiah Gaiser

Daniel R. Olson

Amitava Roy

Vishwesh Venkatraman

Travis J. Wheeler

Haribabu Arthanari

Klara Blaschitz

Marco Cespugli

Vedat Durmaz

Konstantin Fackeldey

Patrick D. Fischer

Christoph Gorgulla

Christian Gruber

Karl Gruber

Michael Hetmann

Jamie E. Kinney

Krishna M. Padmanabha Das

Shreya Pandita

Amit Singh

Georg Steinkellner

Guilhem Tesseyre

Gerhard Wagner

Zi‐Fu Wang

Ryan J. Yust

Dmitry S. Druzhilovskiy

Dmitry A. Filimonov

Pavel V. Pogodin

Vladimir Poroikov

Anastassia V. Rudik

Leonid A. Stolbov

Alexander V. Veselovsky

Maria De Rosa

Giada De Simone

Maria R. Gulotta

Jessica Lombino

Nedra Mekni

Ugo Perricone

Arturo Casini

Amanda Embree

D. Benjamin Gordon

David Lei

Katelin Pratt

Christopher A. Voigt

Kuang‐Yu Chen

Yves Jacob

Tim Krischuns

Pierre Lafaye

Agnès Zettor

M. Luis Rodríguez

Kris M. White

Daren Fearon

Frank Von Delft

Martin A. Walsh

Dragos Horvath

Charles L. Brooks

Babak Falsafi

Bryan Ford

Adolfo García‐Sastre

Sang Yup Lee

Nadia Naffakh

Alexandre Varnek

Günter Klambauer

Thomas M. Hermans

The COVID-19 pandemic continues to pose a substantial threat to human lives and is likely to do so for years to come. Despite the availability of vaccines, searching for efficient small-molecule drugs that are widely available, including in low- and middle-income countries, is an ongoing challenge. In this work, we report the results of an open science community effort, the "Billion molecules against Covid-19 challenge", to identify small-molecule inhibitors against SARS-CoV-2 or relevant human receptors. Participating teams used a wide variety of computational methods to screen a minimum of 1 billion virtual molecules against 6 protein targets. Overall, 31 teams participated, and they suggested a total of 639,024 molecules, which were subsequently ranked to find 'consensus compounds'. The organizing team coordinated with various contract research organizations (CROs) and collaborating institutions to synthesize and test 878 compounds for biological activity against proteases (Nsp5, Nsp3, TMPRSS2), nucleocapsid N, RdRP (only the Nsp12 domain), and (alpha) spike protein S. Overall, 27 compounds with weak inhibition/binding were experimentally identified by binding-, cleavage-, and/or viral suppression assays and are presented here. Open science approaches such as the one presented here contribute to the knowledge base of future drug discovery efforts in finding better SARS-CoV-2 treatments.

2023-11-14

Molecular informatics (published)

Multi-Fidelity Active Learning with GFlowNets

Alex Hernandez-Garcia

Nikita Saxena

Moksh J. Jain

Cheng-Hao Liu

In the last decades, the capacity to generate large amounts of data in science and engineering applications has been growing steadily. Meanwhile, the progress in machine learning has turned it into a suitable tool to process and utilise the available data. Nonetheless, many relevant scientific and engineering problems present challenges where current machine learning methods cannot yet efficiently leverage the available data and resources. For example, in scientific discovery, we are often faced with the problem of exploring very large, high-dimensional spaces, where querying a high fidelity, black-box objective function is very expensive. Progress in machine learning methods that can efficiently tackle such problems would help accelerate currently crucial areas such as drug and materials discovery. In this paper, we propose the use of GFlowNets for multi-fidelity active learning, where multiple approximations of the black-box function are available at lower fidelity and cost. GFlowNets are recently proposed methods for amortised probabilistic inference that have proven efficient for exploring large, high-dimensional spaces and can hence be practical in the multi-fidelity setting too. Here, we describe our algorithm for multi-fidelity active learning with GFlowNets and evaluate its performance in both well-studied synthetic tasks and practically relevant applications of molecular discovery. Our results show that multi-fidelity active learning with GFlowNets can efficiently leverage the availability of multiple oracles with different costs and fidelities to accelerate scientific discovery and engineering design.

2023-10-27

NeurIPS.cc/2023/Workshop/ReALML (published)

Towards equilibrium molecular conformation generation with GFlowNets

Alexandra Volokhova

Michał Koziarski

Alex Hernandez-Garcia

Cheng-Hao Liu

Santiago Miret

Pablo Lemos

Luca Thiede

Zichao Yan

Alan Aspuru-Guzik

Sampling diverse, thermodynamically feasible molecular conformations plays a crucial role in predicting properties of a molecule. In this paper, we propose to use GFlowNet for sampling conformations of small molecules from the Boltzmann distribution, as determined by the molecule's energy. The proposed approach can be used in combination with energy estimation methods of different fidelity and discovers a diverse set of low-energy conformations for highly flexible drug-like molecules. We demonstrate that GFlowNet can reproduce molecular potential energy surfaces by sampling proportionally to the Boltzmann distribution.

2023-10-27

NeurIPS.cc/2023/Workshop/AI4Mat (poster)
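
To make "sampling proportionally to the Boltzmann distribution" concrete, the sketch below converts a handful of conformer energies into Boltzmann probabilities and draws samples from them. It is only a discrete toy illustration: the three energies, the temperature, and the helper name `boltzmann_probabilities` are hypothetical, and no GFlowNet is involved.

```python
import numpy as np

K_B = 0.0019872041  # Boltzmann (gas) constant in kcal/(mol*K)

def boltzmann_probabilities(energies_kcal_per_mol, temperature_kelvin=298.15):
    """Return p_i proportional to exp(-E_i / (k_B * T)) for a discrete set of states."""
    energies = np.asarray(energies_kcal_per_mol, dtype=float)
    beta = 1.0 / (K_B * temperature_kelvin)
    # Subtract the minimum energy so the exponentials cannot overflow.
    logits = -beta * (energies - energies.min())
    weights = np.exp(logits)
    return weights / weights.sum()

# Hypothetical relative energies of three conformers, in kcal/mol.
conformer_energies = [0.0, 0.5, 2.0]
probs = boltzmann_probabilities(conformer_energies)

# Draw conformer indices in proportion to their Boltzmann weights.
rng = np.random.default_rng(0)
samples = rng.choice(len(conformer_energies), size=10_000, p=probs)
frequencies = np.bincount(samples, minlength=len(conformer_energies)) / samples.size
```

A sampler that matches the Boltzmann distribution should visit each conformer with a frequency close to `probs`, with lower-energy conformers appearing more often.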

Diffusion Generative Flow Samplers: Improving learning signals through partial trajectory optimization

Dinghuai Zhang

Ricky T. Q. Chen

Cheng-Hao Liu

Aaron Courville

We tackle the problem of sampling from intractable high-dimensional density functions, a fundamental task that often appears in machine learning and statistics. We extend recent sampling-based approaches that leverage controlled stochastic processes to model approximate samples from these target densities. The main drawback of these approaches is that the training objective requires full trajectories to compute, resulting in sluggish credit assignment due to the use of entire trajectories and a learning signal present only at the terminal time. In this work, we present Diffusion Generative Flow Samplers (DGFS), a sampling-based framework where the learning process can be tractably broken down into short partial trajectory segments, via parameterizing an additional "flow function". Our method takes inspiration from the theory developed for generative flow networks (GFlowNets), allowing us to make use of intermediate learning signals. Through various challenging experiments, we demonstrate that DGFS achieves more accurate estimates of the normalization constant than closely-related prior methods.

2023-10-04

ArXiv (preprint)