Emmanuel Bengio

Improved Off-policy Reinforcement Learning in Biological Sequence Design

Jinkyoo Park

Designing biological sequences with desired properties is challenging due to vast search spaces and limited evaluation budgets. Although reinforcement learning methods use proxy models for rapid reward evaluation, insufficient training data can cause proxy misspecification on out-of-distribution inputs. To address this, we propose a novel off-policy search,

2025-10-06

Proceedings of the 42nd International Conference on Machine Learning (published)

proceedings.mlr.press

Improved Off-policy Reinforcement Learning in Biological Sequence Design

Alex Hernandez-Garcia

Jinkyoo Park

Designing biological sequences with desired properties is a significant challenge due to the combinatorially vast search space and the high cost of evaluating each candidate sequence. To address these challenges, reinforcement learning (RL) methods, such as GFlowNets, utilize proxy models for rapid reward evaluation and annotated data for policy training. Although these approaches have shown promise in generating diverse and novel sequences, the limited training data relative to the vast search space often leads to the misspecification of proxy for out-of-distribution inputs. We introduce

2025-10-06

Proceedings of the 42nd International Conference on Machine Learning (published)

doi.org

arxiv.org

Pretraining Generative Flow Networks with Inexpensive Rewards for Molecular Graph Generation

Mohit Pandey

Gopeshh Subbaraj

Artem Cherkasov

Martin Ester

Emmanuel Bengio

2025-10-06

Proceedings of the 42nd International Conference on Machine Learning (published)

doi.org

arxiv.org

Pretraining Generative Flow Networks with Inexpensive Rewards for Molecular Graph Generation

Mohit Pandey

Gopeshh Subbaraj

Artem Cherkasov

Martin Ester

Emmanuel Bengio

Generative Flow Networks (GFlowNets) have recently emerged as a suitable framework for generating diverse and high-quality molecular structures by learning from rewards treated as unnormalized distributions. Previous works in this framework often restrict exploration by using predefined molecular fragments as building blocks, limiting the chemical space that can be accessed. In this work, we introduce Atomic GFlowNets (A-GFNs), a foundational generative model leveraging individual atoms as building blocks to explore drug-like chemical space more comprehensively. We propose an unsupervised pre-training approach using drug-like molecule datasets, which teaches A-GFNs about inexpensive yet informative molecular descriptors such as drug-likeliness, topological polar surface area, and synthetic accessibility scores. These properties serve as proxy rewards, guiding A-GFNs towards regions of chemical space that exhibit desirable pharmacological properties. We further implement a goal-conditioned finetuning process, which adapts A-GFNs to optimize for specific target properties. In this work, we pretrain A-GFN on a subset of ZINC dataset, and by employing robust evaluation metrics we show the effectiveness of our approach when compared to other relevant baseline methods for a wide range of drug design tasks. The code is accessible at https://github.com/diamondspark/AGFN.

2025-10-06

Proceedings of the 42nd International Conference on Machine Learning (published)

proceedings.mlr.press

Random Policy Evaluation Uncovers Policies of Generative Flow Networks

Haoran He

Emmanuel Bengio

Qingpeng Cai

Ling Pan

The Generative Flow Network (GFlowNet) is a probabilistic framework in which an agent learns a stochastic policy and flow functions to sample objects with probability proportional to an unnormalized reward function. GFlowNets share a strong connection with reinforcement learning (RL) that typically aims to maximize reward. A number of recent works explored connections between GFlowNets and maximum entropy (MaxEnt) RL, which incorporates entropy regularization into the standard RL objective. However, the relationship between GFlowNets and standard RL remains largely unexplored, despite the inherent similarities in their sequential decision-making nature. While GFlowNets can discover diverse solutions through specialized flow-matching objectives, connecting them to standard RL can simplify their implementation through well-established RL principles and also improve RL’s capabilities in diverse solution discovery (a critical requirement in many real-world applications), and bridging this gap can further unlock the potential of both fields. In this paper, we bridge this gap by revealing a fundamental connection between GFlowNets and one of the most basic components of RL – policy evaluation. Surprisingly, we find that the value function obtained from evaluating a uniform policy is closely associated with the flow functions in GFlowNets. Building upon these insights, we introduce a rectified random policy evaluation (RPE) algorithm, which achieves the same reward-matching effect as GFlowNets based on simply evaluating a fixed random policy, offering a new perspective. Empirical results across extensive benchmarks demonstrate that RPE achieves competitive results compared to previous approaches, shedding light on the previously overlooked connection between (non-MaxEnt) RL and GFlowNets.

2025-10-06

Proceedings of the 42nd International Conference on Machine Learning (published)

proceedings.mlr.press

Random Policy Evaluation Uncovers Policies of Generative Flow Networks

Haoran He

Emmanuel Bengio

Qingpeng Cai

Ling Pan

2025-10-06

Proceedings of the 42nd International Conference on Machine Learning (published)

proceedings.mlr.press

arxiv.org

Torsional-GFN: a conditional conformation generator for small molecules

Alexandra Volokhova

Lena Nehale Ezzine

Piotr Gaiński

Alex Hernandez-Garcia

Generating stable molecular conformations is crucial in several drug discovery applications, such as estimating the binding affinity of a molecule to a target. Recently, generative machine learning methods have emerged as a promising, more efficient method than molecular dynamics for sampling of conformations from the Boltzmann distribution. In this paper, we introduce Torsional-GFN, a conditional GFlowNet specifically designed to sample conformations of molecules proportionally to their Boltzmann distribution, using only a reward function as training signal. Conditioned on a molecular graph and its local structure (bond lengths and angles), Torsional-GFN samples rotations of its torsion angles. Our results demonstrate that Torsional-GFN is able to sample conformations approximately proportional to the Boltzmann distribution for multiple molecules with a single model, and allows for zero-shot generalization to unseen bond lengths and angles coming from the MD simulations for such molecules. Our work presents a promising avenue for scaling the proposed approach to larger molecular systems, achieving zero-shot generalization to unseen molecules, and including the generation of the local structure into the GFlowNet model.

2025-07-15

arXiv (preprint)

arxiv.org

Torsional-GFN: a conditional conformation generator for small molecules

Lena Nehale Ezzine

Alex Hernandez-Garcia

2025-06-11

ICML.cc/2025/Workshop/GenBio (poster)

doi.org

openreview.net

Virtual Cells: Predict, Explain, Discover

Emmanuel Noutahi

Jason Hartford

Prudencio Tossou

Shawn Whitfield

Ali Denton

Cas Wognum

Kristina Ulicna

Jonathan Hsu

Michael Cuccarese

Emmanuel Bengio

Dominique Beaini

Christopher Gibson

Daniel Cohen

Berton Earnshaw

2025-05-20

arXiv (preprint)

arxiv.org

Virtual Cells: Predict, Explain, Discover

Emmanuel Noutahi

Jason Hartford

Prudencio Tossou

Shawn Whitfield

Ali Denton

Cas Wognum

Kristina Ulicna

Michael Craig

Jonathan Hsu

Michael Cuccarese

Emmanuel Bengio

Dominique Beaini

Christopher Gibson

Daniel Cohen

Berton Earnshaw

2025-05-01

arXiv (published)

doi.org

arxiv.org

Solving Bayesian inverse problems with diffusion priors and off-policy RL

Moksh J. Jain

Laurence Perreault-Levasseur

Yoshua Bengio

Glen Berseth

Nikolay Malkin

This paper presents a practical application of Relative Trajectory Balance (RTB), a recently introduced off-policy reinforcement learning (RL) objective that can asymptotically solve Bayesian inverse problems optimally. We extend the original work by using RTB to train conditional diffusion model posteriors from pretrained unconditional priors for challenging linear and non-linear inverse problems in vision and science. We use the objective alongside techniques such as off-policy backtracking exploration to improve training. Importantly, our results show that existing training-free diffusion posterior methods struggle to perform effective posterior inference in latent space due to inherent biases.

2025-03-06

ICLR.cc/2025/Workshop/DeLTa (poster)

doi.org

openreview.net

Action abstractions for amortized sampling

Oussama Boussif

Lena Nehale Ezzine

Joseph D Viviano

Michał Koziarski

Moksh J. Jain

Nikolay Malkin

Emmanuel Bengio