Emmanuel Bengio

Improved Off-policy Reinforcement Learning in Biological Sequence Design

Jinkyoo Park

Designing biological sequences with desired properties is challenging due to vast search spaces and limited evaluation budgets. Although rei… (see more)nforcement learning methods use proxy models for rapid reward evaluation, insufficient training data can cause proxy misspecification on out-of-distribution inputs. To address this, we propose a novel off-policy search,

2025-10-05

Proceedings of the 42nd International Conference on Machine Learning (published)

doi.org

proceedings.mlr.press

Pretraining Generative Flow Networks with Inexpensive Rewards for Molecular Graph Generation

Mohit Pandey

Gopeshh Subbaraj

Artem Cherkasov

Martin Ester

Emmanuel Bengio

2025-10-05

Proceedings of the 42nd International Conference on Machine Learning (published)

doi.org

proceedings.mlr.press

Random Policy Evaluation Uncovers Policies of Generative Flow Networks

Haoran He

Emmanuel Bengio

Qingpeng Cai 0001

Ling Pan

2025-10-05

Proceedings of the 42nd International Conference on Machine Learning (published)

proceedings.mlr.press

Torsional-GFN: a conditional conformation generator for small molecules

Lena Nehale Ezzine

Alex Hernández-García

2025-06-10

ICML.cc/2025/Workshop/GenBio (poster)

doi.org

openreview.net

Virtual Cells: Predict, Explain, Discover

Emmanuel Noutahi

Jason Hartford

Prudencio Tossou

Shawn Whitfield

Ali Denton

Cas Wognum

Kristina Ulicna

Michael Craig

Jonathan Hsu

Michael Cuccarese

Emmanuel Bengio

Dominique Beaini

Christopher Gibson

Daniel Cohen

Berton Earnshaw

2025-04-30

arXiv (published)

doi.org

arxiv.org

Solving Bayesian Inverse Problems with Diffusion Priors and Off-Policy RL

Laurence Perreault-Levasseur

Yoshua Bengio

Glen Berseth

Nikolay Malkin

This paper presents a practical application of Relative Trajectory Balance (RTB), a recently introduced off-policy reinforcement learning (R… (see more)L) objective that can asymptotically solve Bayesian inverse problems optimally. We extend the original work by using RTB to train conditional diffusion model posteriors from pretrained unconditional priors for challenging linear and non-linear inverse problems in vision, and science. We use the objective alongside techniques such as off-policy backtracking exploration to improve training. Importantly, our results show that existing training-free diffusion posterior methods struggle to perform effective posterior inference in latent space due to inherent biases.

2025-03-05

ICLR.cc/2025/Workshop/DeLTa (poster)

doi.org

openreview.net

Action Abstractions for Amortized Sampling

Lena Nehale Ezzine

As trajectories sampled by policies used by reinforcement learning (RL) and generative flow networks (GFlowNets) grow longer, credit assignm… (see more)ent and exploration become more challenging, and the long planning horizon hinders mode discovery and generalization. The challenge is particularly pronounced in entropy-seeking RL methods, such as generative flow networks, where the agent must learn to sample from a structured distribution and discover multiple high-reward states, each of which take many steps to reach. To tackle this challenge, we propose an approach to incorporate the discovery of action abstractions, or high-level actions, into the policy optimization process. Our approach involves iteratively extracting action subsequences commonly used across many high-reward trajectories and `chunking' them into a single action that is added to the action space. In empirical evaluation on synthetic and real-world environments, our approach demonstrates improved sample efficiency performance in discovering diverse high-reward objects, especially on harder exploration problems. We also observe that the abstracted high-order actions are interpretable, capturing the latent structure of the reward landscape of the action space. This work provides a cognitively motivated approach to action abstraction in RL and is the first demonstration of hierarchical planning in amortized sequential sampling.

2025-01-21

International Conference on Learning Representations (poster)

doi.org

openreview.net

Adaptive Teachers for Amortized Samplers

Sungsoo Ahn

Jinkyoo Park

Nikolay Malkin

Yoshua Bengio

Amortized inference is the task of training a parametric model, such as a neural network, to approximate a distribution with a given unnorma… (see more)lized density where exact sampling is intractable. When sampling is implemented as a sequential decision-making process, reinforcement learning (RL) methods, such as generative flow networks, can be used to train the sampling policy. Off-policy RL training facilitates the discovery of diverse, high-reward candidates, but existing methods still face challenges in efficient exploration. We propose to use an adaptive training distribution (the \teacher) to guide the training of the primary amortized sampler (the \student). The \teacher, an auxiliary behavior model, is trained to sample high-loss regions of the \student and can generalize across unexplored modes, thereby enhancing mode coverage by providing an efficient training curriculum. We validate the effectiveness of this approach in a synthetic environment designed to present an exploration challenge, two diffusion-based sampling tasks, and four biochemical discovery tasks demonstrating its ability to improve sample efficiency and mode coverage. Source code is available at https://github.com/alstn12088/adaptive-teacher.

2025-01-21

ICLR.cc/2025/Conference (poster)

doi.org

openreview.net

Towards Improving Exploration Through Sibling Augmented GFlowNets

2025-01-21

ICLR.cc/2025/Conference (poster)

openreview.net

Investigating Generalization Behaviours of Generative Flow Networks

Lazar Atanackovic

Emmanuel Bengio

Generative Flow Networks (GFlowNets, GFNs) are a generative framework for learning unnormalized probability mass functions over discrete spa… (see more)ces. Since their inception, GFlowNets have proven to be useful for learning generative models in applications where the majority of the discrete space is unvisited during training. This has inspired some to hypothesize that GFlowNets, when paired with deep neural networks (DNNs), have favourable generalization properties. In this work, we empirically verify some of the hypothesized mechanisms of generalization of GFlowNets. In particular, we find that the functions that GFlowNets learn to approximate have an implicit underlying structure which facilitate generalization. We also find that GFlowNets are sensitive to being trained offline and off-policy; however, the reward implicitly learned by GFlowNets is robust to changes in the training distribution.

2024-12-31

Trans. Mach. Learn. Res. (published)

doi.org

arxiv.org

Efficient Biological Data Acquisition through Inference Set Design

Ihor Neporozhnii

Julien Roy

Emmanuel Bengio

Jason Hartford

In drug discovery, highly automated high-throughput laboratories are used to screen a large number of compounds in search of effective drugs… (see more). These experiments are expensive, so one might hope to reduce their cost by only experimenting on a subset of the compounds, and predicting the outcomes of the remaining experiments. In this work, we model this scenario as a sequential subset selection problem: we aim to select the smallest set of candidates in order to achieve some desired level of accuracy for the system as a whole. Our key observation is that, if there is heterogeneity in the difficulty of the prediction problem across the input space, selectively obtaining the labels for the hardest examples in the acquisition pool will leave only the relatively easy examples to remain in the inference set, leading to better overall system performance. We call this mechanism inference set design, and propose the use of a confidence-based active learning solution to prune out these challenging examples. Our algorithm includes an explicit stopping criterion that interrupts the acquisition loop when it is sufficiently confident that the system has reached the target performance. Our empirical studies on image and molecular datasets, as well as a real-world large-scale biological assay, show that active learning for inference set design leads to significant reduction in experimental cost while retaining high system performance.

2024-10-24

ArXiv (preprint)

doi.org

arxiv.org

Amortizing intractable inference in diffusion models for vision, language, and control

Diffusion models have emerged as effective distribution estimators in vision, language, and reinforcement learning, but their use as priors … (see more)in downstream tasks poses an intractable posterior inference problem. This paper studies amortized sampling of the posterior over data,

2024-09-24

Neural Information Processing Systems (poster)

doi.org

openreview.net

Mila Techaide 2026

Venture Scientist Bootcamp

AI Advantage: Productivity in Public Service

Emmanuel Bengio

Biography

Publications

Mila Techaide 2026

Venture Scientist Bootcamp

AI Advantage: Productivity in Public Service

Popular keywords:

Emmanuel Bengio

Biography

Publications