Emmanuel Bengio

Improved Off-policy Reinforcement Learning in Biological Sequence Design

Minsu Kim

Alex Hern'andez-Garc'ia

Jinkyoo Park

Designing biological sequences with desired properties is a significant challenge due to the combinatorially vast search space and the high … (see more)cost of evaluating each candidate sequence. To address these challenges, reinforcement learning (RL) methods, such as GFlowNets, utilize proxy models for rapid reward evaluation and annotated data for policy training. Although these approaches have shown promise in generating diverse and novel sequences, the limited training data relative to the vast search space often leads to the misspecification of proxy for out-of-distribution inputs. We introduce

2024-10-06

ArXiv (preprint)

Improved Off-policy Reinforcement Learning in Biological Sequence Design

Minsu Kim

Alex Hernandez-Garcia

Jinkyoo Park

Designing biological sequences with desired properties is a significant challenge due to the combinatorially vast search space and the high … (see more)cost of evaluating each candidate sequence. To address these challenges, reinforcement learning (RL) methods, such as GFlowNets, utilize proxy models for rapid reward evaluation and annotated data for policy training. Although these approaches have shown promise in generating diverse and novel sequences, the limited training data relative to the vast search space often leads to the misspecification of proxy for out-of-distribution inputs. We introduce

2024-10-06

ArXiv (preprint)

Amortizing intractable inference in diffusion models for vision, language, and control

Moksh J. Jain

Minsu Kim

Diffusion models have emerged as effective distribution estimators in vision, language, and reinforcement learning, but their use as priors … (see more)in downstream tasks poses an intractable posterior inference problem. This paper studies amortized sampling of the posterior over data,

2024-09-25

NeurIPS.cc/2024/Conference (poster)

openreview.net

QGFN: Controllable Greediness with Action Values

Elaine Lau

Stephen Zhewen Lu

Ling Pan

Doina Precup

Generative Flow Networks (GFlowNets; GFNs) are a family of energy-based generative methods for combinatorial objects, capable of generating … (see more)diverse and high-utility samples. However, consistently biasing GFNs towards producing high-utility samples is non-trivial. In this work, we leverage connections between GFNs and reinforcement learning (RL) and propose to combine the GFN policy with an action-value estimate,

2024-09-25

NeurIPS.cc/2024/Conference (poster)

openreview.net

GFlowNet Pretraining with Inexpensive Rewards

Mohit Pandey

Gopeshh Subbaraj

Generative Flow Networks (GFlowNets), a class of generative models have recently emerged as a suitable framework for generating diverse and … (see more)high-quality molecular structures by learning from unnormalized reward distributions. Previous works in this direction often restrict exploration by using predefined molecular fragments as building blocks, limiting the chemical space that can be accessed. In this work, we introduce Atomic GFlowNets (A-GFNs), a foundational generative model leveraging individual atoms as building blocks to explore drug-like chemical space more comprehensively. We propose an unsupervised pre-training approach using offline drug-like molecule datasets, which conditions A-GFNs on inexpensive yet informative molecular descriptors such as drug-likeliness, topological polar surface area, and synthetic accessibility scores. These properties serve as proxy rewards, guiding A-GFNs towards regions of chemical space that exhibit desirable pharmacological properties. We further our method by implementing a goal-conditioned fine-tuning process, which adapts A-GFNs to optimize for specific target properties. In this work, we pretrain A-GFN on the ZINC15 offline dataset and employ robust evaluation metrics to show the effectiveness of our approach when compared to other relevant baseline methods in drug design.

2024-09-15

ArXiv (preprint)

GFlowNet Pretraining with Inexpensive Rewards

Mohit Pandey

Gopeshh Subbaraj

Generative Flow Networks (GFlowNets), a class of generative models have recently emerged as a suitable framework for generating diverse and … (see more)high-quality molecular structures by learning from unnormalized reward distributions. Previous works in this direction often restrict exploration by using predefined molecular fragments as building blocks, limiting the chemical space that can be accessed. In this work, we introduce Atomic GFlowNets (A-GFNs), a foundational generative model leveraging individual atoms as building blocks to explore drug-like chemical space more comprehensively. We propose an unsupervised pre-training approach using offline drug-like molecule datasets, which conditions A-GFNs on inexpensive yet informative molecular descriptors such as drug-likeliness, topological polar surface area, and synthetic accessibility scores. These properties serve as proxy rewards, guiding A-GFNs towards regions of chemical space that exhibit desirable pharmacological properties. We further our method by implementing a goal-conditioned fine-tuning process, which adapts A-GFNs to optimize for specific target properties. In this work, we pretrain A-GFN on the ZINC15 offline dataset and employ robust evaluation metrics to show the effectiveness of our approach when compared to other relevant baseline methods in drug design.

2024-09-15

ArXiv (preprint)

QGFN: Controllable Greediness with Action Values

Elaine Lau

Stephen Zhewen Lu

Ling Pan

Doina Precup

Generative Flow Networks (GFlowNets; GFNs) are a family of reward/energy-based generative methods for combinatorial objects, capable of gene… (see more)rating diverse and high-utility samples. However, biasing GFNs towards producing high-utility samples is non-trivial. In this work, we leverage connections between GFNs and reinforcement learning (RL) and propose to combine the GFN policy with an action-value estimate,

2024-06-17

ICML.cc/2024/Workshop/SPIGM (poster)

openreview.net

Random Policy Evaluation Uncovers Policies of Generative Flow Networks

Haoran He

Qingpeng Cai 0001

Ling Pan

The Generative Flow Network (GFlowNet) is a probabilistic framework in which an agent learns a stochastic policy and flow functions to sampl… (see more)e objects with probability proportional to an unnormalized reward function. GFlowNets share a strong connection with reinforcement learning (RL) that typically aims to maximize reward. A number of recent works explored connections between GFlowNets and maximum entropy (MaxEnt) RL, which incorporates entropy regularization into the standard RL objective. However, the relationship between GFlowNets and standard RL remains largely unexplored, despite the inherent similarities in their sequential decision-making nature. While GFlowNets can discover diverse solutions through specialized flow-matching objectives, connecting them to standard RL can simplify their implementation through well-established RL principles and also improve RL's capabilities in diverse solution discovery (a critical requirement in many real-world applications), and bridging this gap can further unlock the potential of both fields. In this paper, we bridge this gap by revealing a fundamental connection between GFlowNets and one of the most basic components of RL -- policy evaluation. Surprisingly, we find that the value function obtained from evaluating a uniform policy is closely associated with the flow functions in GFlowNets. Building upon these insights, we introduce a rectified random policy evaluation (RPE) algorithm, which achieves the same reward-matching effect as GFlowNets based on simply evaluating a fixed random policy, offering a new perspective. Empirical results across extensive benchmarks demonstrate that RPE achieves competitive results compared to previous approaches, shedding light on the previously overlooked connection between (non-MaxEnt) RL and GFlowNets.

2024-06-04

ArXiv (preprint)

Random Policy Evaluation Uncovers Policies of Generative Flow Networks

Haoran He

Qingpeng Cai 0001

Ling Pan

2024-06-04

ArXiv (preprint)

Amortizing intractable inference in diffusion models for vision, language, and control

Moksh J. Jain

Minsu Kim

Diffusion models have emerged as effective distribution estimators in vision, language, and reinforcement learning, but their use as priors … (see more)in downstream tasks poses an intractable posterior inference problem. This paper studies amortized sampling of the posterior over data,

2024-05-31

ArXiv (preprint)

Amortizing intractable inference in diffusion models for vision, language, and control

Moksh J. Jain

Minsu Kim

Diffusion models have emerged as effective distribution estimators in vision, language, and reinforcement learning, but their use as priors … (see more)in downstream tasks poses an intractable posterior inference problem. This paper studies amortized sampling of the posterior over data,

2024-05-31

ArXiv (preprint)