Marco Jiralerspong

Robust Reinforcement Learning for Discrete Compositional Generation via General Soft Operators

Bilun Sun

A major bottleneck in scientific discovery involves narrowing a large combinatorial set of objects, such as proteins or molecules, to a smal… (see more)l set of promising candidates. While this process largely relies on expert knowledge, recent methods leverage reinforcement learning (RL) to enhance this filtering. They achieve this by estimating proxy reward functions from available datasets and using regularization to generate more diverse candidates. These reward functions are inherently uncertain, raising a particularly salient challenge for scientific discovery. In this work, we show that existing methods, often framed as sampling proportional to a reward function, are inadequate and yield suboptimal candidates, especially in large search spaces. To remedy this issue, we take a robust RL approach and introduce a unified operator that seeks robustness to the uncertainty of the proxy reward function. This general operator targets peakier sampling distributions while encompassing known soft RL operators. It also leads us to a novel algorithm that identifies higher-quality, diverse candidates in both synthetic and real-world tasks. Ultimately, our work offers a new, flexible perspective on discrete compositional generation tasks. Code: https://github.com/marcojira/tgm.

2025-06-20

ArXiv (preprint)

Robust Reinforcement Learning for Discrete Compositional Generation via General Soft Operators

Bilun Sun

A major bottleneck in scientific discovery involves narrowing a large combinatorial set of objects, such as proteins or molecules, to a smal… (see more)l set of promising candidates. While this process largely relies on expert knowledge, recent methods leverage reinforcement learning (RL) to enhance this filtering. They achieve this by estimating proxy reward functions from available datasets and using regularization to generate more diverse candidates. These reward functions are inherently uncertain, raising a particularly salient challenge for scientific discovery. In this work, we show that existing methods, often framed as sampling proportional to a reward function, are inadequate and yield suboptimal candidates, especially in large search spaces. To remedy this issue, we take a robust RL approach and introduce a unified operator that seeks robustness to the uncertainty of the proxy reward function. This general operator targets peakier sampling distributions while encompassing known soft RL operators. It also leads us to a novel algorithm that identifies higher-quality, diverse candidates in both synthetic and real-world tasks. Ultimately, our work offers a new, flexible perspective on discrete compositional generation tasks. Code: https://github.com/marcojira/tgm.

2025-06-01

arXiv (published)

AI for Global Climate Cooperation: Modeling Global Climate Negotiations, Agreements, and Long-Term Cooperation in RICE-N

Tianyu Zhang

Andrew Robert Williams

Phillip Wozny

Kai-Hendrik Cohrs

Koen Ponse

Marco Jiralerspong

Soham Rajesh Phade

Sunil Srinivasa

Li Li

Yang Zhang

Prateek Gupta

Erman Acar

Irina Rish

Yoshua Bengio

Stephan Zheng

2025-05-01

ICML.cc/2025/Conference (poster)

General Causal Imputation via Synthetic Interventions

Given two sets of elements (such as cell types and drug compounds), researchers typically only have access to a limited subset of their inte… (see more)ractions. The task of causal imputation involves using this subset to predict unobserved interactions. Squires et al. (2022) have proposed two estimators for this task based on the synthetic interventions (SI) estimator: SI-A (for actions) and SI-C (for contexts). We extend their work and introduce a novel causal imputation estimator, generalized synthetic interventions (GSI). We prove the identifiability of this estimator for data generated from a more complex latent factor model. On synthetic and real data we show empirically that it recovers or outperforms their estimators.

2024-10-30

NeurIPS.cc/2024/Workshop/CRL (poster)

General Causal Imputation via Synthetic Interventions

Given two sets of elements (such as cell types and drug compounds), researchers typically only have access to a limited subset of their inte… (see more)ractions. The task of causal imputation involves using this subset to predict unobserved interactions. Squires et al. (2022) have proposed two estimators for this task based on the synthetic interventions (SI) estimator: SI-A (for actions) and SI-C (for contexts). We extend their work and introduce a novel causal imputation estimator, generalized synthetic interventions (GSI). We prove the identifiability of this estimator for data generated from a more complex latent factor model. On synthetic and real data we show empirically that it recovers or outperforms their estimators.

2024-10-28

ArXiv (preprint)

General Causal Imputation via Synthetic Interventions

2024-10-28

ArXiv (preprint)

Expected flow networks in stochastic environments and two-player zero-sum games

Bilun Sun

2024-01-16

ICLR.cc/2024/Conference (poster)

On the Stability of Iterative Retraining of Generative Models on their own Data

Deep generative models have made tremendous progress in modeling complex data, often exhibiting generation quality that surpasses a typical … (see more)human's ability to discern the authenticity of samples. Undeniably, a key driver of this success is enabled by the massive amounts of web-scale data consumed by these models. Due to these models' striking performance and ease of availability, the web will inevitably be increasingly populated with synthetic content. Such a fact directly implies that future iterations of generative models will be trained on both clean and artificially generated data from past models. In this paper, we develop a framework to rigorously study the impact of training generative models on mixed datasets---from classical training on real data to self-consuming generative models trained on purely synthetic data. We first prove the stability of iterative training under the condition that the initial generative models approximate the data distribution well enough and the proportion of clean training data (w.r.t. synthetic data) is large enough. We empirically validate our theory on both synthetic and natural images by iteratively training normalizing flows and state-of-the-art diffusion models on CIFAR10 and FFHQ.

2024-01-16

ICLR.cc/2024/Conference (spotlight)

AI4GCC - Track 3: Consumption and the Challenges of Multi-Agent RL

Marco Jiralerspong

Gauthier Gidel

2023-08-09

ArXiv (preprint)