Siddarth Venkatraman

Outsourced diffusion sampling: Efficient posterior inference in latent spaces of generative models

Any well-behaved generative model over a variable …

2025-10-06

Proceedings of the 42nd International Conference on Machine Learning (publié)

doi.org

openreview.net

Outsourced Diffusion Sampling: Efficient Posterior Inference in Latent Spaces of Generative Models

Any well-behaved generative model over a variable …

2025-10-06

Proceedings of the 42nd International Conference on Machine Learning (publié)

proceedings.mlr.press

Recursive Self-Aggregation Unlocks Deep Thinking in Large Language Models

Johan Samir Obando Ceron

Yoshua Bengio

Brian R. Bartoldson

Bhavya Kailkhura

Guillaume Lajoie

Glen Berseth

Nikolay Malkin

Moksh J. Jain

Test-time scaling methods improve the capabilities of large language models (LLMs) by increasing the amount of compute used during inference… (voir plus) to make a prediction. Inference-time compute can be scaled in parallel by choosing among multiple independent solutions or sequentially through self-refinement. We propose Recursive Self-Aggregation (RSA), a test-time scaling method inspired by evolutionary methods that combines the benefits of both parallel and sequential scaling. Each step of RSA refines a population of candidate reasoning chains through aggregation of subsets to yield a population of improved solutions, which are then used as the candidate pool for the next iteration. RSA exploits the rich information embedded in the reasoning chains -- not just the final answers -- and enables bootstrapping from partially correct intermediate steps within different chains of thought. Empirically, RSA delivers substantial performance gains with increasing compute budgets across diverse tasks, model families and sizes. Notably, RSA enables Qwen3-4B-Instruct-2507 to achieve competitive performance with larger reasoning models, including DeepSeek-R1 and o3-mini (high), while outperforming purely parallel and sequential scaling strategies across AIME-25, HMMT-25, Reasoning Gym, LiveCodeBench-v6, and SuperGPQA. We further demonstrate that training the model to combine solutions via a novel aggregation-aware reinforcement learning approach yields significant performance gains. Code available at https://github.com/HyperPotatoNeo/RSA.

2025-09-30

ArXiv (prépublication)

arxiv.org

Recursive Self-Aggregation Unlocks Deep Thinking in Large Language Models

Johan Samir Obando Ceron

Yoshua Bengio

Brian R. Bartoldson

Bhavya Kailkhura

Guillaume Lajoie

Glen Berseth

Nikolay Malkin

Moksh J. Jain

Test-time scaling methods improve the capabilities of large language models (LLMs) by increasing the amount of compute used during inference… (voir plus) to make a prediction. Inference-time compute can be scaled in parallel by choosing among multiple independent solutions or sequentially through self-refinement. We propose Recursive Self-Aggregation (RSA), a test-time scaling method inspired by evolutionary methods that combines the benefits of both parallel and sequential scaling. Each step of RSA refines a population of candidate reasoning chains through aggregation of subsets to yield a population of improved solutions, which are then used as the candidate pool for the next iteration. RSA exploits the rich information embedded in the reasoning chains -- not just the final answers -- and enables bootstrapping from partially correct intermediate steps within different chains of thought. Empirically, RSA delivers substantial performance gains with increasing compute budgets across diverse tasks, model families and sizes. Notably, RSA enables Qwen3-4B-Instruct-2507 to achieve competitive performance with larger reasoning models, including DeepSeek-R1 and o3-mini (high), while outperforming purely parallel and sequential scaling strategies across AIME-25, HMMT-25, Reasoning Gym, LiveCodeBench-v6, and SuperGPQA. We further demonstrate that training the model to combine solutions via a novel aggregation-aware reinforcement learning approach yields significant performance gains. Code available at https://github.com/HyperPotatoNeo/RSA.

2025-09-30

ArXiv (prépublication)

arxiv.org

Recursive Self-Aggregation Unlocks Deep Thinking in Large Language Models

Johan Samir Obando Ceron

Yoshua Bengio

Brian R. Bartoldson

Bhavya Kailkhura

Guillaume Lajoie

Glen Berseth

Nikolay Malkin

Moksh J. Jain

Test-time scaling methods improve the capabilities of large language models (LLMs) by increasing the amount of compute used during inference… (voir plus) to make a prediction. Inference-time compute can be scaled in parallel by choosing among multiple independent solutions or sequentially through self-refinement. We propose Recursive Self-Aggregation (RSA), a test-time scaling method inspired by evolutionary methods that combines the benefits of both parallel and sequential scaling. Each step of RSA refines a population of candidate reasoning chains through aggregation of subsets to yield a population of improved solutions, which are then used as the candidate pool for the next iteration. RSA exploits the rich information embedded in the reasoning chains -- not just the final answers -- and enables bootstrapping from partially correct intermediate steps within different chains of thought. Empirically, RSA delivers substantial performance gains with increasing compute budgets across diverse tasks, model families and sizes. Notably, RSA enables Qwen3-4B-Instruct-2507 to achieve competitive performance with larger reasoning models, including DeepSeek-R1 and o3-mini (high), while outperforming purely parallel and sequential scaling strategies across AIME-25, HMMT-25, Reasoning Gym, LiveCodeBench-v6, and SuperGPQA. We further demonstrate that training the model to combine solutions via a novel aggregation-aware reinforcement learning approach yields significant performance gains. Code available at https://github.com/HyperPotatoNeo/RSA.

2025-09-30

ArXiv (prépublication)

arxiv.org

Recursive Self-Aggregation Unlocks Deep Thinking in Large Language Models

Johan Samir Obando Ceron

Yoshua Bengio

Brian R. Bartoldson

Bhavya Kailkhura

Guillaume Lajoie

Glen Berseth

Nikolay Malkin

Moksh J. Jain

Test-time scaling methods improve the capabilities of large language models (LLMs) by increasing the amount of compute used during inference… (voir plus) to make a prediction. Inference-time compute can be scaled in parallel by choosing among multiple independent solutions or sequentially through self-refinement. We propose Recursive Self-Aggregation (RSA), a test-time scaling method inspired by evolutionary methods that combines the benefits of both parallel and sequential scaling. Each step of RSA refines a population of candidate reasoning chains through aggregation of subsets to yield a population of improved solutions, which are then used as the candidate pool for the next iteration. RSA exploits the rich information embedded in the reasoning chains -- not just the final answers -- and enables bootstrapping from partially correct intermediate steps within different chains of thought. Empirically, RSA delivers substantial performance gains with increasing compute budgets across diverse tasks, model families and sizes. Notably, RSA enables Qwen3-4B-Instruct-2507 to achieve competitive performance with larger reasoning models, including DeepSeek-R1 and o3-mini (high), while outperforming purely parallel and sequential scaling strategies across AIME-25, HMMT-25, Reasoning Gym, LiveCodeBench-v6, and SuperGPQA. We further demonstrate that training the model to combine solutions via a novel aggregation-aware reinforcement learning approach yields significant performance gains. Code available at https://github.com/HyperPotatoNeo/RSA.

2025-09-30

ArXiv (prépublication)

doi.org

arxiv.org

Solving Bayesian inverse problems with diffusion priors and off-policy RL

Moksh J. Jain

Laurence Perreault-Levasseur

Yoshua Bengio

Glen Berseth

Nikolay Malkin

This paper presents a practical application of Relative Trajectory Balance (RTB), a recently introduced off-policy reinforcement learning (R… (voir plus)L) objective that can asymptotically solve Bayesian inverse problems optimally. We extend the original work by using RTB to train conditional diffusion model posteriors from pretrained unconditional priors for challenging linear and non-linear inverse problems in vision, and science. We use the objective alongside techniques such as off-policy backtracking exploration to improve training. Importantly, our results show that existing training-free diffusion posterior methods struggle to perform effective posterior inference in latent space due to inherent biases.

2025-03-06

ICLR.cc/2025/Workshop/DeLTa (poster)

doi.org

openreview.net

Outsourced diffusion sampling: Efficient posterior inference in latent spaces of generative models