Adversarial Training with Synthesized Data: A Path to Robust and Generalizable Neural Networks
Reza Bayat
Adversarial Training (AT) is a well-known framework designed to mitigate adversarial vulnerabilities in neural networks. Recent research indicates that incorporating adversarial examples (AEs) in training can enhance models' generalization capabilities. To understand the impact of AEs on learning dynamics, we study AT through the lens of sample-difficulty methodologies. Our findings show that AT leads to more stable learning dynamics than Natural Training (NT), resulting in gradual performance improvements and less overconfident predictions. This suggests that AT steers training away from learning easy, perturbable spurious features and toward more resilient and generalizable ones. However, robust overfitting creates a trade-off between adversarial robustness and generalization gains, limiting practical deployment. To address this, we propose using synthesized data to bridge the gap. Our results demonstrate that AT benefits significantly from synthesized data, whereas NT does not, enhancing generalization without compromising robustness and offering new avenues for developing robust and generalizable models.
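As a rough illustration of the adversarial training loop the abstract refers to, the sketch below trains a PyTorch classifier on PGD-perturbed inputs; the attack budget, loss, and data loader are illustrative assumptions rather than the paper's actual setup, and synthesized data would simply be mixed into the training loader alongside the real samples.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Craft L-infinity PGD adversarial examples (illustrative budget)."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Ascend the loss, then project back into the eps-ball and valid pixel range.
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

def adversarial_training_epoch(model, loader, optimizer, device="cuda"):
    """One epoch of AT: the model is updated on adversarial, not clean, inputs."""
    model.train()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x_adv = pgd_attack(model, x, y)
        optimizer.zero_grad()
        F.cross_entropy(model(x_adv), y).backward()
        optimizer.step()
```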
Decomposed evaluations of geographic disparities in text-to-image models
Abhishek Sureddy
Dishant Padalia
Nandhinee Periyakaruppan
Oindrila Saha
Adina Williams
Megan Richards
Polina Kirichenko
Melissa Hall
Economic evaluation of the effect of needle and syringe programs on skin, soft tissue, and vascular infections in people who inject drugs: a microsimulation modelling approach
Jihoon Lim
W Alton Russell
Mariam El-Sheikh
Dimitra Panagiotoglou
Exploring Scaling Trends in LLM Robustness
Nikolaus H. R. Howe
Michał Zając
Ian R. McKenzie
Oskar John Hollinsworth
Tom Tseng
Aaron David Tucker
Adam Gleave
Language model capabilities predictably improve from scaling a model's size and training data. Motivated by this, increasingly large language models have been trained, yielding an array of impressive capabilities. Yet these models are vulnerable to adversarial prompts, such as "jailbreaks" that hijack models to perform undesired behaviors, posing a significant risk of misuse. Prior work indicates that computer vision models become more robust with model and data scaling, raising the question: does language model robustness also improve with scale? We study this question empirically, finding that larger models respond substantially better to adversarial training, but there is little to no benefit from model scale in the absence of explicit defenses.
Game On, Hate Off: A Study of Toxicity in Online Multiplayer Environments
Zachary Yang
Nicolas Grenon-Godbout
In-Context Learning, Can It Break Safety?
Sophie Xhonneux
David Dobre
Michael Noukhovitch
Predicting the Population Risk of Suicide Using Routinely Collected Health Administrative Data in Quebec, Canada: Model-Based Synthetic Estimation Study
JianLi Wang
Fatemeh Gholi Zadeh Kharrat
Geneviève Gariépy
Jean-François Pelletier
Victoria Massamba
Pascale Lévesque
Mada Mohammed
Alain Lesage
Background: Suicide is a significant public health issue. Many risk prediction tools have been developed to estimate an individual's risk of suicide. Risk prediction models can go beyond individual risk assessment; one important application is population health planning. Suicide results from the interaction among risk and protective factors at the individual, health care system, and community levels. Thus, policy and decision makers can play an important role in suicide prevention. However, few prediction models for the population risk of suicide have been developed. Objective: This study aims to develop and validate prediction models for the population risk of suicide using health administrative data, considering individual-, health system–, and community-level predictors. Methods: We used a case-control study design to develop sex-specific risk prediction models for suicide, using health administrative data from Quebec, Canada. The training data included all suicide cases (n=8899) that occurred from January 1, 2002, to December 31, 2010. The control group was a 1% random sample of living individuals in each year between January 1, 2002, and December 31, 2010 (n=645,590). Logistic regression was used to develop the prediction models based on individual-, health care system–, and community-level predictors. The developed model was then converted into synthetic estimation models by transforming the individual-level predictors into community-level predictors. The synthetic estimation models were applied directly to the validation data from January 1, 2011, to December 31, 2019. We assessed the performance of the synthetic estimation models with four indicators: the agreement between predicted and observed proportions of suicide, mean absolute error, root mean square error, and the proportion of correctly identified high-risk regions. Results: The sex-specific models based on individual data had good discrimination (male model: C=0.79; female model: C=0.85) and calibration (Brier score of 0.01 for the male model and 0.005 for the female model). With the regression-based synthetic models applied to the validation data, the absolute differences between the synthetic risk estimates and the observed suicide risk ranged from 0% to 0.001%. The root mean square errors were under 0.2. The synthetic estimation model for males correctly predicted 4 of 5 high-risk regions in 8 years, and the model for females correctly predicted 4 of 5 high-risk regions in 5 years. Conclusions: Using linked health administrative databases, this study demonstrates the feasibility and validity of developing prediction models for the population risk of suicide that incorporate individual-, health system–, and community-level variables. Synthetic estimation models built on routinely collected health administrative data can accurately predict the population risk of suicide. This effort can be enhanced by timely access to other critical information at the population level.
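As a minimal sketch of the discrimination and calibration checks the abstract reports (C statistic and Brier score), the following scikit-learn snippet fits a logistic regression on placeholder case-control data; the features, sample sizes, and random values stand in for the study's Quebec administrative records, which are not reproduced here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, brier_score_loss

# Placeholder case-control data: rows are individuals, columns are
# illustrative individual-, health-system-, and community-level predictors.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = rng.integers(0, 2, size=1000)          # 1 = suicide case, 0 = control

model = LogisticRegression(max_iter=1000).fit(X, y)
p = model.predict_proba(X)[:, 1]

c_statistic = roc_auc_score(y, p)          # discrimination (C statistic)
brier = brier_score_loss(y, p)             # calibration (Brier score)
print(f"C={c_statistic:.2f}, Brier={brier:.3f}")
```

The synthetic estimation step described in the abstract would then aggregate predicted probabilities to region-level rates and compare them against observed proportions (mean absolute error, root mean square error, high-risk region identification).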
A Randomized Controlled Simulation Trial of a Neonatal Resuscitation Digital Game Simulator for Labour and Delivery Room Staff
Christiane Bilodeau
Georg M. Schmölzer
Robust Knowledge Unlearning via Mechanistic Localizations
Phillip Huang Guo
Aaquib Syed
Abhay Sheshadri
Aidan Ewart
Towards Adversarially Robust Vision-Language Models: Insights from Design Choices and Prompt Formatting Techniques
Rishika Bhagwatkar
Shravan Nayak
Reza Bayat
Alexis Roger
Daniel Z Kaplan
Vision-Language Models (VLMs) have witnessed a surge in both research and real-world applications. However, as they become increasingly prevalent, ensuring their robustness against adversarial attacks is paramount. This work systematically investigates the impact of model design choices on the adversarial robustness of VLMs against image-based attacks. Additionally, we introduce novel, cost-effective approaches to enhance robustness through prompt formatting. By rephrasing questions and suggesting potential adversarial perturbations, we demonstrate substantial improvements in model robustness against strong image-based attacks such as Auto-PGD. Our findings provide important guidelines for developing more robust VLMs, particularly for deployment in safety-critical environments.
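A minimal sketch of the prompt-formatting idea described above, assuming a generic vision-language model exposing a generate(image, prompt) interface; the wrapper function, its wording, and the vlm object are hypothetical, not the paper's templates or API.

```python
def robust_prompt(question: str) -> str:
    """Rephrase the question and warn the model that the image may be
    adversarially perturbed (illustrative wording, not the paper's templates)."""
    return (
        "The following image may contain small adversarial perturbations. "
        "Answer based on the overall visual content. "
        f"Question (rephrased): {question}"
    )

# Hypothetical usage with a generic vision-language model wrapper:
# answer = vlm.generate(image=img, prompt=robust_prompt("What animal is shown?"))
```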
Voices Unheard: NLP Resources and Models for Yorùbá Regional Dialects
Orevaoghene Ahia
Aremu Anuoluwapo
Diana Abagyan
Hila Gonen
Daud Abolade
Noah A. Smith
Yulia Tsvetkov
Yorùbá, an African language with roughly 47 million speakers, encompasses a continuum of several dialects. Recent efforts to develop NLP technologies for African languages have focused on their standard dialects, resulting in disparities for dialects and varieties for which there are little to no resources or tools. We take steps towards bridging this gap by introducing YORULECT, a new high-quality parallel text and speech corpus spanning three domains and four regional Yorùbá dialects. To develop this corpus, we engaged native speakers, traveling to communities where these dialects are spoken, to collect text and speech data. Using our newly created corpus, we conducted extensive experiments on (text) machine translation, automatic speech recognition, and speech-to-text translation. Our results reveal substantial performance disparities between standard Yorùbá and the other dialects across all tasks. However, we also show that with dialect-adaptive fine-tuning, we are able to narrow this gap. We believe our dataset and experimental analysis will contribute greatly to developing NLP tools for Yorùbá and its dialects, and potentially for other African languages, by improving our understanding of existing challenges and offering a high-quality dataset for further development. We will release the YORULECT dataset and models publicly under an open license.
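As a loose sketch of the dialect-adaptive fine-tuning mentioned above, the snippet below fine-tunes a generic Hugging Face seq2seq checkpoint on dialect-to-English pairs; the checkpoint name, data file, and field names (dialect_text, english_text) are placeholders and do not reflect the released YORULECT format.

```python
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

# Placeholder checkpoint and data file; the released corpus and baselines may differ.
checkpoint = "google/mt5-small"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

dialect_data = load_dataset("json", data_files="yorulect_dialect_pairs.json")["train"]

def preprocess(batch):
    # Tokenize dialect sentences as inputs and English sentences as targets.
    model_inputs = tokenizer(batch["dialect_text"], truncation=True, max_length=256)
    labels = tokenizer(text_target=batch["english_text"], truncation=True, max_length=256)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = dialect_data.map(preprocess, batched=True,
                             remove_columns=dialect_data.column_names)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(output_dir="yo-dialect-ft", num_train_epochs=3,
                                  per_device_train_batch_size=8, learning_rate=2e-5),
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```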