Publications

AssembleFlow: Rigid Flow Matching with Inertial Frames for Molecular Assembly

Hongyu Guo

Yoshua Bengio

Shengchao Liu

2025-01-21

ICLR.cc/2025/Conference (poster)

openreview.net

Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models

Shengyi Huang

The dominant paradigm for RLHF is online and on-policy RL: synchronously generating from the large language model (LLM) policy, labelling with a reward model, and learning using feedback on the LLM's own outputs. While performant, this paradigm is computationally inefficient. Inspired by classical deep RL literature, we propose separating generation and learning in RLHF. This enables asynchronous generation of new samples while simultaneously training on old samples, leading to faster training and more compute-optimal scaling. However, asynchronous training relies on an underexplored regime, online but off-policy RLHF: learning on samples from previous iterations of our model, which give a worse training signal. We tackle the fundamental challenge in this regime: how much off-policyness can we tolerate for asynchronous training to speed up learning but maintain performance? Among several RLHF algorithms we test, online DPO is found to be most robust to off-policy data, and robustness increases with the scale of the policy model. We study further compute optimizations for asynchronous RLHF but find that they come at a performance cost, giving rise to a trade-off. We verify the scalability of asynchronous RLHF by training a general-purpose chatbot from LLaMA 3.1 8B on an instruction-following task ~40% faster than a synchronous run while matching final performance. Finally, we extend our results to math and reasoning to demonstrate that asynchronous RL can finetune Rho 1B on GSM8k ~70% faster while matching synchronous accuracy.
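The speed/off-policyness trade-off described in this abstract can be illustrated with a toy timing model. This is a minimal sketch, not the authors' system: it assumes one generator and one learner, fixed per-batch generation and training times, and no limit on how far the generator may run ahead. The function name `simulate` is hypothetical.

```python
from typing import List, Tuple


def simulate(num_batches: int, gen_time: float, train_time: float,
             asynchronous: bool) -> Tuple[float, List[int]]:
    """Toy timing model of RLHF with one generator and one learner.

    Returns the total wall-clock time and, for each batch, how many policy
    updates old the generating policy is when the learner trains on that
    batch (0 means on-policy).
    """
    gen_free = 0.0          # time the generator can start its next batch
    learner_free = 0.0      # time the learner can start its next update
    version_ready = [0.0]   # version_ready[v]: time policy version v became available
    lags = []
    for _ in range(num_batches):
        if asynchronous:
            # generate immediately with whatever policy is newest right now
            gen_start = gen_free
        else:
            # synchronous: wait until the learner has published its latest update
            gen_start = max(gen_free, learner_free)
        gen_version = max(v for v, ready in enumerate(version_ready) if ready <= gen_start)
        gen_end = gen_start + gen_time
        gen_free = gen_end
        # the learner needs the batch and must have finished its previous update
        train_start = max(gen_end, learner_free)
        train_end = train_start + train_time
        learner_free = train_end
        current_version = len(version_ready) - 1  # version being updated on this batch
        lags.append(current_version - gen_version)
        version_ready.append(train_end)
    return learner_free, lags
```

For 10 batches with equal generation and training times, this model gives 11 time units asynchronously versus 20 synchronously, while every batch after the first is one update off-policy. If generation is faster than training, the lag keeps growing here because nothing throttles the generator; a real system would bound that queue. The ~40% and ~70% speedups quoted above come from the paper's experiments, not from this model.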

2025-01-21

International Conference on Learning Representations (poster)

doi.org

openreview.net

Beyond FVD: An Enhanced Metric for Evaluating Video Generation Distribution Quality

Ge Ya Luo

Gian Mario Favero

Zhi Hao Luo

Alexia Jolicoeur-Martineau

Christopher Pal

The Fréchet Video Distance (FVD) is a widely adopted metric for evaluating video generation distribution quality. However, its effectiveness relies on critical assumptions. Our analysis reveals three significant limitations: (1) the non-Gaussianity of the Inflated 3D Convnet (I3D) feature space; (2) the insensitivity of I3D features to temporal distortions; (3) the impractical sample sizes required for reliable estimation. These findings undermine FVD's reliability and show that FVD falls short as a standalone metric for video generation evaluation. After extensive analysis of a wide range of metrics and backbone architectures, we propose JEDi, the JEPA Embedding Distance, based on features derived from a Joint Embedding Predictive Architecture, measured using Maximum Mean Discrepancy with a polynomial kernel. Our experiments on multiple open-source datasets show clear evidence that it is a superior alternative to the widely used FVD metric, requiring only 16% of the samples to reach its steady value, while increasing alignment with human evaluation by 34%, on average. Project page: https://oooolga.github.io/JEDi.github.io/.
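JEDi compares feature distributions with Maximum Mean Discrepancy (MMD) under a polynomial kernel. The sketch below shows that estimator on plain feature vectors, assuming the cubic kernel k(x, y) = (x·y / d + 1)^3 familiar from the Kernel Inception Distance and the standard unbiased MMD² estimator. The exact kernel parameters JEDi uses are not stated here, and the input vectors stand in for (hypothetical) JEPA embeddings.

```python
from typing import List, Sequence

Vector = Sequence[float]


def polynomial_kernel(x: Vector, y: Vector, degree: int = 3, coef0: float = 1.0) -> float:
    """k(x, y) = (<x, y> / d + coef0) ** degree, with d the feature dimension."""
    d = len(x)
    dot = sum(a * b for a, b in zip(x, y))
    return (dot / d + coef0) ** degree


def mmd2_unbiased(xs: List[Vector], ys: List[Vector]) -> float:
    """Unbiased estimate of squared MMD between two samples of feature vectors."""
    m, n = len(xs), len(ys)
    if m < 2 or n < 2:
        raise ValueError("need at least two samples per set")
    # within-set sums skip the diagonal (i == j), which removes the estimator's bias
    k_xx = sum(polynomial_kernel(xs[i], xs[j]) for i in range(m) for j in range(m) if i != j)
    k_yy = sum(polynomial_kernel(ys[i], ys[j]) for i in range(n) for j in range(n) if i != j)
    # cross term uses every pair
    k_xy = sum(polynomial_kernel(x, y) for x in xs for y in ys)
    return k_xx / (m * (m - 1)) + k_yy / (n * (n - 1)) - 2.0 * k_xy / (m * n)
```

Because the within-set terms exclude the diagonal, the estimate can dip below zero for small samples from the same distribution; larger values indicate more dissimilar feature distributions.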

2025-01-21

ICLR.cc/2025/Conference (poster)

openreview.net

BigDocs: An Open Dataset for Training Multi-modal Models on Document and Code Tasks

Juan Rodriguez

Xiangru Jian

Siba Smarak Panigrahi

Akshay Kalkunte

Amirhossein Abaskohi

Pierre-Andre Noel

Sanket Biswas …

Sara Shanian

Ying Zhang

Noah Bolger

Kurt MacDonald

Simon Fauvel

Sathwik Tejaswi

Srinivas Sunkara

Joao Monteiro

Krishnamurthy Dj Dvijotham

Torsten Scholak

Nicolas Chapados

Sepideh Kharagani

Sean Hughes

M. Özsu

Christopher Pal

Sai Rajeswar

Multimodal AI has the potential to significantly enhance document-understanding tasks, such as processing receipts, understanding workflows, extracting data from documents, and summarizing reports. Code generation tasks that require long-structured outputs can also be enhanced by multimodality. Despite this, their use in commercial applications is often held back by limited access to training data and restrictive licensing, which hinders open access. To address these limitations, we introduce BigDocs-7.5M, a high-quality, open-access dataset comprising 7.5 million multimodal documents across 30 tasks. We use an efficient data curation process to ensure our data is high-quality and license-permissive. Our process emphasizes accountability, responsibility, and transparency through filtering rules, traceable metadata, and careful content analysis. Additionally, we introduce BigDocs-Bench, a benchmark suite with 10 novel tasks where we create datasets that reflect real-world use cases involving reasoning over Graphical User Interfaces (GUI) and code generation from images. Our experiments show that training with BigDocs-Bench improves average performance by up to 25.8% over closed-source GPT-4o in document reasoning and structured output tasks such as Screenshot2HTML or Image2Latex generation. Finally, human evaluations showed a preference for outputs from models trained on BigDocs over GPT-4o. This suggests that BigDocs can help both academics and the open-source community utilize and improve AI tools to enhance multimodal capabilities and document reasoning. The project is hosted at https://bigdocs.github.io .

2025-01-21

International Conference on Learning Representations (poster)

doi.org

openreview.net

Boosting Latent Diffusion with Perceptual Objectives

Tariq Berrada Ifriqi

Pietro Astolfi

Jakob Verbeek

Melissa Hall

Marton Havasi

Michal Drozdzal

Yohann Benchetrit

Adriana Romero-Soriano

Karteek Alahari

Latent diffusion models (LDMs) power state-of-the-art high-resolution generative image models. LDMs learn the data distribution in the latent space of an autoencoder (AE) and produce images by mapping the generated latents into RGB image space using the AE decoder. While this approach allows for efficient model training and sampling, it induces a disconnect between the training of the diffusion model and the decoder, resulting in a loss of detail in the generated images. To remediate this disconnect, we propose to leverage the internal features of the decoder to define a latent perceptual loss (LPL). This loss encourages the models to create sharper and more realistic images. Our loss can be seamlessly integrated with common autoencoders used in latent diffusion models, and can be applied to different generative modeling paradigms such as DDPM with epsilon and velocity prediction, as well as flow matching. Extensive experiments with models trained on three datasets at 256 and 512 resolution show improved quantitative results, with boosts between 6% and 20% in FID, as well as improved qualitative results when using our perceptual loss.
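The idea of a latent perceptual loss is to compare the decoder's internal features for a predicted latent and a target latent, rather than the latents alone. The sketch below shows that general shape under loudly stated assumptions: the toy decoder layers, the per-layer standardization, and the uniform layer weights are all hypothetical choices for illustration, not the paper's specification.

```python
from typing import Callable, List, Optional, Sequence

Layer = Callable[[List[float]], List[float]]


def decoder_features(latent: Sequence[float], layers: Sequence[Layer]) -> List[List[float]]:
    """Run a latent through the (toy) decoder, keeping every intermediate activation."""
    feats, h = [], list(latent)
    for layer in layers:
        h = layer(h)
        feats.append(h)
    return feats


def _standardize(v: List[float]) -> List[float]:
    # assumed normalization: zero mean, unit variance per feature vector, so that
    # layers with large activations do not dominate the sum
    mean = sum(v) / len(v)
    var = sum((a - mean) ** 2 for a in v) / len(v)
    std = (var + 1e-8) ** 0.5
    return [(a - mean) / std for a in v]


def latent_perceptual_loss(pred_latent: Sequence[float], target_latent: Sequence[float],
                           layers: Sequence[Layer],
                           weights: Optional[List[float]] = None) -> float:
    """Weighted sum over decoder layers of the MSE between normalized features."""
    fp = decoder_features(pred_latent, layers)
    ft = decoder_features(target_latent, layers)
    weights = weights if weights is not None else [1.0] * len(layers)
    loss = 0.0
    for w, a, b in zip(weights, fp, ft):
        a, b = _standardize(a), _standardize(b)
        loss += w * sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    return loss
```

In a diffusion training loop this term would be added to the usual denoising objective, with the predicted latent coming from the model's estimate of the clean sample; how that estimate is formed for epsilon, velocity, or flow-matching parameterizations is not shown here.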

2025-01-21

ICLR.cc/2025/Conference (poster)

doi.org

openreview.net

CarbonSense: A Multimodal Dataset and Baseline for Carbon Flux Modelling

Matthew Fortier

Mats L. Richter

Oliver Sonnentag

Chris Pal

Terrestrial carbon fluxes provide vital information about our biosphere's health and its capacity to absorb anthropogenic CO…

2025-01-21

ICLR.cc/2025/Conference (poster)

doi.org

openreview.net

Celo: Training Versatile Learned Optimizers on a Compute Diet

Learned optimization has emerged as a promising alternative to hand-crafted optimizers, with the potential to discover stronger learned update rules that enable faster, hyperparameter-free training of neural networks. A critical element for practically useful learned optimizers, which can be used off-the-shelf after meta-training, is strong meta-generalization: the ability to apply the optimizers to new tasks. Recent state-of-the-art work in learned optimizers, VeLO (Metz et al., 2022), requires a large number of highly diverse meta-training tasks along with massive computational resources, 4000 TPU months, to achieve meta-generalization. This makes further improvements to such learned optimizers impractical. In this work, we identify several key elements in learned optimizer architectures and meta-training procedures that can lead to strong meta-generalization. We also propose evaluation metrics to reliably assess quantitative performance of an optimizer at scale on a set of evaluation tasks. Our proposed approach, Celo, makes a significant leap in improving the meta-generalization performance of learned optimizers and also outperforms tuned state-of-the-art optimizers on a diverse set of out-of-distribution tasks, despite being meta-trained for just 24 GPU hours.

2025-01-21

arXiv (preprint)

doi.org

openreview.net

Credit-Based Self Organizing Maps: Training Deep Topographic Networks with Minimal Performance Degradation

Amir Ozhan Dehghani

Xinyu Qian

Asa Farahani

Pouya Bashivan

2025-01-21

ICLR.cc/2025/Conference (poster)

openreview.net

Don't Flatten, Tokenize! Unlocking the Key to SoftMoE's Efficacy in Deep RL

Ghada Sokar

Johan Obando-Ceron

Aaron Courville

Hugo Larochelle

Pablo Samuel Castro

The use of deep neural networks in reinforcement learning (RL) often suffers from performance degradation as model size increases. While soft mixtures of experts (SoftMoEs) have recently shown promise in mitigating this issue for online RL, the reasons behind their effectiveness remain largely unknown. In this work, we provide an in-depth analysis identifying the key factors driving this performance gain. We discover the surprising result that tokenizing the encoder output, rather than the use of multiple experts, is what is behind the efficacy of SoftMoEs. Indeed, we demonstrate that even with an appropriately scaled single expert, we are able to maintain the performance gains, largely thanks to tokenization.
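The contrast between flattening and tokenizing an encoder's output can be made concrete. In the sketch below, flattening turns an H x W x C feature map into one vector of length H·W·C, while tokenizing yields H·W tokens of dimension C, so a single shared "expert" (here a plain linear map, an illustrative assumption) can process every spatial position with C-dimensional weights. The shapes are arbitrary, and SoftMoE's soft routing of tokens into expert slots is not shown.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # indexed [height][width][channel]


def flatten(feature_map: FeatureMap) -> List[float]:
    """Baseline: collapse the whole H x W x C map into one long vector."""
    return [c for row in feature_map for cell in row for c in cell]


def tokenize(feature_map: FeatureMap) -> List[List[float]]:
    """Treat each spatial position as a token: H * W tokens of dimension C."""
    return [list(cell) for row in feature_map for cell in row]


def apply_expert(tokens: List[List[float]], weight: List[List[float]]) -> List[List[float]]:
    """Apply one shared linear map (C -> D, weight indexed [c][d]) to every token."""
    in_dim, out_dim = len(weight), len(weight[0])
    return [[sum(tok[i] * weight[i][j] for i in range(in_dim)) for j in range(out_dim)]
            for tok in tokens]
```

For a 2 x 3 x 4 map, `flatten` returns 24 numbers while `tokenize` returns 6 tokens of length 4; concatenating the tokens recovers the flattened vector, so the two views hold the same information and differ only in how the next layer is allowed to consume it.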

2025-01-21

ICLR.cc/2025/Conference (spotlight)

doi.org

openreview.net

Dualformer: Controllable Fast and Slow Thinking by Learning with Randomized Reasoning Traces

DiJia Su

Sainbayar Sukhbaatar

Michael G. Rabbat

Yuandong Tian

Qinqing Zheng

2025-01-21

ICLR.cc/2025/Conference (poster)

doi.org

openreview.net

Efficient Diversity-Preserving Diffusion Alignment via Gradient-Informed GFlowNets

Zhen Liu

Tim Z. Xiao

Weiyang Liu

Yoshua Bengio

Dinghuai Zhang

While one commonly trains large diffusion models by collecting datasets on target downstream tasks, it is often desirable to align and finetune pretrained diffusion models with reward functions that are either designed by experts or learned from small-scale datasets. Existing post-training methods for reward finetuning of diffusion models typically suffer from a lack of diversity in generated samples, a lack of prior preservation, and/or slow convergence in finetuning. In response to this challenge, we take inspiration from recent successes in generative flow networks (GFlowNets) and propose a reinforcement learning method for diffusion model finetuning, dubbed Nabla-GFlowNet (abbreviated as

2025-01-21

ICLR.cc/2025/Conference (poster)

doi.org

openreview.net

Enabling Realtime Reinforcement Learning at Scale with Staggered Asynchronous Inference

Matthew D Riemer

Gopeshh Subbaraj

Glen Berseth

Irina Rish

Realtime environments change even as agents perform action inference and learning, thus requiring high interaction frequencies to effectively minimize regret. However, recent advances in machine learning involve larger neural networks with longer inference times, raising questions about their applicability in realtime systems where reaction time is crucial. We present an analysis of lower bounds on regret in realtime reinforcement learning (RL) environments to show that minimizing long-term regret is generally impossible within the typical sequential interaction and learning paradigm, but often becomes possible when sufficient asynchronous compute is available. We propose novel algorithms for staggering asynchronous inference processes to ensure that actions are taken at consistent time intervals, and demonstrate that use of models with high action inference times is only constrained by the environment's effective stochasticity over the inference horizon, and not by action frequency. Our analysis shows that the number of inference processes needed scales linearly with increasing inference times while enabling use of models that are multiple orders of magnitude larger than existing approaches when learning from a realtime simulation of Game Boy games such as Pokémon and Tetris.
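The linear relationship between inference time and the number of inference processes can be seen in a simple round-robin staggering scheme. This is a sketch under stated assumptions, not the paper's algorithms: inference time is fixed and known, times are integer milliseconds to avoid floating-point drift, and the function names are hypothetical. Action m is computed by process m mod n, which starts inference exactly one action interval after the previous process started.

```python
from typing import Dict, List, Tuple


def processes_needed(inference_ms: int, interval_ms: int) -> int:
    """Ceiling of inference time / action interval; grows linearly with inference time."""
    return -(-inference_ms // interval_ms)


def staggered_schedule(inference_ms: int, interval_ms: int,
                       num_actions: int) -> List[Tuple[int, int, int]]:
    """Return (process, start_ms, action_ms) for each action in a round-robin schedule."""
    n = processes_needed(inference_ms, interval_ms)
    schedule = []
    busy_until: Dict[int, int] = {}
    for m in range(num_actions):
        proc = m % n
        start = m * interval_ms  # process `proc` begins inference here
        # its previous job began at (m - n) * interval_ms and n * interval_ms >= inference_ms,
        # so that job has already finished
        assert busy_until.get(proc, 0) <= start, "process still busy"
        end = start + inference_ms  # the action computed by this job is emitted here
        busy_until[proc] = end
        schedule.append((proc, start, end))
    return schedule
```

With 1000 ms of inference and one action every 250 ms, four processes suffice and actions are emitted every 250 ms from the 1000 ms mark onward; doubling the inference time to 2000 ms doubles the process count to eight while the action frequency stays the same.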

2025-01-21

ICLR.cc/2025/Conference (poster)

doi.org

openreview.net
