Publications

Mind the GAP! The Challenges of Scale in Pixel-based Deep Reinforcement Learning

Ghada Sokar

Pablo Samuel Castro

2025-05-01

arXiv (published)

Mitigating Plasticity Loss in Continual Reinforcement Learning by Reducing Churn

Hongyao Tang

Johan Samir Obando Ceron

Pablo Samuel Castro

Aaron Courville

Glen Berseth

Plasticity, or the ability of an agent to adapt to new tasks, environments, or distributions, is crucial for continual learning. In this paper, we study the loss of plasticity in deep continual RL through the lens of churn: network output variability for out-of-batch data induced by mini-batch training. We demonstrate that (1) the loss of plasticity is accompanied by the exacerbation of churn due to the gradual rank decrease of the Neural Tangent Kernel (NTK) matrix; (2) reducing churn helps prevent rank collapse and adjusts the step size of regular RL gradients adaptively. Moreover, we introduce Continual Churn Approximated Reduction (C-CHAIN) and demonstrate that it improves learning performance and outperforms baselines in a diverse range of continual learning environments on OpenAI Gym Control, ProcGen, DeepMind Control Suite, and MinAtar benchmarks.
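The abstract defines churn as the change in a network's outputs on data outside the current mini-batch that a mini-batch update causes. A minimal sketch of measuring that quantity, assuming a toy linear model trained with plain SGD on squared error; the names `predict`, `sgd_step`, and `churn` are hypothetical, and this illustrates only the definition, not the paper's C-CHAIN method.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def predict(w: Sequence[float], x: Sequence[float]) -> float:
    """Output of a toy linear model (a stand-in for the network)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def sgd_step(w: Sequence[float], batch: Sequence[Tuple[Vector, float]], lr: float) -> Vector:
    """One SGD step on mean squared error over the mini-batch."""
    grad = [0.0] * len(w)
    for x, y in batch:
        err = predict(w, x) - y
        for i, xi in enumerate(x):
            grad[i] += 2.0 * err * xi / len(batch)
    return [wi - lr * gi for wi, gi in zip(w, grad)]


def churn(w_before: Sequence[float], w_after: Sequence[float], ref_inputs: Sequence[Vector]) -> float:
    """Mean absolute output change on reference inputs that were NOT in the batch."""
    diffs = [abs(predict(w_after, x) - predict(w_before, x)) for x in ref_inputs]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    w0 = [0.5, -0.25]
    batch = [([1.0, 0.0], 1.0)]            # the mini-batch used for the update
    reference = [[0.0, 1.0], [1.0, 1.0]]  # out-of-batch inputs
    w1 = sgd_step(w0, batch, lr=0.1)
    print(churn(w0, w1, reference))
```

Here the first reference input shares no active feature with the batch, so its output does not move; the second shares feature 0 and shifts by about 0.1, giving a churn of about 0.05. In a deep network the coupling between inputs is governed by the NTK rather than raw feature overlap, which is the link the abstract draws.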

2025-05-01

ICML.cc/2025/Conference (poster)

A Modular Approach for Clinical SLMs Driven by Synthetic Data with Pre-Instruction Tuning, Model Merging, and Clinical-Tasks Alignment

Jean-Philippe Corbeil

Amin Dada

Jean-Michel Attendu

Asma Ben Abacha

Alessandro Sordoni

Lucas Caccia

François Beaulieu

Thomas Lin

Jens Kleesiek

Paul Vozila

High computation costs and latency of large language models such as GPT-4 have limited their deployment in clinical settings. Small language models (SLMs) offer a cost-effective alternative, but their limited capacity requires biomedical domain adaptation, which remains challenging. An additional bottleneck is the unavailability and high sensitivity of clinical data. To address these challenges, we propose a novel framework for adapting SLMs into high-performing clinical models. We introduce the MediPhi collection of 3.8B-parameter SLMs developed with our novel framework: pre-instruction tuning of experts on relevant medical and clinical corpora (PMC, Medical Guideline, MedWiki, etc.), model merging, and clinical-tasks alignment. To cover most clinical tasks, we extended the CLUE benchmark to CLUE+, doubling its size. Our expert models deliver relative improvements on this benchmark over the base model without any task-specific fine-tuning: 64.3% on medical entities, 49.5% on radiology reports, and 44% on ICD-10 coding (outperforming GPT-4-0125 by 14%). We unify the expert models into MediPhi via model merging, preserving gains across benchmarks. Furthermore, we built the MediFlow collection, a synthetic dataset of 2.5 million high-quality instructions on 14 medical NLP tasks, 98 fine-grained document types, and JSON format support. Alignment of MediPhi using supervised fine-tuning and direct preference optimization achieves further gains of 18.9% on average.
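The abstract says the expert models are unified into MediPhi via model merging but does not name the merging method. A minimal sketch of the simplest variant, a weighted linear average of parameters that share names and shapes across experts, using plain lists in place of tensors; `linear_merge` is a hypothetical helper, and MediPhi may use a different merging scheme.

```python
from typing import Dict, List, Optional

Params = Dict[str, List[float]]  # parameter name -> flattened values


def linear_merge(experts: List[Params], weights: Optional[List[float]] = None) -> Params:
    """Weighted average of expert parameters with identical names and shapes."""
    if not experts:
        raise ValueError("need at least one expert")
    if weights is None:
        weights = [1.0 / len(experts)] * len(experts)  # uniform average by default
    if len(weights) != len(experts):
        raise ValueError("need exactly one weight per expert")
    total = sum(weights)
    norm = [w / total for w in weights]  # normalise so the weights sum to 1
    merged: Params = {}
    for name in experts[0]:
        tensors = [e[name] for e in experts]  # KeyError if an expert lacks this parameter
        if any(len(t) != len(tensors[0]) for t in tensors):
            raise ValueError(f"shape mismatch for {name}")
        merged[name] = [
            sum(w * t[i] for w, t in zip(norm, tensors)) for i in range(len(tensors[0]))
        ]
    return merged
```

For example, merging `{"layer.w": [1.0, 2.0]}` and `{"layer.w": [3.0, 6.0]}` with uniform weights gives `{"layer.w": [2.0, 4.0]}`. Uniform averaging only makes sense when the experts start from the same base model, as the shared base SLM here does.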

2025-05-01

arXiv (published)

Monitoring morphometric drift in lifelong learning segmentation of the spinal cord

Enamundram Naga Karthik

Sandrine Bédard

Jan Valovsek

Christoph Aigner

Elise Bannier

Josef Bednařík

Virginie Callot

Anna Combes

Armin Curt

Gergely David

Falk Eippert

Lynn Farner

M. G. Fehlings

Patrick Freund

Tobias Granberg

Cristina Granziera

RHSCIR Network Imaging Group

Ulrike Horn

Tomáš Horák

Suzanne Humphreys

Markus Hupp

Anne Kerbrat

Nawal Kinany

Shannon Kolind

Petr Kudlička

Anna Lebret

L. Lee

Caterina Mainero

Allan R. Martin

Megan McGrath

Govind Nair

Kristin P. O’Grady

Jiwon Oh

Russell Ouellette

Nikolai Pfender

Dario Pfyffer

P. Pradat

Alexandre Prat

Emanuele Pravatà

D. S. Reich

Ilaria Ricchi

Naama Rotem-Kohavi

Simon Schading-Sassenhausen

Maryam Seif

Andrew C. Smith

Seth Aaron Smith

Grace Sweeney

Roger Tam

Anthony Traboulsee

Constantina A. Treaba

Charidimos Tsagkas

Zachary Vavasour

Dimitri Van De Ville

Kenneth A. Weber

Sarath Chandar

Julien Cohen-Adad

2025-05-01

arXiv (published)

Monte Carlo Tree Diffusion for System 2 Planning

Jaesik Yoon

Hyeonseo Cho

Doojin Baek

Yoshua Bengio

Sungjin Ahn

Diffusion models have recently emerged as a powerful tool for planning. However, unlike Monte Carlo Tree Search (MCTS), whose performance naturally improves with additional test-time computation (TTC), standard diffusion-based planners offer only limited avenues for TTC scalability. In this paper, we introduce Monte Carlo Tree Diffusion (MCTD), a novel framework that integrates the generative strength of diffusion models with the adaptive search capabilities of MCTS. Our method reconceptualizes denoising as a tree-structured process, allowing partially denoised plans to be iteratively evaluated, pruned, and refined. By selectively expanding promising trajectories while retaining the flexibility to revisit and improve suboptimal branches, MCTD brings benefits of MCTS, such as control over exploration-exploitation trade-offs, into the diffusion framework. Empirical results on challenging long-horizon tasks show that MCTD outperforms diffusion baselines, yielding higher-quality solutions as TTC increases.
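The abstract says MCTD brings MCTS-style control over the exploration-exploitation trade-off to denoising, but it does not give MCTD's selection rule. For orientation, a sketch of the standard UCT rule (UCB1 applied to trees) that MCTS commonly uses to choose which child to expand; the `Node` class, its fields, and the constant `c` are assumptions, and whether MCTD uses exactly this rule is not stated in the abstract.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Search-tree node; in MCTD this would hold a partially denoised plan (hypothetical)."""
    visits: int = 0
    value_sum: float = 0.0
    children: List["Node"] = field(default_factory=list)


def uct_score(child: Node, parent_visits: int, c: float = 1.41) -> float:
    """Mean value (exploitation) plus a visit-count bonus (exploration)."""
    if child.visits == 0:
        return math.inf  # always try unvisited branches first
    exploit = child.value_sum / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore


def select_child(parent: Node, c: float = 1.41) -> Node:
    """Pick the child with the highest UCT score; a larger c favours exploration."""
    return max(parent.children, key=lambda ch: uct_score(ch, parent.visits, c))
```

With a parent visited 10 times, a well-explored child with mean value 0.8 (5 visits) beats a rarely visited child with mean 0.5 (1 visit) when `c=0`, but with `c=1.41` the exploration bonus makes the rarely visited branch the one to revisit.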

2025-05-01

ICML.cc/2025/Conference (poster)

Multi-Modal Language Models as Text-to-Image Model Evaluators

Jiahui Chen

Candace Ross

Reyhane Askari Hemmat

Koustuv Sinha

Melissa Hall

Michal Drozdzal

Adriana Romero Soriano

2025-05-01

arXiv (published)

Multi-Modal Language Models as Text-to-Image Model Evaluators

Jiahui Chen

Candace Ross

Reyhane Askari Hemmat

Koustuv Sinha

Melissa Hall

Michal Drozdzal

Adriana Romero Soriano

2025-05-01

arXiv (preprint)

Network Sparsity Unlocks the Scaling Potential of Deep Reinforcement Learning

Guozheng Ma

Lu Li

Zilin Wang

Li Shen

Pierre-Luc Bacon

Dacheng Tao

Effectively scaling up deep reinforcement learning models has proven notoriously difficult due to network pathologies during training, motivating various targeted interventions such as periodic resets and architectural advances such as layer normalization. Instead of pursuing more complex modifications, we show that introducing static network sparsity alone can unlock further scaling potential beyond their dense counterparts with state-of-the-art architectures. This is achieved through simple one-shot random pruning, where a predetermined percentage of network weights are randomly removed once before training. Our analysis reveals that, in contrast to naively scaling up dense DRL networks, such sparse networks achieve both higher parameter efficiency for network expressivity and stronger resistance to optimization challenges like plasticity loss and gradient interference. We further extend our evaluation to visual and streaming RL scenarios, demonstrating the consistent benefits of network sparsity.
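The abstract describes one-shot random pruning: a predetermined percentage of weights is removed at random once, before training, and the sparsity then stays static. A minimal pure-Python sketch for a single weight matrix; the function names, the per-layer (rather than global) application, and re-applying the mask after each update to keep pruned weights at zero are assumptions, not details from the paper.

```python
import random
from typing import List

Matrix = List[List[float]]
Mask = List[List[int]]


def one_shot_random_mask(rows: int, cols: int, sparsity: float, seed: int = 0) -> Mask:
    """Binary mask that zeroes round(sparsity * rows * cols) randomly chosen entries."""
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    n = rows * cols
    n_pruned = round(sparsity * n)
    rng = random.Random(seed)
    pruned = set(rng.sample(range(n), n_pruned))  # drawn once, before training
    return [[0 if r * cols + c in pruned else 1 for c in range(cols)] for r in range(rows)]


def apply_mask(weights: Matrix, mask: Mask) -> Matrix:
    """Element-wise product; re-applying it after each optimiser step keeps the sparsity static."""
    return [[w * m for w, m in zip(w_row, m_row)] for w_row, m_row in zip(weights, mask)]
```

With `sparsity=0.5` on a 4 x 5 layer, exactly 10 of the 20 weights are removed, and a fixed `seed` reproduces the same mask. Because the mask is never updated, this differs from dynamic sparse training methods that regrow or move connections during training.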

2025-05-01

ICML.cc/2025/Conference (oral)

Outsourced diffusion sampling: Efficient posterior inference in latent spaces of generative models

Siddarth Venkatraman

Mohsin Hasan

Minsu Kim

Luca Scimeca

Marcin Sendera

Yoshua Bengio

Glen Berseth

Nikolay Malkin

Any well-behaved generative model over a variable …

2025-05-01

ICML.cc/2025/Conference (poster)

Plasticity as the Mirror of Empowerment

David Abel

Michael Bowling

Andre Barreto

Will Dabney

Shi Dong

Steven Hansen

Anna Harutyunyan

Khimya Khetarpal

Clare Lyle

Razvan Pascanu

Georgios Piliouras

Doina Precup

Jonathan Richens

Mark Rowland

Tom Schaul

Satinder Singh

2025-05-01

arXiv (published)

PoisonBench: Assessing Language Model Vulnerability to Poisoned Preference Data

Tingchen Fu

Mrinank Sharma

Philip Torr

Shay B. Cohen

David Scott Krueger

Fazl Barez

Preference learning is a central component of aligning current LLMs, but this process can be vulnerable to data poisoning attacks. To address this concern, we introduce PoisonBench, a benchmark for evaluating large language models' susceptibility to data poisoning during preference learning. Data poisoning attacks can manipulate large language model responses to include hidden malicious content or biases, potentially causing the model to generate harmful or unintended outputs while appearing to function normally. We deploy two distinct attack types across eight realistic scenarios, assessing 22 widely-used models. Our findings reveal concerning trends: (1) scaling up parameter size does not always enhance resilience against poisoning attacks, and its influence on resilience varies across model suites; (2) there exists a log-linear relationship between the effects of the attack and the data poison ratio; (3) the effect of data poisoning can generalize to extrapolated triggers that are not included in the poisoned data. These results expose weaknesses in current preference learning techniques, highlighting the urgent need for more robust defenses against malicious models and data manipulation.
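The second finding states a log-linear relationship between attack effect and poison ratio. Written out, with $E$ the measured attack effect, $r$ the fraction of poisoned preference data, and $\alpha, \beta$ scenario-specific fit coefficients (symbols introduced here for illustration; the abstract reports neither their values nor the log base):

```latex
E(r) \approx \alpha + \beta \log r, \qquad 0 < r \le 1,
\qquad\text{so}\qquad E(2r) - E(r) \approx \beta \log 2 .
```

Under this form, every doubling of the poison ratio shifts the attack effect by the same additive amount, so even small poison ratios can yield a non-negligible effect.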

2025-05-01

ICML.cc/2025/Conference (poster)