Publications

Mitigating Plasticity Loss in Continual Reinforcement Learning by Reducing Churn

Hongyao Tang

Johan Samir Obando Ceron

Pablo Samuel Castro

Aaron Courville

Glen Berseth

Plasticity, or the ability of an agent to adapt to new tasks, environments, or distributions, is crucial for continual learning. In this paper, we study the loss of plasticity in deep continual RL from the lens of churn: network output variability for out-of-batch data induced by mini-batch training. We demonstrate that (1) the loss of plasticity is accompanied by the exacerbation of churn due to the gradual rank decrease of the Neural Tangent Kernel (NTK) matrix; (2) reducing churn helps prevent rank collapse and adjusts the step size of regular RL gradients adaptively. Moreover, we introduce Continual Churn Approximated Reduction (C-CHAIN) and demonstrate it improves learning performance and outperforms baselines in a diverse range of continual learning environments on OpenAI Gym Control, ProcGen, DeepMind Control Suite, and MinAtar benchmarks.

2025-05-01

ICML.cc/2025/Conference (poster)
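
The abstract above describes churn as the change in network outputs on out-of-batch data caused by a mini-batch update. The following is a minimal sketch of that idea on a toy linear model, not the paper's implementation: the function names (`predict`, `sgd_step`, `churn`), the squared-error loss, and the mean-absolute-change measure are all assumptions made for illustration.

```python
# Toy illustration of churn: how much do predictions on held-out inputs
# move after one SGD step on a different mini-batch?
# All names and the exact churn measure are illustrative assumptions.

def predict(w, x):
    """Linear model f(x) = w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))

def sgd_step(w, batch, lr):
    """One gradient step on mean squared error over the mini-batch."""
    grad = [0.0] * len(w)
    for x, y in batch:
        err = predict(w, x) - y
        for i, xi in enumerate(x):
            grad[i] += 2.0 * err * xi / len(batch)
    return [wi - lr * gi for wi, gi in zip(w, grad)]

def churn(w_before, w_after, held_out):
    """Mean absolute change in output on inputs that were NOT in the batch."""
    total = sum(abs(predict(w_after, x) - predict(w_before, x)) for x in held_out)
    return total / len(held_out)

w0 = [0.0, 0.0]
batch = [([1.0, 0.0], 1.0)]            # the only training example
w1 = sgd_step(w0, batch, lr=0.1)

overlapping = [[1.0, 1.0]]             # shares a feature with the batch input
orthogonal = [[0.0, 1.0]]              # shares no feature with the batch input

churn_overlapping = churn(w0, w1, overlapping)
churn_orthogonal = churn(w0, w1, orthogonal)
churn_mixed = churn(w0, w1, overlapping + orthogonal)
```

For a linear model the held-out output only moves when the held-out input overlaps with the batch input (their inner product is non-zero), which is the toy analogue of the kernel-based view mentioned in the abstract.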

A Modular Approach for Clinical SLMs Driven by Synthetic Data with Pre-Instruction Tuning, Model Merging, and Clinical-Tasks Alignment

Jean-Philippe Corbeil

Amin Dada

Jean-Michel Attendu

Asma Ben Abacha

Alessandro Sordoni

Lucas Caccia

François Beaulieu

Thomas Lin

Jens Kleesiek

Paul Vozila

High computation costs and latency of large language models such as GPT-4 have limited their deployment in clinical settings. Small language models (SLMs) offer a cost-effective alternative, but their limited capacity requires biomedical domain adaptation, which remains challenging. An additional bottleneck is the unavailability and high sensitivity of clinical data. To address these challenges, we propose a novel framework for adapting SLMs into high-performing clinical models. We introduce the MediPhi collection of 3.8B-parameter SLMs developed with our novel framework: pre-instruction tuning of experts on relevant medical and clinical corpora (PMC, Medical Guideline, MedWiki, etc.), model merging, and clinical-tasks alignment. To cover most clinical tasks, we extended the CLUE benchmark to CLUE+, doubling its size. Our expert models deliver relative improvements on this benchmark over the base model without any task-specific fine-tuning: 64.3% on medical entities, 49.5% on radiology reports, and 44% on ICD-10 coding (outperforming GPT-4-0125 by 14%). We unify the expert models into MediPhi via model merging, preserving gains across benchmarks. Furthermore, we built the MediFlow collection, a synthetic dataset of 2.5 million high-quality instructions on 14 medical NLP tasks, 98 fine-grained document types, and JSON format support. Alignment of MediPhi using supervised fine-tuning and direct preference optimization achieves further gains of 18.9% on average.

2025-05-01

arXiv (published)

Monte Carlo Tree Diffusion for System 2 Planning

Jaesik Yoon

Hyeonseo Cho

Doojin Baek

Yoshua Bengio

Sungjin Ahn

Diffusion models have recently emerged as a powerful tool for planning. However, unlike Monte Carlo Tree Search (MCTS), whose performance naturally improves with additional test-time computation (TTC), standard diffusion-based planners offer only limited avenues for TTC scalability. In this paper, we introduce Monte Carlo Tree Diffusion (MCTD), a novel framework that integrates the generative strength of diffusion models with the adaptive search capabilities of MCTS. Our method reconceptualizes denoising as a tree-structured process, allowing partially denoised plans to be iteratively evaluated, pruned, and refined. By selectively expanding promising trajectories while retaining the flexibility to revisit and improve suboptimal branches, MCTD achieves the benefits of MCTS such as controlling exploration-exploitation trade-offs within the diffusion framework. Empirical results on challenging long-horizon tasks show that MCTD outperforms diffusion baselines, yielding higher-quality solutions as TTC increases.

2025-05-01

ICML.cc/2025/Conference (poster)

Multi-Modal Language Models as Text-to-Image Model Evaluators

Jiahui Chen

Candace Ross

Reyhane Askari Hemmat

Koustuv Sinha

Melissa Hall

Michal Drozdzal

Adriana Romero Soriano

2025-05-01

arXiv (published)

Multi-Modal Language Models as Text-to-Image Model Evaluators

Jiahui Chen

Candace Ross

Reyhane Askari Hemmat

Koustuv Sinha

Melissa Hall

Michal Drozdzal

Adriana Romero Soriano

2025-05-01

arXiv (preprint)

Network Sparsity Unlocks the Scaling Potential of Deep Reinforcement Learning

Guozheng Ma

Li Li

Zilin Wang

Li Shen

Pierre-Luc Bacon

Dacheng Tao

Effectively scaling up deep reinforcement learning models has proven notoriously difficult due to network pathologies during training, motivating various targeted interventions such as periodic reset and architectural advances such as layer normalization. Instead of pursuing more complex modifications, we show that introducing static network sparsity alone can unlock further scaling potential beyond their dense counterparts with state-of-the-art architectures. This is achieved through simple one-shot random pruning, where a predetermined percentage of network weights are randomly removed once before training. Our analysis reveals that, in contrast to naively scaling up dense DRL networks, such sparse networks achieve both higher parameter efficiency for network expressivity and stronger resistance to optimization challenges like plasticity loss and gradient interference. We further extend our evaluation to visual and streaming RL scenarios, demonstrating the consistent benefits of network sparsity.

2025-05-01

ICML.cc/2025/Conference (oral presentation)
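
The abstract above describes one-shot random pruning: a fixed percentage of weights is removed at random once, before training, and the sparsity pattern then stays static. The following is a minimal sketch under that description only; the helper names (`one_shot_random_mask`, `apply_mask`), the nested-list weight layout, and the re-masking after each update are assumptions for illustration, not the authors' code.

```python
import random

def one_shot_random_mask(rows, cols, sparsity, seed=0):
    """Build a fixed 0/1 mask with round(sparsity * rows * cols) zeros.

    The mask is drawn once and reused for the whole of training,
    so the sparsity pattern is static.
    """
    n = rows * cols
    n_pruned = round(sparsity * n)
    rng = random.Random(seed)
    pruned = set(rng.sample(range(n), n_pruned))
    return [[0.0 if r * cols + c in pruned else 1.0 for c in range(cols)]
            for r in range(rows)]

def apply_mask(weights, mask):
    """Zero out pruned weights (element-wise product with the mask)."""
    return [[w * m for w, m in zip(w_row, m_row)]
            for w_row, m_row in zip(weights, mask)]

rng = random.Random(1)
weights = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(4)]

# Prune once, before any training step.
mask = one_shot_random_mask(4, 4, sparsity=0.75)
sparse_weights = apply_mask(weights, mask)

# Stand-in for a training update (hypothetical): any dense update is
# followed by re-applying the same mask so pruned weights stay at zero.
updated = [[w + 0.1 for w in row] for row in sparse_weights]
remasked = apply_mask(updated, mask)

n_zeros = sum(m == 0.0 for row in mask for m in row)
```

Re-applying the mask after every update is one simple way to keep the pattern static; masking the gradients instead would have the same effect for plain gradient descent.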

Outsourced diffusion sampling: Efficient posterior inference in latent spaces of generative models

Minsu Kim

Any well-behaved generative model over a variable …

2025-05-01

ICML.cc/2025/Conference (poster)

Plasticity as the Mirror of Empowerment

David Abel

Michael Bowling

Andre Barreto

Will Dabney

Shi Dong

Steven Hansen

Anna Harutyunyan

Khimya Khetarpal

Clare Lyle

Razvan Pascanu

Georgios Piliouras

Doina Precup

Jonathan Richens

Mark Rowland

Tom Schaul

Satinder Singh

2025-05-01

arXiv (published)

PoisonBench: Assessing Language Model Vulnerability to Poisoned Preference Data

Tingchen Fu

Mrinank Sharma

Philip Torr

Shay B. Cohen

David Scott Krueger

Fazl Barez

Preference learning is a central component for aligning current LLMs, but this process can be vulnerable to data poisoning attacks. To address this concern, we introduce PoisonBench, a benchmark for evaluating large language models' susceptibility to data poisoning during preference learning. Data poisoning attacks can manipulate large language model responses to include hidden malicious content or biases, potentially causing the model to generate harmful or unintended outputs while appearing to function normally. We deploy two distinct attack types across eight realistic scenarios, assessing 22 widely-used models. Our findings reveal concerning trends: (1) scaling up parameter size does not always enhance resilience against poisoning attacks, and the influence on model resilience varies among different model suites; (2) there exists a log-linear relationship between the effects of the attack and the data poison ratio; (3) the effect of data poisoning can generalize to extrapolated triggers that are not included in the poisoned data. These results expose weaknesses in current preference learning techniques, highlighting the urgent need for more robust defenses against malicious models and data manipulation.

2025-05-01

ICML.cc/2025/Conference (poster)

Position: Probabilistic Modelling is Sufficient for Causal Inference

Bruno Mlodozeniec

David Scott Krueger

Richard E. Turner

2025-05-01

ICML.cc/2025/Position_Paper_Track (oral presentation)

Proceedings of 1st Workshop on Advancing Artificial Intelligence through Theory of Mind

Mouad Abrini

Omri Abend

Dina M. Acklin

Henny Admoni

Gregor Aichinger

Nitay Alon

Zahra Ashktorab

Ashish Atreja

Moises Auron

Alexander Aufreiter

Raghav Awasthi

Soumya Banerjee

Joseph Barnby

Rhea Basappa

Severin Bergsmann

Djallel Bouneffouf

Patrick Callaghan

Marc Cavazza

Thierry Chaminade

Sonia Chernova

Mohamed Chetouan

Moumita Choudhury

Axel Cleeremans

J. Cywinski

Fabio Cuzzolin

Hokin Deng

N'yoma Diamond

C. D. Pasquasio

Guillaume Dumas

Max J. van Duijn

Mahapatra Dwarikanath

Qingying Gao

Ashok Goel

Rebecca R. Goldstein

Matthew C. Gombolay

Gabriel Enrique Gonzalez

Amar Halilovic

Tobias Halmdienst

Mahimul Islam

Julian Jara-Ettinger

Natalie Kastel

Renana Keydar

Ashish K. Khanna

Mahdi Khoramshahi

Jihyun Kim

Mihyeon Kim

Youngbin Kim

Senka Krivic

Nikita Krasnytskyi

Arun Kumar

Junehyoung Kwon

EunJu Lee

Shane Lee

Peter R. Lewis

Xue Li

Yijiang Li

Michal Lewandowski

Nathan Lloyd

Matthew B. Luebbers

Dezhi Luo

Haiyun Lyu

Dwarikanath Mahapatra

Kamal Maheshwari

Mallika Mainali

P. Mathur

Patrick Mederitsch

Shuwa Miura

Manuel Preston de Miranda

Reuth Mirsky

Shreya Mishra

Nina M. Moorman

Katelyn Morrison

John Muchovej

Bernhard Nessler

Felix Nessler

Hieu Minh Jord Nguyen

Abby Ortego

F. Papay

Antoine Pasquali

Hamed Rahimi

C. Raghu

Amanda L. Royka

Stefan Sarkadi

Jaelle Scheuerman

Simon Schmid

Paul Schrater

Anik Sen

Zahra Sheikhbahaee

Ke Shi

Reid G. Simmons

Nishant Singh

Mason O. Smith

Ramira van der Meulen

Anthia Solaki

Haoran Sun

Viktor Szolga

Matthew E. Taylor

Travis Taylor

Sanne van Waveren

Juan David Vargas

R. Verbrugge

Eitan Wagner

Justin D. Weisz

Ximing Wen

William Yeoh

Wenlong Zhang

Michelle Zhao

Shlomo Zilberstein

2025-05-01

arXiv (published)