Publications

LiSTEN: Learning Soft Token Embeddings for Neural Audio LLMs

Foundation models based on large language models (LLMs) have shown great success in handling various tasks and modalities. However, adapting… (see more) these models for general-purpose audio-language tasks is challenging due to differences in acoustic environments and task variations. In this work, we introduce LiSTEN Learning Soft Token Embeddings for Neural Audio LLMs), a framework for adapting LLMs to speech and audio tasks. LiSTEN uses a dynamic prompt selection strategy with learnable key-value pairs, allowing the model to balance general and task-specific knowledge while avoiding overfitting in a multitask setting. Our approach reduces dependence on large-scale ASR or captioning datasets, achieves competitive performance with fewer trainable parameters, and simplifies training by using a single-stage process. Additionally, LiSTEN enhances interpretability by analyzing the diversity and overlap of selected prompts across different tasks.

2025-05-01

arXiv (published)

Locate 3D: Real-World Object Localization via Self-Supervised Learning in 3D

Sergio Arnaud

Paul McVay

Ada Martin

Arjun Majumdar

Krishna Murthy

Phillip Thomas

Ruslan Partsey

Daniel Dugas

Abha Gejji

Alexander Sax

Vincent-Pierre Berges

Mikael Henaff

Ayush Jain

Ang Cao

Ishita Prasad

Mrinal Kalakrishnan

Michael Rabbat

Nicolas Ballas

Mahmoud Assran

Oleksandr Maksymets … (see 2 more)

Aravind Rajeswaran

Franziska Meier

2025-05-01

ICML.cc/2025/Conference (poster)

Measure gradients, not activations! Enhancing neuronal activity in deep reinforcement learning

Jiashun Liu

Zihao Wu

Johan Samir Obando Ceron

Pablo Samuel Castro

Aaron Courville

Ling Pan

2025-05-01

arXiv (published)

Gintare Karolina Dziugaite

Mechanistic Unlearning: Robust Knowledge Unlearning and Editing via Mechanistic Localization

Phillip Huang Guo

Aaquib Syed

Abhay Sheshadri

Aidan Ewart

2025-05-01

ICML.cc/2025/Conference (poster)

Mind the GAP! The Challenges of Scale in Pixel-based Deep Reinforcement Learning

Ghada Sokar

Pablo Samuel Castro

2025-05-01

arXiv (published)

Mitigating Plasticity Loss in Continual Reinforcement Learning by Reducing Churn

Hongyao Tang

Johan Samir Obando Ceron

Pablo Samuel Castro

Aaron Courville

Glen Berseth

Plasticity, or the ability of an agent to adapt to new tasks, environments, or distributions, is crucial for continual learning. In this pap… (see more)er, we study the loss of plasticity in deep continual RL from the lens of churn: network output variability for out-of-batch data induced by mini-batch training. We demonstrate that (1) the loss of plasticity is accompanied by the exacerbation of churn due to the gradual rank decrease of the Neural Tangent Kernel (NTK) matrix; (2) reducing churn helps prevent rank collapse and adjusts the step size of regular RL gradients adaptively. Moreover, we introduce Continual Churn Approximated Reduction (C-CHAIN) and demonstrate it improves learning performance and outperforms baselines in a diverse range of continual learning environments on OpenAI Gym Control, ProcGen, DeepMind Control Suite, and MinAtar benchmarks.

2025-05-01

ICML.cc/2025/Conference (poster)

Monte Carlo Tree Diffusion for System 2 Planning

Jaesik Yoon

Hyeonseo Cho

Doojin Baek

Yoshua Bengio

Sungjin Ahn

Diffusion models have recently emerged as a powerful tool for planning. However, unlike Monte Carlo Tree Search (MCTS)-whose performance nat… (see more)urally improves with additional test-time computation (TTC), standard diffusion-based planners offer only limited avenues for TTC scalability. In this paper, we introduce Monte Carlo Tree Diffusion (MCTD), a novel framework that integrates the generative strength of diffusion models with the adaptive search capabilities of MCTS. Our method reconceptualizes denoising as a tree-structured process, allowing partially denoised plans to be iteratively evaluated, pruned, and refined. By selectively expanding promising trajectories while retaining the flexibility to revisit and improve suboptimal branches, MCTD achieves the benefits of MCTS such as controlling exploration-exploitation trade-offs within the diffusion framework. Empirical results on challenging long-horizon tasks show that MCTD outperforms diffusion baselines, yielding higher-quality solutions as TTC increases.

2025-05-01

ICML.cc/2025/Conference (poster)

Multi-Modal Language Models as Text-to-Image Model Evaluators

Jiahui Chen

Candace Ross

Reyhane Askari Hemmat

Koustuv Sinha

Melissa Hall

Michal Drozdzal

Adriana Romero Soriano

2025-05-01

ArXiv (preprint)

Multi-Modal Language Models as Text-to-Image Model Evaluators

Jiahui Chen

Candace Ross

Reyhane Askari Hemmat

Koustuv Sinha

Melissa Hall

Michal Drozdzal

Adriana Romero Soriano

2025-05-01

arXiv (published)

Network Sparsity Unlocks the Scaling Potential of Deep Reinforcement Learning

Guozheng Ma

Li Li

Zilin Wang

Li Shen

Pierre-Luc Bacon

Dacheng Tao

Effectively scaling up deep reinforcement learning models has proven notoriously difficult due to network pathologies during training, moti… (see more)vating various targeted interventions such as periodic reset and architectural advances such as layer normalization. Instead of pursuing more complex modifications, we show that introducing static network sparsity alone can unlock further scaling potential beyond their dense counterparts with state-of-the-art architectures. This is achieved through simple one-shot random pruning, where a predetermined percentage of network weights are randomly removed once before training. Our analysis reveals that, in contrast to naively scaling up dense DRL networks, such sparse networks achieve both higher parameter efficiency for network expressivity and stronger resistance to optimization challenges like plasticity loss and gradient interference. We further extend our evaluation to visual and streaming RL scenarios, demonstrating the consistent benefits of network sparsity.

2025-05-01

ICML.cc/2025/Conference (oral)