Publications

Optimal discounting for offline input-driven MDP

Randy Lefebvre

Audrey Durand

Offline reinforcement learning has gained a lot of popularity for its potential to solve industry challenges. However, real-world environments are often highly stochastic and partially observable, leading long-term planners to overfit to offline data in model-based settings. Input-driven Markov Decision Processes (IDMDPs) offer a way to work with some of the uncertainty by letting designers separate what the agent has control over (states) from what it cannot (inputs) in the environment. These stochastic external inputs are often difficult to model. Under the assumption that the input model will be imperfect, we investigate the bias-variance tradeoff under shallow planning in IDMDPs. Paving the way to input-driven planning horizons, we also investigate the similarity of optimal planning horizons at different inputs given the structure of the input space.

2025-05-09

rl-conference.cc/RLC/2025/Conference (accepted)

openreview.net

Optimistic critics can empower small actors

Olya Mastikhina

Dhruv Sreenivas

Pablo Samuel Castro

Actor-critic methods have been central to many of the recent advances in deep reinforcement learning. The most common approach is to use _symmetric_ architectures, whereby both actor and critic have the same network topology and number of parameters. However, recent works have argued for the advantages of _asymmetric_ setups, specifically with the use of smaller actors. We perform broad empirical investigations and analyses to better understand the implications of this and find that, in general, smaller actors result in performance degradation and overfit critics. Our analyses suggest _poor data collection_, due to value underestimation, as one of the main causes for this behavior, and further highlight the crucial role the critic can play in alleviating this pathology. We explore techniques to mitigate the observed value underestimation, which enables further research in asymmetric actor-critic methods.

2025-05-09

rl-conference.cc/RLC/2025/Conference (published)

openreview.net

Understanding the Effectiveness of Learning Behavioral Metrics in Deep Reinforcement Learning

Ziyan Luo

Tianwei Ni

Pierre-Luc Bacon

Doina Precup

Xujie Si

A key approach to state abstraction is approximating behavioral metrics (notably, bisimulation metrics) in the observation space and embedding these learned distances in the representation space. While promising for robustness to task-irrelevant noise shown in prior work, accurately estimating these metrics remains challenging, requiring various design choices that create gaps between theory and practice. Prior evaluations focus mainly on final returns, leaving the quality of learned metrics and the source of performance gains unclear. To systematically assess how metric learning works in deep RL, we evaluate five recent approaches. We unify them under isometric embedding, identify key design choices, and benchmark them with baselines across 20 state-based and 14 pixel-based tasks, spanning 250+ configurations with diverse noise settings. Beyond final returns, we introduce the denoising factor to quantify the encoder’s ability to filter distractions. To further isolate the effect of metric learning, we propose an isolated metric estimation setting, where the encoder is influenced solely by the metric loss. Our results show that metric learning improves return and denoising only marginally, as its benefits fade when key design choices, such as layer normalization and self-prediction loss, are incorporated into the baseline. We also find that commonly used benchmarks (e.g., grayscale videos, varying state-based Gaussian noise dimensions) add little difficulty, while Gaussian noise with random projection and pixel-based Gaussian noise remain challenging even for the best methods. Finally, we release an open-source, modular codebase to improve reproducibility and support future research on metric learning in deep RL.

2025-05-09

rl-conference.cc/RLC/2025/Conference (accepted)

openreview.net

Neurospectrum: A Geometric and Topological Deep Learning Framework for Uncovering Spatiotemporal Signatures in Neural Activity

Dhananjay Bhaskar

Jessica Moore

Feng Gao

Bastian Rieck

Firas Khasawneh

Elizabeth Munch

Valentina Greco

Smita Krishnaswamy

Neural signals are high-dimensional, noisy, and dynamic, making it challenging to extract interpretable features linked to behavior or disease. We introduce Neurospectrum, a framework that encodes neural activity as latent trajectories shaped by spatial and temporal structure. At each timepoint, signals are represented on a graph capturing spatial relationships, with a learnable attention mechanism highlighting important regions. These are embedded using graph wavelets and passed through a manifold-regularized autoencoder that preserves temporal geometry. The resulting latent trajectory is summarized using a principled set of descriptors (including curvature, path signatures, persistent homology, and recurrent networks) that capture multiscale geometric, topological, and dynamical features. These features drive downstream prediction in a modular, interpretable, and end-to-end trainable framework. We evaluate Neurospectrum on simulated and experimental datasets. It tracks phase synchronization in Kuramoto simulations, reconstructs visual stimuli from calcium imaging, and identifies biomarkers of obsessive-compulsive disorder in fMRI. Across tasks, Neurospectrum uncovers meaningful neural dynamics and outperforms traditional analysis methods.

2025-05-08

bioRxiv (preprint)

doi.org

Putting the Value Back in RL: Better Test-Time Scaling by Unifying LLM Reasoners With Verifiers

Kusha Sareen

Morgane M Moss

Alessandro Sordoni

Rishabh Agarwal

Arian Hosseini

2025-05-07

ArXiv (preprint)

arxiv.org

Kernel-Level Event-Based Performance Anomaly Detection in Software Systems under Varying Load Conditions

Anthonia Njoku

Heng Li

Foutse Khomh

2025-05-05

Companion of the 16th ACM/SPEC International Conference on Performance Engineering (published)

doi.org

The Search for Squawk: Agile Modeling in Bioacoustics

Vincent Dumoulin

Otilia Stretcu

Jenny Hamer

Lauren Harrell

Rob Laber

Hugo Larochelle

Bart van Merriënboer

Amanda Navine

Patrick Hart

Ben Williams

Timothy A. C. Lamont

Tries B. Rasak

Mars Coral Restoration Team

Sheryn Brodie

Brendan Doohan

Philip Eichinski

Paul Roe

Lin Schwarzkopf

Tom Denton

2025-05-05

ArXiv (preprint)

arxiv.org
