Understanding Behavioral Metric Learning: A Large-Scale Study on Distracting Reinforcement Learning Environments
Understanding the Effectiveness of Learning Behavioral Metrics in Deep Reinforcement Learning
A key approach to state abstraction is approximating behavioral metrics (notably, bisimulation metrics) in the observation space, and embed … (see more)these learned distances in the representation space. While promising for robustness to task-irrelevant noise shown in prior work, accurately estimating these metrics remains challenging, requiring various design choices that create gaps between theory and practice. Prior evaluations focus mainly on final returns, leaving the quality of learned metrics and the source of performance gains unclear. To systematically assess how metric learning works in deep RL, we evaluate five recent approaches. We unify them under isometric embedding, identify key design choices, and benchmark them with baselines across 20 state-based and 14 pixel-based tasks, spanning 250+ configurations with diverse noise settings. Beyond final returns, we introduce the denoising factor to quantify the encoder’s ability to filter distractions. To further isolate the effect of metric learning, we propose an isolated metric estimation setting, where the encoder is influenced solely by the metric loss. Our results show that metric learning improves return and denoising only marginally, as its benefits fade when key design choices, such as layer normalization and self-prediction loss, are incorporated into the baseline. We also find that commonly used benchmarks (e.g., grayscale videos, varying state-based Gaussian noise dimensions) add little difficulty, while Gaussian noise with random projection and pixel-based Gaussian noise remain challenging even for the best methods. Finally, we release an open-source, modular codebase to improve reproducibility and support future research on metric learning in deep RL.
Understanding the Effectiveness of Learning Behavioral Metrics in Deep Reinforcement Learning
A key approach to state abstraction is approximating behavioral metrics (notably, bisimulation metrics) in the observation space, and embed … (see more)these learned distances in the representation space. While promising for robustness to task-irrelevant noise shown in prior work, accurately estimating these metrics remains challenging, requiring various design choices that create gaps between theory and practice. Prior evaluations focus mainly on final returns, leaving the quality of learned metrics and the source of performance gains unclear. To systematically assess how metric learning works in deep RL, we evaluate five recent approaches. We unify them under isometric embedding, identify key design choices, and benchmark them with baselines across 20 state-based and 14 pixel-based tasks, spanning 250+ configurations with diverse noise settings. Beyond final returns, we introduce the denoising factor to quantify the encoder’s ability to filter distractions. To further isolate the effect of metric learning, we propose an isolated metric estimation setting, where the encoder is influenced solely by the metric loss. Our results show that metric learning improves return and denoising only marginally, as its benefits fade when key design choices, such as layer normalization and self-prediction loss, are incorporated into the baseline. We also find that commonly used benchmarks (e.g., grayscale videos, varying state-based Gaussian noise dimensions) add little difficulty, while Gaussian noise with random projection and pixel-based Gaussian noise remain challenging even for the best methods. Finally, we release an open-source, modular codebase to improve reproducibility and support future research on metric learning in deep RL.
Neurospectrum: A Geometric and Topological Deep Learning Framework for Uncovering Spatiotemporal Signatures in Neural Activity
Dhananjay Bhaskar
Yanlei Zhang
Jessica Moore
Feng Gao
Bastian Rieck
Firas Khasawneh
Elizabeth Munch
J. Adam Noah
Helen Pushkarskaya
Christopher Pittenger
Valentina Greco
Neurospectrum: A Geometric and Topological Deep Learning Framework for Uncovering Spatiotemporal Signatures in Neural Activity
Dhananjay Bhaskar
Jessica Moore
Feng Gao
Bastian Rieck
Firas Khasawneh
Elizabeth Munch
Valentina Greco
Neural signals are high-dimensional, noisy, and dynamic, making it challenging to extract interpretable features linked to behavior or disea… (see more)se. We introduce Neurospectrum, a framework that encodes neural activity as latent trajectories shaped by spatial and temporal structure. At each timepoint, signals are represented on a graph capturing spatial relationships, with a learnable attention mechanism highlighting important regions. These are embedded using graph wavelets and passed through a manifold-regularized autoencoder that preserves temporal geometry. The resulting latent trajectory is summarized using a principled set of descriptors - including curvature, path signatures, persistent homology, and recurrent networks -that capture multiscale geometric, topological, and dynamical features. These features drive downstream prediction in a modular, interpretable, and end-to-end trainable framework. We evaluate Neurospectrum on simulated and experimental datasets. It tracks phase synchronization in Kuramoto simulations, reconstructs visual stimuli from calcium imaging, and identifies biomarkers of obsessive-compulsive disorder in fMRI. Across tasks, Neurospectrum uncovers meaningful neural dynamics and outperforms traditional analysis methods.
RL, but don't do anything I wouldn't do
Michael K. Cohen
Marcus Hutter
Stuart Russell
In reinforcement learning (RL), if the agent's reward differs from the designers' true utility, even only rarely, the state distribution res… (see more)ulting from the agent's policy can be very bad, in theory and in practice. When RL policies would devolve into undesired behavior, a common countermeasure is KL regularization to a trusted policy ("Don't do anything I wouldn't do"). All current cutting-edge language models are RL agents that are KL-regularized to a "base policy" that is purely predictive. Unfortunately, we demonstrate that when this base policy is a Bayesian predictive model of a trusted policy, the KL constraint is no longer reliable for controlling the behavior of an advanced RL agent. We demonstrate this theoretically using algorithmic information theory, and while systems today are too weak to exhibit this theorized failure precisely, we RL-finetune a language model and find evidence that our formal results are plausibly relevant in practice. We also propose a theoretical alternative that avoids this problem by replacing the "Don't do anything I wouldn't do" principle with "Don't do anything I mightn't do".
Can a Bayesian Oracle Prevent Harm from an Agent?
Is there a way to design powerful AI systems based on machine learning methods that would satisfy probabilistic safety guarantees? With the … (see more)long-term goal of obtaining a probabilistic guarantee that would apply in every context, we consider estimating a context-dependent bound on the probability of violating a given safety specification. Such a risk evaluation would need to be performed at run-time to provide a guardrail against dangerous actions of an AI. Noting that different plausible hypotheses about the world could produce very different outcomes, and because we do not know which one is right, we derive bounds on the safety violation probability predicted under the true but unknown hypothesis. Such bounds could be used to reject potentially dangerous actions. Our main results involve searching for cautious but plausible hypotheses, obtained by a maximization that involves Bayesian posteriors over hypotheses. We consider two forms of this result, in the iid case and in the non-iid case, and conclude with open problems towards turning such theoretical results into practical AI guardrails.
Putting the Value Back in RL: Better Test-Time Scaling by Unifying LLM Reasoners With Verifiers
Mock Deep Testing: Toward Separate Development of Data and Models for Deep Learning
Ruchira Manke
Mohammad Wardat
Hridesh Rajan
While deep learning (DL) has permeated, and become an integral component of many critical software systems, today software engineering resea… (see more)rch hasn't explored how to separately test data and models that are integral for DL approaches to work effectively. The main challenge in independently testing these components arises from the tight dependency between data and models. This research explores this gap, introducing our methodology of mock deep testing for unit testing of DL applications. To enable unit testing, we introduce a design paradigm that decomposes the workflow into distinct, manageable components, minimizes sequential dependencies, and modularizes key stages of the DL. For unit testing these components, we propose modeling their dependencies using mocks. This modular approach facilitates independent development and testing of the components, ensuring comprehensive quality assurance throughout the development process. We have developed KUnit, a framework for enabling mock deep testing for the Keras library. We empirically evaluated KUnit to determine the effectiveness of mocks. Our assessment of 50 DL programs obtained from Stack Overflow and GitHub shows that mocks effectively identified 10 issues in the data preparation stage and 53 issues in the model design stage. We also conducted a user study with 36 participants using KUnit to perceive the effectiveness of our approach. Participants using KUnit successfully resolved 25 issues in the data preparation stage and 38 issues in the model design stage. Our findings highlight that mock objects provide a lightweight emulation of the dependencies for unit testing, facilitating early bug detection. Lastly, to evaluate the usability of KUnit, we conducted a post-study survey. The results reveal that KUnit is helpful to DL application developers, enabling them to independently test each component effectively in different stages.
Kernel-Level Event-Based Performance Anomaly Detection in Software Systems under Varying Load Conditions
Anthonia Njoku
Heng Li
Position: Humanity Faces Existential Risk from Gradual Disempowerment
Jan Kulveit
Raymond Douglas
Nora Ammann
Deger Turan
David Duvenaud
The Search for Squawk: Agile Modeling in Bioacoustics
Vincent Dumoulin
Otilia Stretcu
Jenny Hamer
Lauren Harrell
Rob Laber
Bart van Merriënboer
Amanda Navine
Patrick Hart
Ben Williams
Timothy A. C. Lamont
Tries B. Rasak
Mars Coral Restoration Team
Sheryn Brodie
Brendan Doohan
Philip Eichinski
Paul Roe
Lin Schwarzkopf
Tom Denton