Evaluating machine learning-driven intrusion detection systems in IoT: Performance and energy consumption
Saeid Jamshidi
Kawser Wazed Nafi
Amin Nikanjam
STAMP: Differentiable Task and Motion Planning via Stein Variational Gradient Descent
Yewon Lee
Andrew Zou Li
Yizhou Huang
Philip Huang
Eric Heiden
Krishna Murthy
Fabian Damken
Kevin A. Smith
Fabio Ramos
Florian Shkurti
Carnegie Mellon University
M. I. O. Technology
Technische Universität Darmstadt
Nvidia
M. University
University of Sydney
Planning for many manipulation tasks, such as using tools or assembling parts, often requires both symbolic and geometric reasoning. Task and Motion Planning (TAMP) algorithms typically solve these problems by conducting a tree search over high-level task sequences while checking for kinematic and dynamic feasibility. While performant, most existing algorithms are highly inefficient as their time complexity grows exponentially with the number of possible actions and objects. Additionally, they only find a single solution to problems in which many feasible plans may exist. To address these limitations, we propose a novel algorithm called Stein Task and Motion Planning (STAMP) that leverages parallelization and differentiable simulation to efficiently search for multiple diverse plans. STAMP relaxes discrete-and-continuous TAMP problems into continuous optimization problems that can be solved using variational inference. Our algorithm builds upon Stein Variational Gradient Descent, a gradient-based variational inference algorithm, and parallelized differentiable physics simulators on the GPU to efficiently obtain gradients for inference. Further, we employ imitation learning to introduce action abstractions that reduce the inference problem to lower dimensions. We demonstrate our method on two TAMP problems and empirically show that STAMP is able to: 1) produce multiple diverse plans in parallel; and 2) search for plans more efficiently compared to existing TAMP baselines.
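As a rough illustration of the Stein Variational Gradient Descent update that the abstract names, the sketch below moves a set of particles toward a target density using an RBF kernel. It is not the STAMP implementation: the function names (`rbf_kernel`, `svgd_step`), the fixed bandwidth `h`, the step size, and the toy Gaussian target are all assumptions chosen for the example, and no differentiable simulator is involved.

```python
import numpy as np

def rbf_kernel(x, h):
    """Return k[j, i] = k(x_j, x_i) and grad_k[j, i] = d k(x_j, x_i) / d x_j."""
    diff = x[:, None, :] - x[None, :, :]          # diff[j, i] = x_j - x_i
    sq_dist = np.sum(diff ** 2, axis=-1)
    k = np.exp(-sq_dist / h)
    grad_k = (-2.0 / h) * diff * k[:, :, None]
    return k, grad_k

def svgd_step(x, grad_log_p, step=0.1, h=1.0):
    """One SVGD update for particles x of shape (n, d).

    grad_log_p maps an (n, d) array to the (n, d) array of score values.
    """
    k, grad_k = rbf_kernel(x, h)
    score = grad_log_p(x)
    # Attraction toward high density (kernel-weighted scores) plus
    # repulsion between particles (kernel gradients), averaged over j.
    phi = (k.T @ score + grad_k.sum(axis=0)) / x.shape[0]
    return x + step * phi

if __name__ == "__main__":
    # Toy target (assumption): isotropic unit Gaussian centred at mu.
    mu = np.array([2.0, 2.0])
    rng = np.random.default_rng(0)
    particles = rng.normal(size=(30, 2))
    for _ in range(300):
        particles = svgd_step(particles, lambda p: -(p - mu), step=0.5)
    print(particles.mean(axis=0))
```

The kernel-gradient term is what keeps the particles from collapsing onto a single mode, which is the property the abstract relies on to obtain multiple diverse plans.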
Ctrl-V: Higher Fidelity Autonomous Vehicle Video Generation with Bounding-Box Controlled Object Motion
Ge Ya Luo
Zhi Hao Luo
Anthony Gosselin
Alexia Jolicoeur-Martineau
Efficient Morphology-Aware Policy Transfer to New Embodiments
Michael Przystupa
Hongyao Tang
Mariano Phielipp
Santiago Miret
Martin Jägersand
Matthew E. Taylor
Morphology-aware policy learning is a means of enhancing policy sample efficiency by aggregating data from multiple agents. These types of policies have previously been shown to help generalize over dynamic, kinematic, and limb configuration variations between agent morphologies. Unfortunately, these policies still have sub-optimal zero-shot performance compared to end-to-end finetuning on morphologies at deployment. This limitation has ramifications in practical applications such as robotics because further data collection to perform end-to-end finetuning can be computationally expensive. In this work, we investigate combining morphology-aware pretraining with parameter-efficient finetuning (PEFT) techniques to help reduce the learnable parameters necessary to specialize a morphology-aware policy to a target embodiment. We compare directly tuning subsets of model weights, input learnable adapters, and prefix tuning techniques for online finetuning. Our analysis reveals that PEFT techniques in conjunction with policy pre-training generally help reduce the number of samples necessary to improve a policy compared to training models end-to-end from scratch. We further find that tuning fewer than 1% of total parameters improves policy performance compared to the zero-shot performance of the base pretrained policy.
Mitigating Goal Misgeneralization via Minimax Regret
Karim Ahmed Abdel Sadek
Matthew Farrugia-Roberts
Hannah Erlebach
Christian Schroeder de Witt
Usman Anwar
Michael D Dennis
Robustness research in reinforcement learning often focuses on ensuring that the policy consistently exhibits capable, goal-driven behavior. However, not every capable behavior is the intended behavior. *Goal misgeneralization* can occur when the policy generalizes capably with respect to a 'proxy goal' whose optimal behavior correlates with the intended goal on the training distribution, but not out of distribution. Though the intended goal would be ambiguous if they were perfectly correlated in training, we show progress can be made if the goals are only *nearly ambiguous*, with the training distribution containing a small proportion of *disambiguating* levels. We observe that the training signal from disambiguating levels could be amplified by regret-based prioritization. We formally show that approximately optimal policies on maximal-regret levels avoid the harmful effects of goal misgeneralization, which may exist without this prioritization. Empirically, we find that current regret-based Unsupervised Environment Design (UED) methods can mitigate the effects of goal misgeneralization, though they do not always eliminate them entirely. Our theoretical and empirical results show that as UED methods improve they could further mitigate goal misgeneralization in practice.
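To make the idea of regret-based prioritization concrete, here is a minimal sketch that turns per-level regret estimates into level-sampling probabilities using a rank-based rule. It is an illustration under assumptions, not the procedure used in the paper: the function name `regret_priorities`, the rank-based weighting, the `temperature` parameter, and the availability of a per-level best-achievable return are all hypothetical.

```python
import numpy as np

def regret_priorities(max_returns, policy_returns, temperature=1.0):
    """Map per-level regret to sampling probabilities (higher regret, higher probability).

    max_returns:    assumed best achievable return on each level
    policy_returns: return the current policy obtains on each level
    """
    regret = np.maximum(
        np.asarray(max_returns, dtype=float) - np.asarray(policy_returns, dtype=float),
        0.0,
    )
    # Rank 1 goes to the highest-regret level; ties keep their original order.
    order = np.argsort(-regret, kind="stable")
    ranks = np.empty(len(regret), dtype=float)
    ranks[order] = np.arange(1, len(regret) + 1)
    weights = (1.0 / ranks) ** (1.0 / temperature)
    return weights / weights.sum()

if __name__ == "__main__":
    # Level 2 plays the role of a rare disambiguating level where the
    # proxy-goal policy does poorly, so its regret is large.
    probs = regret_priorities([1.0, 1.0, 1.0, 1.0], [1.0, 0.9, 0.1, 1.0])
    print(probs)
```

Under this kind of rule, a disambiguating level that makes up a small share of the training distribution is sampled far more often than its base rate once the policy starts to follow the proxy goal.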
Multi-Task Reinforcement Learning Enables Parameter Scaling
Reginald McLean
Evangelos Chatzaroulas
J K Terry
Isaac Woungang
Nariman Farsad
Multi-task reinforcement learning (MTRL) aims to endow a single agent with the ability to perform well on multiple tasks. Recent works have focused on developing novel sophisticated architectures to improve performance, often resulting in larger models; it is unclear, however, whether the performance gains are a consequence of the architecture design or the extra parameters. We argue that gains are mostly due to scale by demonstrating that naively scaling up a simple MTRL baseline to match parameter counts outperforms the more sophisticated architectures, and these gains benefit most from scaling the critic over the actor. Additionally, we explore the training stability advantages that come with task diversity, demonstrating that increasing the number of tasks can help mitigate plasticity loss. Our findings suggest that MTRL's simultaneous training across multiple tasks provides a natural framework for beneficial parameter scaling in reinforcement learning, challenging the need for complex architectural innovations.
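Matching parameter counts between a simple baseline and a larger architecture comes down to simple arithmetic on layer sizes. The sketch below counts the parameters of a fully connected network and finds the widest hidden layer that fits a given budget. The helper names and the plain-MLP assumption are illustrative only; the abstract does not specify the baseline's architecture.

```python
def mlp_param_count(layer_sizes):
    """Weights plus biases of a fully connected network with the given layer sizes."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

def widest_mlp_within_budget(in_dim, out_dim, hidden_layers, budget):
    """Largest uniform hidden width whose parameter count does not exceed budget."""
    if hidden_layers < 1:
        raise ValueError("hidden_layers must be at least 1")
    width = 0
    while mlp_param_count([in_dim] + [width + 1] * hidden_layers + [out_dim]) <= budget:
        width += 1
    return width

if __name__ == "__main__":
    # Hypothetical sizes: widen only the critic until it reaches a target budget.
    critic_width = widest_mlp_within_budget(in_dim=4, out_dim=2, hidden_layers=1, budget=58)
    print(critic_width, mlp_param_count([4, critic_width, 2]))
```

A comparison of this kind lets the actor and the critic be scaled independently, which is the axis along which the abstract reports the largest gains.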
Optimal discounting for offline input-driven MDP
Randy Lefebvre
Offline reinforcement learning has gained a lot of popularity for its potential to solve industry challenges. However, real-world environments are often highly stochastic and partially observable, leading long-term planners to overfit to offline data in model-based settings. Input-driven Markov Decision Processes (IDMDPs) offer a way to work with some of the uncertainty by letting designers separate what the agent has control over (states) from what it cannot (inputs) in the environment. These stochastic external inputs are often difficult to model. Under the assumption that the input model will be imperfect, we investigate the bias-variance tradeoff under shallow planning in IDMDPs. Paving the way to input-driven planning horizons, we also investigate the similarity of optimal planning horizons at different inputs given the structure of the input space.
Optimistic critics can empower small actors
Olya Mastikhina
Dhruv Sreenivas
Actor-critic methods have been central to many of the recent advances in deep reinforcement learning. The most common approach is to use _symmetric_ architectures, whereby both actor and critic have the same network topology and number of parameters. However, recent works have argued for the advantages of _asymmetric_ setups, specifically with the use of smaller actors. We perform broad empirical investigations and analyses to better understand the implications of this and find that, in general, smaller actors result in performance degradation and overfit critics. Our analyses suggest _poor data collection_, due to value underestimation, as one of the main causes for this behavior, and further highlight the crucial role the critic can play in alleviating this pathology. We explore techniques to mitigate the observed value underestimation, which enables further research in asymmetric actor-critic methods.
Understanding the Effectiveness of Learning Behavioral Metrics in Deep Reinforcement Learning
Ziyan Luo
Tianwei Ni
A key approach to state abstraction is approximating behavioral metrics (notably, bisimulation metrics) in the observation space, and embedding these learned distances in the representation space. While promising for robustness to task-irrelevant noise shown in prior work, accurately estimating these metrics remains challenging, requiring various design choices that create gaps between theory and practice. Prior evaluations focus mainly on final returns, leaving the quality of learned metrics and the source of performance gains unclear. To systematically assess how metric learning works in deep RL, we evaluate five recent approaches. We unify them under isometric embedding, identify key design choices, and benchmark them with baselines across 20 state-based and 14 pixel-based tasks, spanning 250+ configurations with diverse noise settings. Beyond final returns, we introduce the denoising factor to quantify the encoder’s ability to filter distractions. To further isolate the effect of metric learning, we propose an isolated metric estimation setting, where the encoder is influenced solely by the metric loss. Our results show that metric learning improves return and denoising only marginally, as its benefits fade when key design choices, such as layer normalization and self-prediction loss, are incorporated into the baseline. We also find that commonly used benchmarks (e.g., grayscale videos, varying state-based Gaussian noise dimensions) add little difficulty, while Gaussian noise with random projection and pixel-based Gaussian noise remain challenging even for the best methods. Finally, we release an open-source, modular codebase to improve reproducibility and support future research on metric learning in deep RL.
Neurospectrum: A Geometric and Topological Deep Learning Framework for Uncovering Spatiotemporal Signatures in Neural Activity
Dhananjay Bhaskar
Jessica Moore
Feng Gao
Bastian Rieck
Firas Khasawneh
Elizabeth Munch
Valentina Greco
Neural signals are high-dimensional, noisy, and dynamic, making it challenging to extract interpretable features linked to behavior or disease. We introduce Neurospectrum, a framework that encodes neural activity as latent trajectories shaped by spatial and temporal structure. At each timepoint, signals are represented on a graph capturing spatial relationships, with a learnable attention mechanism highlighting important regions. These are embedded using graph wavelets and passed through a manifold-regularized autoencoder that preserves temporal geometry. The resulting latent trajectory is summarized using a principled set of descriptors (including curvature, path signatures, persistent homology, and recurrent networks) that capture multiscale geometric, topological, and dynamical features. These features drive downstream prediction in a modular, interpretable, and end-to-end trainable framework. We evaluate Neurospectrum on simulated and experimental datasets. It tracks phase synchronization in Kuramoto simulations, reconstructs visual stimuli from calcium imaging, and identifies biomarkers of obsessive-compulsive disorder in fMRI. Across tasks, Neurospectrum uncovers meaningful neural dynamics and outperforms traditional analysis methods.
Putting the Value Back in RL: Better Test-Time Scaling by Unifying LLM Reasoners With Verifiers
Kusha Sareen
Morgane M Moss
Arian Hosseini
Kernel-Level Event-Based Performance Anomaly Detection in Software Systems under Varying Load Conditions
Anthonia Njoku
Heng Li