Publications

MAD-TD: Model-Augmented Data stabilizes High Update Ratio RL
Claas Voelcker
Marcel Hussing
Eric R. Eaton
Igor Gilitschenski
Building deep reinforcement learning (RL) agents that find a good policy with few samples has proven notoriously challenging. To achieve sample efficiency, recent work has explored updating neural networks with large numbers of gradient steps for every new sample. While such high update-to-data (UTD) ratios have shown strong empirical performance, they also introduce instability to the training process. Previous approaches need to rely on periodic neural network parameter resets to address this instability, but restarting the training process is infeasible in many real-world applications and requires tuning the resetting interval. In this paper, we focus on one of the core difficulties of stable training with limited samples: the inability of learned value functions to generalize to unobserved on-policy actions. We mitigate this issue directly by augmenting the off-policy RL training process with a small amount of data generated from a learned world model. Our method, Model-Augmented Data for Temporal Difference learning (MAD-TD), uses small amounts of generated data to stabilize high UTD training and achieve competitive performance on the most challenging tasks in the DeepMind control suite. Our experiments further highlight the importance of employing a good model to generate data, MAD-TD's ability to combat value overestimation, and its practical stability gains for continued learning.
Malice in Agentland: Down the Rabbit Hole of Backdoors in the AI Supply Chain
Chandra Kiran Reddy Evuru
Alexandre Lacoste
Krishnamurthy (DJ) Dvijotham
The practice of fine-tuning AI agents on data from their own interactions (such as web browsing or tool use), while being a strong general recipe for improving agentic capabilities, also introduces a critical security vulnerability within the AI supply chain. In this work, we show that adversaries can easily poison the data collection pipeline to embed hard-to-detect backdoors that are triggered by specific target phrases, such that when the agent encounters these triggers, it performs an unsafe or malicious action. We formalize and validate three realistic threat models targeting different layers of the supply chain: 1) direct poisoning of fine-tuning data, where an attacker controls a fraction of the training traces; 2) environmental poisoning, where malicious instructions are injected into webpages scraped or tools called while creating training data; and 3) supply chain poisoning, where a pre-backdoored base model is fine-tuned on clean data to improve its agentic capabilities. Our results are stark: by poisoning as few as 2% of the collected traces, an attacker can embed a backdoor causing an agent to leak confidential user information with over 80% success when a specific trigger is present. This vulnerability holds across all three threat models. Furthermore, we demonstrate that prominent safeguards, including two guardrail models and one weight-based defense, fail to detect or prevent the malicious behavior. These findings highlight an urgent threat to agentic AI development and underscore the critical need for rigorous security vetting of data collection processes and end-to-end model supply chains.
Maximizing Data and Hardware Reuse for HLS with Early-Stage Symbolic Partitioning
While traditional High-Level Synthesis (HLS) converts “high-level” C-like programs into hardware automatically, producing high-performance designs still requires hardware expertise. Optimizations such as data partitioning can have a large impact on performance since they directly affect data reuse patterns and the ability to reuse hardware. However, optimizing partitioning is a difficult process since minor changes in the parameter choices can lead to totally unpredictable performance. Functional array-based languages have been proposed instead of C-based approaches, as they offer stronger performance guarantees. This article proposes to follow a similar approach and exposes a divide-and-conquer primitive at the algorithmic level to let users partition any arbitrary computation. The compiler is then free to explore different partition shapes to maximize both data and hardware reuse automatically. The main challenge remains that the impact of partitioning is only known much later in the compilation flow. This is due to the hard-to-predict effects of the many optimizations applied during compilation. To solve this problem, the partitioning is expressed using a set of symbolic tunable parameters, introduced early in the compilation pipeline. A symbolic performance model is then used in the last compilation stage to predict performance based on the possible values of the tunable parameters. Using this approach, a design space exploration is conducted on an Intel Arria 10 field-programmable gate array (FPGA), and competitive performance is achieved on the classical VGG and TinyYolo neural networks.
Meta-learning how to Share Credit among Macro-Actions
Ionel-Alexandru Hosu
Traian Rebedea
One proposed mechanism to improve exploration in reinforcement learning is through the use of macro-actions. Paradoxically though, in many scenarios the naive addition of macro-actions does not lead to better exploration, but rather the opposite. It has been argued that this was caused by adding non-useful macros, and multiple works have focused on mechanisms to effectively discover useful environment-specific macros. In this work, we take a slightly different perspective. We argue that the difficulty stems from the trade-offs between reducing the average number of decisions per episode versus increasing the size of the action space. Namely, one typically treats each potential macro-action as independent and atomic, hence strictly increasing the search space and making typical exploration strategies inefficient. To address this problem we propose a novel regularization term that exploits the relationship between actions and macro-actions to improve the credit assignment mechanism by reducing the effective dimension of the action space and, therefore, improving exploration. The term relies on a similarity matrix that is meta-learned jointly with learning the desired policy. We empirically validate our strategy looking at macro-actions in Atari games and the StreetFighter II environment. Our results show significant improvements over the Rainbow-DQN baseline in all environments. Additionally, we show that the macro-action similarity is transferable to related environments. We believe this work is a small but important step towards understanding how the similarity-imposed geometry on the action space can be exploited to improve credit assignment and exploration, therefore making learning more effective.
Min-Max Optimisation for Nonconvex-Nonconcave Functions Using a Random Zeroth-Order Extragradient Algorithm
Amir Ali Farzin
Yuen-Man Pun
Philipp Braun
Youssef Diouane
Iman Shames
Mitigating Plasticity Loss in Continual Reinforcement Learning by Reducing Churn
Hongyao Tang
Johan Obando-Ceron
Plasticity, or the ability of an agent to adapt to new tasks, environments, or distributions, is crucial for continual learning. In this paper, we study the loss of plasticity in deep continual RL through the lens of churn: network output variability for out-of-batch data induced by mini-batch training. We demonstrate that (1) the loss of plasticity is accompanied by the exacerbation of churn due to the gradual rank decrease of the Neural Tangent Kernel (NTK) matrix; (2) reducing churn helps prevent rank collapse and adjusts the step size of regular RL gradients adaptively. Moreover, we introduce Continual Churn Approximated Reduction (C-CHAIN) and demonstrate it improves learning performance and outperforms baselines in a diverse range of continual learning environments on OpenAI Gym Control, ProcGen, DeepMind Control Suite, and MinAtar benchmarks.
Mixed-Integer Second-Order Cone Programming for Multi-period Scheduling of Flexible AC Transmission System Devices
Mohamad Charara
Martin De Montigny
Nivine Abou Daher
With the increasing energy demand and the growing integration of renewable sources of energy, power systems face operational challenges such as overloads, losses, and stability concerns, particularly as networks operate near their capacity limits. Flexible alternating current transmission system (FACTS) devices are essential to ensure reliable grid operations and enable the efficient integration of renewable energy. This work introduces a mixed-integer second-order cone programming (MISOCP) model for the multi-period scheduling of key FACTS devices in electric transmission systems. The proposed model integrates four key control mechanisms: (i) on-load tap changers (OLTCs) for voltage regulation via discrete taps; (ii) static synchronous compensators (STATCOMs) and (iii) shunt reactors for reactive power compensation; and (iv) thyristor-controlled series capacitors (TCSCs) for adjustable impedance and flow control. The objective is to minimize active power losses using a limited number of control actions while meeting physical and operational constraints at all times throughout the defined time horizon. To ensure tractability, the model employs a second-order cone relaxation of the power flow. Device-specific constraints are handled via binary expansion and linearization: OLTCs and shunt reactors are modelled with discrete variables, STATCOMs through reactive power bounds, and TCSCs using a reformulation-linearization technique (RLT). A multi-period formulation captures the sequential nature of decision making, ensuring consistency across time steps. The model is evaluated on the IEEE 9-bus, 30-bus, and RTS96 test systems, demonstrating its ability to reduce losses, with potential applicability to larger-scale grids.
MLOps, LLMOps, FMOps, and Beyond
Chakkrit Kla Tantithamthavorn
Fabio Palomba
Joselito Joey Chua
Morphometric characteristics of tibial nerve and their relationship with age
Shahram Oveisgharan
Jingyun Yang
Sue E. Leurgans
Veronique VanderHorst
David A. Bennett
Osvaldo Delbono
Aron S. Buchman
Peripheral nerve comprises a crucial component of the distributed motor/sensory system. However, there is a paucity of data on peripheral nerve morphology derived from large numbers of older adults. This study aimed to quantify the morphometric characteristics of myelinated nerve fibres of the tibial nerve obtained from deceased community-dwelling older adults and examine their association with age. The tibial nerves were obtained from consecutive autopsies of older adults without a history of diabetes who were participants of the Rush Memory and Aging Project, an ongoing longitudinal clinical-autopsy study. A nerve fascicle, obtained from a fixed popliteal segment of the tibial nerve, was separated from the blood vessels and adipose tissue for postmortem examination under an optical microscope. Morphometric characteristics of the myelinated nerve fibres were automatically segmented and quantified using our open-source software AxonDeepSeg. The participants (N = 140) had a mean age of 92.0 years (SD = 5.4) at death, and 72.1% (N = 101) were women. We examined 754 247 myelinated nerve fibres, with an average of 5387 (SD = 3436) nerve fibres per participant. The average diameter of myelinated nerve fibres was 4.9 µm (SD = 3.1), axon diameter was 2.0 µm (SD = 1.4), myelin thickness was 1.4 µm (SD = 0.96) and the g-ratio (ratio of axon diameter to myelinated nerve fibre diameter) was 0.45 (SD = 0.17). The relationship between axon diameter and myelin thickness was nonlinear. Myelin was thicker in larger axons up to a diameter of 8 µm, beyond which myelin thickness plateaued. Older age at death was associated with smaller myelinated nerve fibres, smaller axons and thinner myelin. However, age at death was not correlated with myelinated nerve fibre density and was not associated with the average g-ratio. The association between older age and smaller myelinated nerve fibres was largely attributable to a lower percentage of myelinated nerve fibres >8 µm.
We conclude that the smaller tibial myelinated nerve fibres observed in older adults may reflect axonal atrophy rather than degeneration and regeneration of the myelinated nerve fibres. Further research is needed to investigate the pathologies and molecular mechanisms underlying these age-related morphometric changes and their clinical implications in older adults.
Most German Speakers Ignore the Cue That Best Predicts Plural Class
Kate McCurdy
Timothy J. O'Donnell
Adam Lopez
Sharon Goldwater
Researchers generally assume that speakers use the linguistic information available to them. For instance, if one grammatical category robustly predicts another grammatical category, we expect speakers to reproduce this conditional relationship during language production. Here, we investigate this assumption for grammatical gender in German. Gender is the single cue which most strongly predicts the plural class of existing German nouns, but behavioral studies with novel nouns have found mixed results regarding the role of gender in plural generalization. Across three experiments, we examine how individual German speakers use grammatical gender when producing plural forms of novel nouns. We find that most speakers effectively ignore gender during plural class production, even under experimental manipulations that encourage them to attend to this cue. These results point toward an underexplored direction in cognitive science: accounting for the linguistic information that speakers do not use.
Multilingual Language Model Pretraining using Machine-translated Data
Jiayi Wang
Maurice Weber
Max Ryabinin
Yihong Chen
Raphael Tang
Pontus Stenetorp
Multimodal and Force-Matched Imitation Learning with a See-Through Visuotactile Sensor
Trevor Ablett
Oliver Limoyo
Adam Sigal
Jonathan Kelly
Francois Hogan
Kinesthetic Teaching is a popular approach to collecting expert robotic demonstrations of contact-rich tasks for imitation learning (IL), but it typically only measures motion, ignoring the force placed on the environment by the robot. Furthermore, contact-rich tasks require accurate sensing of both reaching and touching, which can be difficult to provide with conventional sensing modalities. We address these challenges with a See-Through-your-Skin (STS) visuotactile sensor, using the sensor both (i) as a measurement tool to improve kinesthetic teaching, and (ii) as a policy input in contact-rich door manipulation tasks. An STS sensor can be switched between visual and tactile modes by leveraging a semi-transparent surface and controllable lighting, allowing for both pre-contact visual sensing and during-contact tactile sensing with a single sensor. First, we propose tactile force matching, a methodology that enables a robot to match forces read during kinesthetic teaching using tactile signals. Second, we develop a policy that controls STS mode switching, allowing a policy to learn the appropriate moment to switch an STS from its visual to its tactile mode. Finally, we study multiple observation configurations to compare and contrast the value of visual and tactile data from an STS with visual data from a wrist-mounted eye-in-hand camera. With over 3,000 test episodes from real-world manipulation experiments, we find that the inclusion of force matching raises average policy success rates by 62.5%, STS mode switching by 30.3%, and STS data as a policy input by 42.5%. Our results highlight the utility of see-through tactile sensing for IL, both for data collection to allow force matching, and for policy execution to allow accurate task feedback.