
Maxime Wabartha

PhD - McGill University
Supervisor
Research Topics
Deep Learning
Reinforcement Learning

Publications

Safe Domain Randomization via Uncertainty-Aware Out-of-Distribution Detection and Policy Adaptation
Deploying reinforcement learning (RL) policies in the real world involves significant challenges, including distribution shifts, safety concerns, and the impracticality of direct interactions during policy refinement. Existing methods, such as domain randomization (DR) and off-dynamics RL, enhance policy robustness through direct interaction with the target domain, an inherently unsafe practice. We propose Uncertainty-Aware RL (UARL), a novel framework that prioritizes safety during training by addressing Out-Of-Distribution (OOD) detection and policy adaptation without requiring direct interactions in the target domain. UARL employs an ensemble of critics to quantify policy uncertainty and incorporates progressive environmental randomization to prepare the policy for diverse real-world conditions. By iteratively refining the policy over high-uncertainty regions of the state space in simulated environments, UARL enhances robust generalization to the target domain without explicitly training on it. We evaluate UARL on MuJoCo benchmarks and a quadrupedal robot, demonstrating its effectiveness in reliable OOD detection, improved performance, and enhanced sample efficiency compared to baselines.
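The minimal PyTorch sketch below illustrates the ensemble-of-critics idea described in this abstract: disagreement between independently initialized Q-networks serves as an uncertainty score for state-action pairs, which could then flag OOD regions. All names, dimensions, and the variance-based score are illustrative assumptions, not details taken from the paper.

```python
# Sketch: quantifying policy uncertainty with an ensemble of critics.
# Hypothetical names and hyperparameters; not the UARL implementation.
import torch
import torch.nn as nn


class Critic(nn.Module):
    """Small Q-network mapping (state, action) pairs to scalar values."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1))


def ensemble_uncertainty(critics: list[Critic],
                         state: torch.Tensor,
                         action: torch.Tensor) -> torch.Tensor:
    """Disagreement (standard deviation) of Q-estimates across the ensemble.

    High disagreement marks state-action pairs the critics have not agreed on,
    a common proxy for out-of-distribution inputs.
    """
    with torch.no_grad():
        q_values = torch.stack([c(state, action) for c in critics], dim=0)
    return q_values.std(dim=0).squeeze(-1)


# Usage: five independently initialized critics; states whose score exceeds a
# chosen threshold would be treated as out-of-distribution.
critics = [Critic(state_dim=17, action_dim=6) for _ in range(5)]
state = torch.randn(32, 17)
action = torch.randn(32, 6)
print(ensemble_uncertainty(critics, state, action).shape)  # torch.Size([32])
```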
Piecewise Linear Parametrization of Policies: Towards Interpretable Deep Reinforcement Learning
Learning inherently interpretable policies is a central challenge on the path to developing autonomous agents that humans can trust. Linear policies can justify their decisions while interacting in a dynamic environment, but their reduced expressivity prevents them from solving hard tasks. Instead, we argue for the use of piecewise-linear policies. We carefully study to what extent they can retain the interpretable properties of linear policies while reaching competitive performance with neural baselines. In particular, we propose the HyperCombinator (HC), a piecewise-linear neural architecture expressing a policy with a controllably small number of sub-policies. Each sub-policy is linear with respect to interpretable features, shedding light on the decision process of the agent without requiring an additional explanation model. We evaluate HC policies in control and navigation experiments, visualize the improved interpretability of the agent and highlight its trade-off with performance. Moreover, we validate that the restricted model class that the HyperCombinator belongs to is compatible with the algorithmic constraints of various reinforcement learning algorithms.
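The sketch below shows, in simplified form, what a piecewise-linear policy built from a small number of linear sub-policies can look like: a selector picks one sub-policy per input, and the chosen sub-policy is a single linear map over interpretable features, so its weights directly explain the action. This is an illustrative approximation of the idea, not the HyperCombinator architecture itself.

```python
# Sketch: a piecewise-linear policy with K linear sub-policies and a selector.
# Simplified illustration; not the paper's exact architecture.
import torch
import torch.nn as nn


class PiecewiseLinearPolicy(nn.Module):
    def __init__(self, feature_dim: int, action_dim: int, num_subpolicies: int = 4):
        super().__init__()
        # K linear sub-policies: each is a single linear map, so its weights
        # can be read off directly to justify the action taken.
        self.subpolicies = nn.ModuleList(
            [nn.Linear(feature_dim, action_dim) for _ in range(num_subpolicies)]
        )
        # Selector scoring which sub-policy applies to the current features.
        self.selector = nn.Linear(feature_dim, num_subpolicies)

    def forward(self, features: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        scores = self.selector(features)
        # Hard selection yields piecewise-linear behaviour; training through it
        # would need a relaxation (e.g. a straight-through or Gumbel trick).
        idx = scores.argmax(dim=-1)
        actions = torch.stack([p(features) for p in self.subpolicies], dim=1)
        action = actions[torch.arange(features.shape[0]), idx]
        return action, idx  # idx identifies the linear rule that was applied


policy = PiecewiseLinearPolicy(feature_dim=8, action_dim=2, num_subpolicies=4)
obs = torch.randn(16, 8)
action, which_rule = policy(obs)
print(action.shape, which_rule.shape)  # torch.Size([16, 2]) torch.Size([16])
```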
Piecewise Linear Parametrization of Policies: Towards Interpretable Deep Reinforcement Learning
Learning inherently interpretable policies is a central challenge on the path to developing autonomous agents that humans can trust. We argue for the use of policies that are piecewise-linear. We carefully study to what extent they can retain the interpretable properties of linear policies while performing competitively with neural baselines. In particular, we propose the HyperCombinator (HC), a piecewise-linear neural architecture expressing a policy with a controllably small number of sub-policies. Each sub-policy is linear with respect to interpretable features, shedding light on the agent’s decision process without needing an additional explanation model. We evaluate HC policies in control and navigation experiments, visualize the improved interpretability of the agent and highlight its trade-off with performance.
Handling Black Swan Events in Deep Learning with Diversely Extrapolated Neural Networks
By virtue of their expressive power, neural networks (NNs) are well suited to fitting large, complex datasets, yet they are also known to produce similar predictions for points outside the training distribution. As such, they are, like humans, under the influence of the Black Swan theory: models tend to be extremely "surprised" by rare events, leading to potentially disastrous consequences, while justifying these same events in hindsight. To avoid this pitfall, we introduce DENN, an ensemble approach building a set of Diversely Extrapolated Neural Networks that fits the training data and is able to generalize more diversely when extrapolating to novel data points. This leads DENN to output highly uncertain predictions for unexpected inputs. We achieve this by adding a diversity term to the loss function used to train the model, computed at specific inputs. We first illustrate the usefulness of the method on a low-dimensional regression problem. Then, we show how the loss can be adapted to tackle anomaly detection during classification, as well as safe imitation learning problems.
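The following sketch gives a rough sense of how an ensemble can be trained with an added diversity term: each member fits the training data, while a penalty computed at a separate set of inputs encourages their predictions to spread apart there. The specific loss, the choice of diversity inputs, and the weighting are illustrative assumptions rather than the paper's formulation.

```python
# Sketch: ensemble training with a diversity term computed at chosen inputs.
# Toy illustration under assumed names; not the DENN loss as published.
import torch
import torch.nn as nn


def diversity_regularized_loss(models: list[nn.Module],
                               x_train: torch.Tensor, y_train: torch.Tensor,
                               x_div: torch.Tensor,
                               lambda_div: float = 0.1) -> torch.Tensor:
    # 1) Every member should fit the training data.
    fit = sum(nn.functional.mse_loss(m(x_train), y_train) for m in models)

    # 2) At the diversity inputs, predictions should disagree: rewarding a
    #    large variance across members (note the negative sign) makes the
    #    ensemble extrapolate diversely, hence appear uncertain there.
    preds = torch.stack([m(x_div) for m in models], dim=0)
    diversity = -preds.var(dim=0).mean()

    return fit + lambda_div * diversity


# Usage on a toy 1-D regression problem.
models = [nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 1))
          for _ in range(5)]
x_train = torch.linspace(-1, 1, 64).unsqueeze(-1)
y_train = torch.sin(3 * x_train)
x_div = torch.linspace(2, 4, 32).unsqueeze(-1)  # points outside the training range
loss = diversity_regularized_loss(models, x_train, y_train, x_div)
loss.backward()
```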