David Meger

Valliappan Chidambaram Adaikkappan

PhD - McGill

Google Scholar

Wesley Chung

PhD - McGill

Co-supervisor:

Doina Precup

Farnoosh Faraji

PhD - McGill

Co-supervisor:

Research Master's - McGill

Co-supervisor:

Hsiu-Chin Lin

Zina Kamel

Research Master's - McGill

Co-supervisor:

Hsiu-Chin Lin

Arian Sargazi

PhD - McGill

Junming(Clark) Shi

Research Master's - McGill

Steven Wang

Research Master's - McGill

Harley Wiltzer

PhD - McGill

Co-supervisor:

Marc Gendron-Bellemare

Website

Google Scholar

Publications

Distributional Hamilton-Jacobi-Bellman Equations for Continuous-Time Reinforcement Learning

Harley Wiltzer

Bellemare Marc-Emmanuel

Continuous-time reinforcement learning offers an appealing formalism for describing control problems in which the passage of time is not naturally divided into discrete increments. Here we consider the problem of predicting the distribution of returns obtained by an agent interacting in a continuous-time, stochastic environment. Accurate return predictions have proven useful for determining optimal policies for risk-sensitive control, learning state representations, multiagent coordination, and more. We begin by establishing the distributional analogue of the Hamilton-Jacobi-Bellman (HJB) equation for Itô diffusions and the broader class of Feller-Dynkin processes. We then specialize this equation to the setting in which the return distribution is approximated by …

2022-06-27

Proceedings of the 39th International Conference on Machine Learning (published)

proceedings.mlr.press
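For orientation, the classical expected-value counterpart of the equation studied in this paper can be written out for an Itô diffusion under a fixed policy. This is the standard Feynman-Kac form of the policy-evaluation HJB equation, not the distributional equation derived in the paper. The symbols b (drift), σ (diffusion), r (reward rate) and β (continuous discount rate) are our own notation, and the usual smoothness and integrability conditions are assumed.

```latex
% Assumed state dynamics under a fixed policy (Itô diffusion):
%   dX_t = b(X_t)\,dt + \sigma(X_t)\,dW_t
% Expected discounted return with discount rate \beta > 0:
V(x) = \mathbb{E}\left[\int_0^\infty e^{-\beta t}\, r(X_t)\,dt \;\middle|\; X_0 = x\right]
% Policy-evaluation HJB (Feynman--Kac) equation satisfied by V:
\beta V(x) = r(x) + b(x)^\top \nabla V(x) + \tfrac{1}{2}\operatorname{tr}\!\left(\sigma(x)\sigma(x)^\top \nabla^2 V(x)\right)
```

The distributional version replaces the scalar V with the law of the random return; per the abstract, the paper establishes that analogue for Itô diffusions and, more broadly, Feller-Dynkin processes.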

Why Should I Trust You, Bellman? The Bellman Error is a Poor Replacement for Value Error

Scott Fujimoto

Doina Precup

Ofir Nachum

Shixiang Shane Gu

In this work, we study the use of the Bellman equation as a surrogate objective for value prediction accuracy. While the Bellman equation is uniquely solved by the true value function over all state-action pairs, we find that the Bellman error (the difference between both sides of the equation) is a poor proxy for the accuracy of the value function. In particular, we show that (1) due to cancellations from both sides of the Bellman equation, the magnitude of the Bellman error is only weakly related to the distance to the true value function, even when considering all state-action pairs, and (2) in the finite data regime, the Bellman equation can be satisfied exactly by infinitely many suboptimal solutions. This means that the Bellman error can be minimized without improving the accuracy of the value function. We demonstrate these phenomena through a series of propositions, illustrative toy examples, and empirical analysis in standard benchmark domains.

2022-06-27

Proceedings of the 39th International Conference on Machine Learning (published)

proceedings.mlr.press
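Both effects named in the abstract above can be reproduced on a tiny tabular example. The sketch below is our own illustration, not code or an experiment from the paper. It assumes a hypothetical 3-state deterministic chain and uses mean absolute residuals. It shows one simple instance of the cancellation effect, where a constant offset c in the value function leaves a Bellman residual of only (1 - γ)c. It also shows the finite-data effect, where a badly wrong value function reaches exactly zero Bellman error when only one transition is observed.

```python
# Hypothetical toy MDP (our own, not from the paper): a deterministic 3-state
# chain under a fixed policy, with reward 1 on every step.
GAMMA = 0.99
NEXT_STATE = {0: 1, 1: 2, 2: 2}  # state 2 is absorbing
REWARD = {0: 1.0, 1: 1.0, 2: 1.0}


def true_values():
    # Reward 1 forever from every state, so V*(s) = 1 / (1 - gamma).
    v = 1.0 / (1.0 - GAMMA)
    return {s: v for s in NEXT_STATE}


def bellman_error(values, transitions):
    # Mean absolute Bellman residual |V(s) - (r + gamma * V(s'))| over the given transitions.
    residuals = [abs(values[s] - (r + GAMMA * values[s_next])) for s, r, s_next in transitions]
    return sum(residuals) / len(residuals)


def value_error(values):
    # Mean absolute distance to the true value function over all states.
    v_star = true_values()
    return sum(abs(values[s] - v_star[s]) for s in v_star) / len(v_star)


# (1) Cancellation: adding a constant c to V leaves a residual of only (1 - gamma) * c.
full_data = [(s, REWARD[s], NEXT_STATE[s]) for s in NEXT_STATE]
offset_values = {s: v + 10.0 for s, v in true_values().items()}

# (2) Finite data: only the transition 0 -> 1 is observed, so any V with
# V(0) = 1 + gamma * V(1) has zero Bellman error on this dataset.
sparse_data = [(0, 1.0, 1)]
wrong_values = {0: 1.0, 1: 0.0, 2: 0.0}

print("offset:", bellman_error(offset_values, full_data), value_error(offset_values))
print("sparse:", bellman_error(wrong_values, sparse_data), value_error(wrong_values))
```

In the first case the value error is 10, but the Bellman error is about 0.1. In the second case the Bellman error is zero, while the value function is off by roughly 100 per state.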

Adaptive Confidence Calibration

Jonathan W. Pearce

2022-05-26

Applied Informatics (published)

IL-flOw: Imitation Learning from Observation using Normalizing Flows

Wei-Di Chang

Juan Higuera

Scott Fujimoto

2022-05-18

ArXiv (preprint)

Continuous MDP Homomorphisms and Homomorphic Policy Gradient

Sahand Rezaei-Shoshtari

Abstraction has been widely studied as a way to improve the efficiency and generalization of reinforcement learning algorithms. In this paper, we study abstraction in the continuous-control setting. We extend the definition of MDP homomorphisms to encompass continuous actions in continuous state spaces. We derive a policy gradient theorem on the abstract MDP, which allows us to leverage approximate symmetries of the environment for policy optimization. Based on this theorem, we propose an actor-critic algorithm that is able to learn the policy and the MDP homomorphism map simultaneously, using the lax bisimulation metric. We demonstrate the effectiveness of our method on benchmark tasks in the DeepMind Control Suite. Our method's ability to utilize MDP homomorphisms for representation learning leads to improved performance when learning from pixel observations.

2021-12-31

Advances in Neural Information Processing Systems 35 (NeurIPS 2022) (published)

openreview.net

Learning Assisted Identification of Scenarios Where Network Optimization Algorithms Under-Perform

Dmitriy Rivkin

Di Wu

X. T. Chen

Xue Liu

We present the Deep Convolutional Failure Generator (DCFG), a generative adversarial method that uses deep learning to identify network load traffic conditions in which network optimization algorithms under-perform other known algorithms. The spatial distribution of network load presents challenges for network operators in tasks such as load balancing, in which a network optimizer attempts to maintain high-quality communication while abiding by capacity constraints. Testing a network optimizer for all possible load distributions is challenging, if not impossible. We propose a novel method that searches for load situations where a target network optimization method underperforms a baseline; these are key test cases that can be used for future refinement and performance optimization. By modeling a realistic network simulator's quality assessments with a deep network and, in parallel, optimizing a load generation network, our method efficiently searches the high-dimensional space of load patterns and reliably finds cases in which a target network optimization method under-performs a baseline by a significant margin.

2021-11-30

Global Communications Conference (published)

Active 3D Shape Reconstruction from Vision and Touch

Edward J. Smith

Luis Pineda

Roberto Calandra

Jitendra Malik

Adriana Romero

Michal Drozdzal

Humans build 3D understandings of the world through active object exploration, jointly using their senses of vision and touch. However, in 3D shape reconstruction, most recent progress has relied on static datasets of limited sensory data such as RGB images, depth maps or haptic readings, leaving the active exploration of the shape largely unexplored. In active touch sensing for 3D reconstruction, the goal is to actively select the tactile readings that maximize the improvement in shape reconstruction accuracy. However, the development of deep learning-based active touch models is largely limited by the lack of frameworks for shape exploration. In this paper, we focus on this problem and introduce a system composed of: 1) a haptic simulator leveraging high spatial resolution vision-based tactile sensors for active touching of 3D objects; 2) a mesh-based 3D shape reconstruction model that relies on tactile or visuotactile signals; and 3) a set of data-driven solutions with either tactile or visuotactile priors to guide the shape exploration. Our framework enables the development of the first fully data-driven solutions to active touch on top of learned models for object understanding. Our experiments show the benefits of such solutions in the task of 3D shape understanding, where our models consistently outperform natural baselines. We provide our framework as a tool to foster future research in this direction.

2021-11-08

NeurIPS.cc/2021/Conference (poster)

openreview.net

Latent Attention Augmentation for Robust Autonomous Driving Policies

Ran Cheng

Christopher Agia

Florian Shkurti

Model-free reinforcement learning has become a viable approach for vision-based robot control. However, sample complexity and adaptability to domain shifts remain persistent challenges when operating in high-dimensional observation spaces (images, LiDAR), such as those involved in autonomous driving. In this paper, we propose a flexible framework by which a policy’s observations are augmented with robust attention representations in the latent space to guide the agent’s attention during training. Our method encodes local and global descriptors of the augmented state representations into a compact latent vector, and scene dynamics are approximated by a recurrent network that processes the latent vectors in sequence. We outline two approaches for constructing attention maps: a supervised pipeline leveraging semantic segmentation networks, and an unsupervised pipeline relying only on classical image processing techniques. We conduct our experiments in simulation and test the learned policy against varying seasonal effects and weather conditions. Our design decisions are supported in a series of ablation studies. The results demonstrate that our state augmentation method both improves learning efficiency and encourages robust domain adaptation when compared to common end-to-end frameworks and methods that learn directly from intermediate representations.

2021-09-26

IEEE/RSJ International Conference on Intelligent Robots and Systems (published)

Trajectory-Constrained Deep Latent Visual Attention for Improved Local Planning in Presence of Heterogeneous Terrain

Stefan Wapnick

Travis Manderson

We present a reward-predictive, model-based deep learning method featuring trajectory-constrained visual attention for local planning in visual navigation tasks. Our method learns to place visual attention at locations in latent image space which follow trajectories caused by vehicle control actions to enhance predictive accuracy during planning. The attention model is jointly optimized by the task-specific loss and an additional trajectory-constraint loss, allowing adaptability yet encouraging a regularized structure for improved generalization and reliability. Importantly, visual attention is applied in latent feature map space instead of raw image space to promote efficient planning. We validated our model in visual navigation tasks of planning low-turbulence, collision-free trajectories in off-road settings and hill climbing with locking differentials in the presence of slippery terrain. Experiments involved randomized, procedurally generated simulations and real-world environments. We found that our method improved generalization and learning efficiency when compared to no-attention and self-attention alternatives.

2021-09-26

2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (published)

An Autonomous Probing System for Collecting Measurements at Depth from Small Surface Vehicles

Yuying Huang

Yiming Yao

Johanna Hansen

Jeremy Mallette

Sandeep Manjanna

This paper presents the portable autonomous probing system (APS), a low-cost robotic design for collecting water quality measurements at targeted depths from an autonomous surface vehicle (ASV). This system fills an important but often overlooked niche in marine sampling by enabling mobile sensor observations throughout the near-surface water column without the need for advanced underwater equipment. We present a probe delivery mechanism built with commercially available components and describe the corresponding open-source simulator and winch controller. Finally, we demonstrate the system in a field deployment and discuss design trade-offs and areas for future improvement. Project details are available on our website: https://johannah.github.io/publication/sample-at-depth

2021-09-19

OCEANS 2021: San Diego – Porto (published)

A Deep Reinforcement Learning Approach to Marginalized Importance Sampling with the Successor Representation

Scott Fujimoto

Doina Precup

Marginalized importance sampling (MIS), which measures the density ratio between the state-action occupancy of a target policy and that of a sampling distribution, is a promising approach for off-policy evaluation. However, current state-of-the-art MIS methods rely on complex optimization tricks and succeed mostly on simple toy problems. We bridge the gap between MIS and deep reinforcement learning by observing that the density ratio can be computed from the successor representation of the target policy. The successor representation can be trained through deep reinforcement learning methodology and decouples the reward optimization from the dynamics of the environment, making the resulting algorithm stable and applicable to high-dimensional domains. We evaluate the empirical performance of our approach on a variety of challenging Atari and MuJoCo environments.

2021-06-30

Proceedings of the 38th International Conference on Machine Learning (published)

proceedings.mlr.press
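The key observation in the abstract above is that the MIS density ratio can be read off the successor representation (SR) of the target policy. A minimal tabular sketch of that relationship is given below. It is not the paper's deep algorithm. It assumes a hypothetical 3-state chain with state-only rewards, a known data distribution, and an exact SR obtained by fixed-point iteration. It computes the normalized discounted occupancy d_pi from the SR, forms the weights w(s) = d_pi(s) / d_D(s), and uses them to reweight rewards under the data distribution.

```python
# Hypothetical tabular example (not from the paper): a 3-state Markov chain
# induced by the target policy, with rewards attached to states.
GAMMA = 0.9
P_PI = [  # P_PI[s][s2]: probability of moving s -> s2 under the target policy
    [0.1, 0.9, 0.0],
    [0.0, 0.2, 0.8],
    [0.5, 0.0, 0.5],
]
MU0 = [1.0, 0.0, 0.0]     # initial-state distribution
D_DATA = [0.5, 0.3, 0.2]  # assumed state distribution of the logged data
REWARD = [0.0, 1.0, 2.0]
N = len(P_PI)


def successor_representation(p, gamma, iters=500):
    # Fixed-point iteration Psi <- I + gamma * P Psi, which converges to (I - gamma P)^-1.
    # Psi[s][s2] is the expected discounted number of visits to s2 starting from s.
    psi = [[float(i == j) for j in range(N)] for i in range(N)]
    for _ in range(iters):
        psi = [[float(i == j) + gamma * sum(p[i][k] * psi[k][j] for k in range(N))
                for j in range(N)] for i in range(N)]
    return psi


psi = successor_representation(P_PI, GAMMA)

# Normalised discounted occupancy: d_pi(s2) = (1 - gamma) * sum_s mu0(s) * Psi(s, s2).
d_pi = [(1 - GAMMA) * sum(MU0[s] * psi[s][j] for s in range(N)) for j in range(N)]

# Marginalized importance weights w(s) = d_pi(s) / d_D(s).
weights = [d_pi[s] / D_DATA[s] for s in range(N)]

# Off-policy estimate of (1 - gamma) * E_{s0 ~ mu0}[V_pi(s0)]:
# expectation of w(s) * r(s) under the data distribution.
mis_estimate = sum(D_DATA[s] * weights[s] * REWARD[s] for s in range(N))
print("weights:", weights, "estimate:", mis_estimate)
```

Because the SR is independent of the reward, the same weights can be reused to evaluate any reward vector on this chain. This is the decoupling of reward from dynamics that the abstract refers to.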

Multimodal dynamics modeling for off-road autonomous vehicles

Jean-François Tremblay

Travis Manderson

Aurélio Noca

Dynamics modeling in outdoor and unstructured environments is difficult because different elements in the environment interact with the robot in ways that can be hard to predict. Leveraging multiple sensors to perceive maximal information about the robot's environment is thus crucial when building a model to perform predictions about the robot's dynamics with the goal of doing motion planning. We design a model capable of long-horizon motion predictions, leveraging vision, lidar and proprioception, which is robust to arbitrarily missing modalities at test time. We demonstrate in simulation that our model is able to leverage vision to predict traction changes. We then test our model using a challenging real-world dataset of a robot navigating through a forest, performing predictions on trajectories unseen during training. We try different modality combinations at test time and show that, while our model performs best when all modalities are present, it still performs better than the baseline even when receiving only raw vision input and no proprioception, as well as when receiving only proprioception. Overall, our study demonstrates the importance of leveraging multiple sensors when doing dynamics modeling in outdoor conditions.

2021-06-04

2021 IEEE International Conference on Robotics and Automation (ICRA) (published)
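The abstract above states that the model stays usable when modalities are missing at test time. One common way to obtain that property, shown here purely as a hypothetical sketch and not claimed to be the paper's mechanism, is to randomly drop modalities during training and fuse only the embeddings that are present. The modality names, the toy 2-dimensional embeddings, and the mean-fusion rule are all assumptions for illustration.

```python
import random


def fuse(embeddings):
    # Average only the modalities that are present (None marks a missing sensor),
    # so the fused vector has the same size whatever subset is available.
    present = [e for e in embeddings.values() if e is not None]
    if not present:
        raise ValueError("at least one modality must be available")
    dim = len(present[0])
    return [sum(e[i] for e in present) / len(present) for i in range(dim)]


def modality_dropout(embeddings, p_drop, rng):
    # Training-time augmentation: hide each modality independently with
    # probability p_drop, restoring one at random if all were hidden.
    kept = {m: (None if rng.random() < p_drop else e) for m, e in embeddings.items()}
    if all(e is None for e in kept.values()):
        m = rng.choice(sorted(embeddings))
        kept[m] = embeddings[m]
    return kept


# Hypothetical per-modality embeddings produced by upstream encoders.
sample = {"vision": [1.0, 2.0], "lidar": [3.0, 4.0], "proprioception": [5.0, 6.0]}
rng = random.Random(0)
print(fuse(modality_dropout(sample, p_drop=0.5, rng=rng)))              # training step
print(fuse({"vision": [1.0, 2.0], "lidar": None, "proprioception": None}))  # vision only at test time
```

Mean fusion keeps the downstream dynamics model's input size fixed regardless of which sensors report. The trade-off is that it discards information about which modalities were present, which a learned gating or masking scheme could retain.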