David Meger

Valliappan Chidambaram Adaikkappan

PhD - McGill University

PhD - McGill University

Co-supervisor :

farnoosh.faraji@mila.quebec

chungwes@mila.quebec

Farnoosh Faraji

PhD - McGill University

Co-supervisor :

Gregory Dudek

anas.houssaini@mila.quebec

Google Scholar

Anas Houssaini

Master's Research - McGill University

Co-supervisor :

Hsiu-Chin Lin

arian.sargazi@mila.quebec

Zina Kamel

Master's Research - McGill University

Co-supervisor :

Hsiu-Chin Lin

zina.kamel@mila.quebec

Arian Sargazi

PhD - McGill University

Junming(Clark) Shi

Master's Research - McGill University

junming.shi@mila.quebec

Steven Wang

Master's Research - McGill University

zhizun.wang@mila.quebec

Harley Wiltzer

PhD - McGill University

Co-supervisor :

Marc Gendron-Bellemare

Publications

GeoMetrics: Exploiting Geometric Structure for Graph-Encoded Objects

Edward J. Smith

Adriana Romero

Mesh models are a promising approach for encoding the structure of 3D objects. Current mesh reconstruction systems predict uniformly distributed vertex locations of a predetermined graph through a series of graph convolutions, leading to compromises with respect to performance or resolution. In this paper, we argue that the graph representation of geometric objects allows for additional structure, which should be leveraged for enhanced reconstruction. Thus, we propose a system which properly benefits from the advantages of the geometric structure of graph-encoded objects by introducing (1) a graph convolutional update preserving vertex information; (2) an adaptive splitting heuristic allowing detail to emerge; and (3) a training objective operating both on the local surfaces defined by vertices as well as the global structure defined by the mesh. Our proposed method is evaluated on the task of 3D object reconstruction from images with the ShapeNet dataset, where we demonstrate state-of-the-art performance, both visually and numerically, while having far smaller space requirements by generating adaptive meshes.

2019-05-23

Proceedings of the 36th International Conference on Machine Learning (published)

proceedings.mlr.press

Off-Policy Deep Reinforcement Learning without Exploration

Many practical applications of reinforcement learning constrain agents to learn from a fixed batch of data which has already been gathered, without offering further possibility for data collection. In this paper, we demonstrate that due to errors introduced by extrapolation, standard off-policy deep reinforcement learning algorithms, such as DQN and DDPG, are incapable of learning with data uncorrelated to the distribution under the current policy, making them ineffective for this fixed batch setting. We introduce a novel class of off-policy algorithms, batch-constrained reinforcement learning, which restricts the action space in order to force the agent towards behaving close to on-policy with respect to a subset of the given data. We present the first continuous control deep reinforcement learning algorithm which can learn effectively from arbitrary, fixed batch data, and empirically demonstrate the quality of its behavior in several tasks.

2019-05-23

Proceedings of the 36th International Conference on Machine Learning (published)

proceedings.mlr.press
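
The batch-constrained idea described in the abstract above can be illustrated in a tabular setting. The following is a minimal sketch, not the paper's algorithm: the function name, the dictionary-based Q-table, the discount `gamma`, and the fallback for states with no recorded actions are all assumptions made for illustration. It shows only the core restriction, which is that the bootstrap maximum ranges over actions that appear in the batch for the next state.

```python
from typing import Dict, Hashable, Set, Tuple

State = Hashable
Action = Hashable


def batch_constrained_target(
    reward: float,
    gamma: float,
    next_state: State,
    q: Dict[Tuple[State, Action], float],
    seen_actions: Dict[State, Set[Action]],
) -> float:
    """Q-learning target whose max is restricted to actions seen in the batch.

    Hypothetical helper: unseen (state, action) pairs are never bootstrapped
    from, which avoids relying on extrapolated value estimates.
    """
    allowed = seen_actions.get(next_state, set())
    if not allowed:
        # Assumption for this sketch: with no data for next_state,
        # fall back to the immediate reward only.
        return reward
    best = max(q.get((next_state, a), 0.0) for a in allowed)
    return reward + gamma * best


# Action "b" has a high (possibly extrapolated) value at "s1",
# but only action "a" was observed there in the batch.
q_table = {("s1", "a"): 1.0, ("s1", "b"): 5.0}
batch_actions = {"s1": {"a"}}
target = batch_constrained_target(1.0, 0.9, "s1", q_table, batch_actions)
```

An unconstrained target would bootstrap from action "b"; the constrained version ignores it because the batch contains no evidence about that state-action pair.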

Semantic Mapping for View-Invariant Relocalization.

Jimmy Li

Gregory Dudek

We propose a system for visual simultaneous localization and mapping (SLAM) that combines traditional local appearance-based features with semantically meaningful object landmarks to achieve both accurate local tracking and highly view-invariant object-driven relocalization. Our mapping process uses a sampling-based approach to efficiently infer the 3D pose of object landmarks from 2D bounding box object detections. These 3D landmarks then serve as a view-invariant representation which we leverage to achieve camera relocalization even when the viewing angle changes by more than 125 degrees. This level of view-invariance cannot be attained by local appearance-based features (e.g. SIFT) since the same set of surfaces are not even visible when the viewpoint changes significantly. Our experiments show that even when existing methods fail completely for viewpoint changes of more than 70 degrees, our method continues to achieve a relocalization rate of around 90%, with a mean rotational error of around 8 degrees.

2019-05-19

2019 International Conference on Robotics and Automation (ICRA) (published)

Uncertainty Aware Learning from Demonstrations in Multiple Contexts using Bayesian Neural Networks

Sanjay Thakur

Herke van Hoof

Juan Camilo Gamboa Higuera

Diversity of environments is a key challenge that causes learned robotic controllers to fail due to the discrepancies between the training and evaluation conditions. Training from demonstrations in various conditions can mitigate, but not completely prevent, such failures. Learned controllers such as neural networks typically do not have a notion of uncertainty that allows diagnosing an offset between training and testing conditions and potentially intervening. In this work, we propose to use Bayesian Neural Networks, which have such a notion of uncertainty. We show that uncertainty can be leveraged to consistently detect situations in high-dimensional simulated and real robotic domains in which the performance of the learned controller would be sub-par. Also, we show that such an uncertainty-based solution allows making an informed decision about when to invoke a fallback strategy. One fallback strategy is to request more data. We empirically show that providing data only when requested results in increased data-efficiency.

2018-12-31

ICRA (published)

arxiv.org

Where Off-Policy Deep Reinforcement Learning Fails

This work examines batch reinforcement learning: the task of maximally exploiting a given batch of off-policy data, without further data collection. We demonstrate that due to errors introduced by extrapolation, standard off-policy deep reinforcement learning algorithms, such as DQN and DDPG, are only capable of learning with data correlated to their current policy, making them ineffective for most off-policy applications. We introduce a novel class of off-policy algorithms, batch-constrained reinforcement learning, which restricts the action space to force the agent towards behaving on-policy with respect to a subset of the given data. We extend this notion to deep reinforcement learning, and to the best of our knowledge, present the first continuous control deep reinforcement learning algorithm which can learn effectively from uncorrelated off-policy data.

2018-09-26

(published)

openreview.net

Addressing Function Approximation Error in Actor-Critic Methods

Herke van Hoof

In value-based reinforcement learning methods such as deep Q-learning, function approximation errors are known to lead to overestimated value estimates and suboptimal policies. We show that this problem persists in an actor-critic setting and propose novel mechanisms to minimize its effects on both the actor and the critic. Our algorithm builds on Double Q-learning, by taking the minimum value between a pair of critics to limit overestimation. We draw the connection between target networks and overestimation bias, and suggest delaying policy updates to reduce per-update error and further improve performance. We evaluate our method on the suite of OpenAI gym tasks, outperforming the state of the art in every environment tested.

2018-07-02

Proceedings of the 35th International Conference on Machine Learning (published)

proceedings.mlr.press
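
Two of the mechanisms named in the abstract above, taking the minimum over a pair of critics and delaying policy updates, can be sketched with scalar values. This is a simplified illustration under stated assumptions rather than the published implementation: the function names, the `done` flag, the discount `gamma`, and the `policy_delay` parameter are hypothetical, and the neural networks that would produce the critic estimates are omitted.

```python
def clipped_double_q_target(
    reward: float,
    done: float,
    gamma: float,
    q1_next: float,
    q2_next: float,
) -> float:
    """TD target built from the smaller of two critic estimates.

    q1_next and q2_next stand in for the two critics' values at the next
    state-action pair; using their minimum limits overestimation.
    done is 1.0 at episode termination and 0.0 otherwise.
    """
    min_q = min(q1_next, q2_next)
    return reward + gamma * (1.0 - done) * min_q


def should_update_actor(step: int, policy_delay: int = 2) -> bool:
    """Return True only on every policy_delay-th critic update (assumed schedule)."""
    return step % policy_delay == 0


# The critics disagree (4.0 vs 2.0); the target uses the lower estimate.
target = clipped_double_q_target(
    reward=1.0, done=0.0, gamma=0.5, q1_next=4.0, q2_next=2.0
)
```

In a full training loop both critics would regress toward the same target on every step, while the actor would be updated only when `should_update_actor` returns True.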

Deep Reinforcement Learning that Matters

Philip Bachman

In recent years, significant progress has been made in solving challenging problems across various domains using deep reinforcement learning (RL). Reproducing existing work and accurately judging the improvements offered by novel methods is vital to sustaining this progress. Unfortunately, reproducing results for state-of-the-art deep RL methods is seldom straightforward. In particular, non-determinism in standard benchmark environments, combined with variance intrinsic to the methods, can make reported results tough to interpret. Without significance metrics and tighter standardization of experimental reporting, it is difficult to determine whether improvements over the prior state-of-the-art are meaningful. In this paper, we investigate challenges posed by reproducibility, proper experimental techniques, and reporting procedures. We illustrate the variability in reported metrics and results when comparing against common baselines and suggest guidelines to make future results in deep RL more reproducible. We aim to spur discussion about how to ensure continued progress in the field by minimizing wasted effort stemming from results that are non-reproducible and easily misinterpreted.

2018-04-28

Proceedings of the AAAI Conference on Artificial Intelligence (published)

arxiv.org
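
The abstract above calls for significance metrics when comparing deep RL methods. One common choice, used here purely as an illustration and not necessarily the metric the authors recommend, is Welch's t-statistic computed over final returns from independent random seeds. The function name and the return values below are made up for the example.

```python
import math
import statistics
from typing import Sequence


def welch_t_statistic(returns_a: Sequence[float], returns_b: Sequence[float]) -> float:
    """Welch's t-statistic for two samples with possibly unequal variances.

    Each sequence holds one final return per random seed and needs at
    least two entries so the sample variance is defined.
    """
    mean_a = statistics.mean(returns_a)
    mean_b = statistics.mean(returns_b)
    # statistics.variance is the sample variance (n - 1 in the denominator).
    var_a = statistics.variance(returns_a)
    var_b = statistics.variance(returns_b)
    standard_error = math.sqrt(var_a / len(returns_a) + var_b / len(returns_b))
    return (mean_a - mean_b) / standard_error


# Hypothetical returns from three seeds per method (not real results).
method_a = [2.0, 4.0, 6.0]
method_b = [1.0, 2.0, 3.0]
t_value = welch_t_statistic(method_a, method_b)
```

With only a few seeds per method the statistic is noisy, which is one reason a difference in mean return alone says little about whether an improvement is meaningful.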

OptionGAN: Learning Joint Reward-Policy Options using Generative Adversarial Inverse Reinforcement Learning

Wei-Di Chang

Reinforcement learning has shown promise in learning policies that can solve complex problems. However, manually specifying a good reward function can be difficult, especially for intricate tasks. Inverse reinforcement learning offers a useful paradigm to learn the underlying reward function directly from expert demonstrations. Yet in reality, the corpus of demonstrations may contain trajectories arising from a diverse set of underlying reward functions rather than a single one. Thus, in inverse reinforcement learning, it is useful to consider such a decomposition. The options framework in reinforcement learning is specifically designed to decompose policies in a similar light. We therefore extend the options framework and propose a method to simultaneously recover reward options in addition to policy options. We leverage adversarial methods to learn joint reward-policy options using only observed expert states. We show that this approach works well in both simple and complex continuous control tasks and shows significant performance increases in one-shot transfer learning.

2018-04-28

Proceedings of the AAAI Conference on Artificial Intelligence (published)