Publications

Hierarchical Importance Weighted Autoencoders

Eeshan Dhekane

Alexandre Lacoste

Importance weighted variational inference (Burda et al., 2015) uses multiple i.i.d. samples to have a tighter variational lower bound. We believe a joint proposal has the potential of reducing the number of redundant samples, and introduce a hierarchical structure to induce correlation. The hope is that the proposals would coordinate to make up for the error made by one another to reduce the variance of the importance estimator. Theoretically, we analyze the condition under which convergence of the estimator variance can be connected to convergence of the lower bound. Empirically, we confirm that maximization of the lower bound does implicitly minimize variance. Further analysis shows that this is a result of negative correlation induced by the proposed hierarchical meta sampling scheme, and performance of inference also improves when the number of samples increases.

2019-05-23

International Conference on Machine Learning (published)

doi.org

proceedings.mlr.press
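As a point of reference for the abstract above, the following is a minimal sketch of the standard i.i.d. importance weighted bound of Burda et al. (2015), which is the baseline the paper starts from. It does not implement the hierarchical proposal. The toy Gaussian model, the deliberately over-wide proposal, and all names (`log_normal`, `iw_bound`) are assumptions chosen so that the exact log-likelihood is known and the tightening with the number of samples can be checked.

```python
import numpy as np


def log_normal(x, mean, var):
    """Log-density of a univariate Gaussian N(mean, var)."""
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)


def iw_bound(x, k, n_rep=20000, seed=0):
    """Monte Carlo estimate of E[log (1/k) sum_j w_j] with k i.i.d. samples.

    Assumed toy model: p(z) = N(0, 1), p(x | z) = N(z, 1), so p(x) = N(0, 2).
    Assumed proposal: q(z | x) = N(x / 2, 1), wider than the true posterior
    N(x / 2, 1 / 2), so the bound is not tight for k = 1.
    """
    rng = np.random.default_rng(seed)
    q_mean, q_var = x / 2.0, 1.0
    z = rng.normal(q_mean, np.sqrt(q_var), size=(n_rep, k))
    # Log importance weights: log p(x, z) - log q(z | x).
    log_w = (
        log_normal(z, 0.0, 1.0)
        + log_normal(x, z, 1.0)
        - log_normal(z, q_mean, q_var)
    )
    # Numerically stable log-mean-exp over the k samples of each repetition.
    m = log_w.max(axis=1, keepdims=True)
    log_mean_w = m[:, 0] + np.log(np.exp(log_w - m).mean(axis=1))
    return log_mean_w.mean()


x = 1.0
log_px = log_normal(x, 0.0, 2.0)  # exact log-likelihood of the toy model
bounds = {k: iw_bound(x, k) for k in (1, 8, 64)}
for k, b in bounds.items():
    print(f"k={k:3d}  bound={b:.4f}  gap={log_px - b:.4f}")
```

In this toy setting the gap to the exact log-likelihood shrinks as `k` grows, which is the behaviour the i.i.d. bound is known for; the paper's contribution concerns correlated proposals, which this sketch leaves out.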

Off-Policy Deep Reinforcement Learning without Exploration

Scott Fujimoto

David Meger

Doina Precup

Many practical applications of reinforcement learning constrain agents to learn from a fixed batch of data which has already been gathered, without offering further possibility for data collection. In this paper, we demonstrate that due to errors introduced by extrapolation, standard off-policy deep reinforcement learning algorithms, such as DQN and DDPG, are incapable of learning with data uncorrelated to the distribution under the current policy, making them ineffective for this fixed batch setting. We introduce a novel class of off-policy algorithms, batch-constrained reinforcement learning, which restricts the action space in order to force the agent towards behaving close to on-policy with respect to a subset of the given data. We present the first continuous control deep reinforcement learning algorithm which can learn effectively from arbitrary, fixed batch data, and empirically demonstrate the quality of its behavior in several tasks.

2019-05-23

Proceedings of the 36th International Conference on Machine Learning (published)

doi.org

proceedings.mlr.press

Per-Decision Option Discounting

Anna Harutyunyan

Peter Vrancx

Philippe Hamel

Ann Nowé

Doina Precup

In order to solve complex problems an agent must be able to reason over a sufficiently long horizon. Temporal abstraction, commonly modeled through options, offers the ability to reason at many timescales, but the horizon length is still determined by the discount factor of the underlying Markov Decision Process. We propose a modification to the options framework that naturally scales the agent’s horizon with option length. We show that the proposed option-step discount controls a bias-variance trade-off, with larger discounts (counter-intuitively) leading to less estimation variance.

2019-05-23

Proceedings of the 36th International Conference on Machine Learning (published)

proceedings.mlr.press

State-Reification Networks: Improving Generalization by Modeling the Distribution of Hidden Representations

Denis Kazakov

Michael C. Mozer

Machine learning promises methods that generalize well from finite labeled data. However, the brittleness of existing neural net approaches is revealed by notable failures, such as the existence of adversarial examples that are misclassified despite being nearly identical to a training example, or the inability of recurrent sequence-processing nets to stay on track without teacher forcing. We introduce a method, which we refer to as state reification, that involves modeling the distribution of hidden states over the training data and then projecting hidden states observed during testing toward this distribution. Our intuition is that if the network can remain in a familiar manifold of hidden space, subsequent layers of the net should be well trained to respond appropriately. We show that this state-reification method helps neural nets to generalize better, especially when labeled data are sparse, and also helps overcome the challenge of achieving robust generalization with adversarial training.

2019-05-23

International Conference on Machine Learning (unknown)

doi.org

proceedings.mlr.press

Stroke Lesion Segmentation in FLAIR MRI Datasets Using Customized Markov Random Fields

Nagesh K. Subbanna

Deepthi Rajashekar

Bastian Cheng

Götz Thomalla

Jens Fiehler

Tal Arbel

Nils D. Forkert

Robust and reliable stroke lesion segmentation is a crucial step toward employing lesion volume as an independent endpoint for randomized trials. The aim of this work was to develop and evaluate a novel method to segment sub-acute ischemic stroke lesions from fluid-attenuated inversion recovery (FLAIR) magnetic resonance imaging (MRI) datasets. After preprocessing of the datasets, a Bayesian technique based on Gabor textures extracted from the FLAIR signal intensities is utilized to generate a first estimate of the lesion segmentation. Using this initial segmentation, a customized voxel-level Markov random field model based on intensity as well as Gabor texture features is employed to refine the stroke lesion segmentation. The proposed method was developed and evaluated based on 151 multi-center datasets from three different databases using a leave-one-patient-out validation approach. The comparison of the automatically segmented stroke lesions with manual ground truth segmentation revealed an average Dice coefficient of 0.582, which is in the upper range of previously presented lesion segmentation methods using multi-modal MRI datasets. Furthermore, the results obtained by the proposed technique are superior compared to the results obtained by two methods based on convolutional neural networks and three phase level-sets, respectively, which performed best in the ISLES 2015 challenge using multi-modal imaging datasets. The results of the quantitative evaluation suggest that the proposed method leads to robust lesion segmentation results using FLAIR MRI datasets only as a follow-up sequence.

2019-05-23

Frontiers in Neurology (published)

doi.org
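The abstract above reports its result as a Dice coefficient. For readers unfamiliar with the metric, here is a minimal sketch of how it is commonly computed on binary masks, Dice = 2|A ∩ B| / (|A| + |B|). The function name `dice_coefficient`, the tiny example masks, and the convention of returning 1.0 when both masks are empty are assumptions for illustration and are not taken from the paper.

```python
import numpy as np


def dice_coefficient(pred, truth):
    """Dice overlap between two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    total = pred.sum() + truth.sum()
    if total == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    overlap = np.logical_and(pred, truth).sum()
    return 2.0 * overlap / total


# Hypothetical 2x3 masks: 3 predicted voxels, 3 true voxels, 2 shared.
pred = np.array([[0, 1, 1], [0, 1, 0]])
truth = np.array([[0, 1, 0], [0, 1, 1]])
score = dice_coefficient(pred, truth)
print(f"Dice = {score:.3f}")
```

A score of 1 means the automatic and manual segmentations coincide exactly, and 0 means they share no voxels.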

The Value Function Polytope in Reinforcement Learning

Robert Dadashi

Adrien Ali Taiga

Nicolas Roux

Dale Schuurmans

Bellemare Marc-Emmanuel

We establish geometric and topological properties of the space of value functions in finite state-action Markov decision processes. Our main contribution is the characterization of the nature of its shape: a general polytope (Aigner et al., 2010). To demonstrate this result, we exhibit several properties of the structural relationship between policies and value functions including the line theorem, which shows that the value functions of policies constrained on all but one state describe a line segment. Finally, we use this novel perspective to introduce visualizations to enhance the understanding of the dynamics of reinforcement learning algorithms.

2019-05-23

Proceedings of the 36th International Conference on Machine Learning (published)

proceedings.mlr.press

Understanding the impact of entropy on policy optimization

Zafarali Ahmed

Nicolas Roux

Mohammad Norouzi

Dale Schuurmans

Entropy regularization is commonly used to improve policy optimization in reinforcement learning. It is believed to help with exploration by encouraging the selection of more stochastic policies. In this work, we analyze this claim using new visualizations of the optimization landscape based on randomly perturbing the loss function. We first show that even with access to the exact gradient, policy optimization is difficult due to the geometry of the objective function. Then, we qualitatively show that in some environments, a policy with higher entropy can make the optimization landscape smoother, thereby connecting local optima and enabling the use of larger learning rates. This paper presents new tools for understanding the optimization landscape, shows that policy entropy serves as a regularizer, and highlights the challenge of designing general-purpose policy optimization algorithms.

2019-05-23

Proceedings of the 36th International Conference on Machine Learning (published)

proceedings.mlr.press

The Journey is the Reward: Unsupervised Learning of Influential Trajectories

Jonathan Binas

Sherjil Ozair

Yoshua Bengio

Unsupervised exploration and representation learning become increasingly important when learning in diverse and sparse environments. The information-theoretic principle of empowerment formalizes an unsupervised exploration objective through an agent trying to maximize its influence on the future states of its environment. Previous approaches carry certain limitations in that they either do not employ closed-loop feedback or do not have an internal state. As a consequence, a privileged final state is taken as an influence measure, rather than the full trajectory. We provide a model-free method which takes into account the whole trajectory while still offering the benefits of option-based approaches. We successfully apply our approach to settings with large action spaces, where discovery of meaningful action sequences is particularly difficult.

2019-05-21

ArXiv (preprint)

arxiv.org

A Data-Efficient Framework for Training and Sim-to-Real Transfer of Navigation Policies

Learning effective visuomotor policies for robots purely from data is challenging, but also appealing since a learning-based system should not require manual tuning or calibration. In the case of a robot operating in a real environment the training process can be costly, time-consuming, and even dangerous since failures are common at the start of training. For this reason, it is desirable to be able to leverage simulation and off-policy data to the extent possible to train the robot. In this work, we introduce a robust framework that plans in simulation and transfers well to the real environment. Our model incorporates a gradient-descent based planning module, which, given the initial image and goal image, encodes the images to a lower dimensional latent state and plans a trajectory to reach the goal. The model, consisting of the encoder and planner modules, is trained through a meta-learning strategy in simulation first. We subsequently perform adversarial domain transfer on the encoder by using a bank of unlabelled but random images from the simulation and real environments to enable the encoder to map images from the real and simulated environments to a similarly distributed latent representation. By fine tuning the entire model (encoder + planner) with far fewer real world expert demonstrations, we show successful planning performances in different navigation tasks.

2019-05-19

2019 International Conference on Robotics and Automation (ICRA) (published)

doi.org

arxiv.org

Semantic Mapping for View-Invariant Relocalization

Jimmy Li

David Meger

Gregory Dudek

We propose a system for visual simultaneous localization and mapping (SLAM) that combines traditional local appearance-based features with semantically meaningful object landmarks to achieve both accurate local tracking and highly view-invariant object-driven relocalization. Our mapping process uses a sampling-based approach to efficiently infer the 3D pose of object landmarks from 2D bounding box object detections. These 3D landmarks then serve as a view-invariant representation which we leverage to achieve camera relocalization even when the viewing angle changes by more than 125 degrees. This level of view-invariance cannot be attained by local appearance-based features (e.g. SIFT) since the same set of surfaces are not even visible when the viewpoint changes significantly. Our experiments show that even when existing methods fail completely for viewpoint changes of more than 70 degrees, our method continues to achieve a relocalization rate of around 90%, with a mean rotational error of around 8 degrees.

2019-05-19

2019 International Conference on Robotics and Automation (ICRA) (published)

doi.org

A Highly Adaptive Acoustic Model for Accurate Multi-dialect Speech Recognition

Sanghyun Yoo

Inchul Song

Yoshua Bengio

Despite the success of deep learning in speech recognition, multi-dialect speech recognition remains a difficult problem. Although dialect-specific acoustic models are known to perform well in general, they are not easy to maintain when dialect-specific data is scarce and the number of dialects for each language is large. Therefore, a single unified acoustic model (AM) that generalizes well for many dialects has been in demand. In this paper, we propose a novel acoustic modeling technique for accurate multi-dialect speech recognition with a single AM. Our proposed AM is dynamically adapted based on both dialect information and its internal representation, which results in a highly adaptive AM for handling multiple dialects simultaneously. We also propose a simple but effective training method to deal with unseen dialects. The experimental results on large scale speech datasets show that the proposed AM outperforms all the previous ones, reducing word error rates (WERs) by 8.11% relative compared to a single all-dialects AM and by 7.31% relative compared to dialect-specific AMs.

2019-05-11

ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (published)

doi.org

arxiv.org

Representation Mixing for TTS Synthesis

Recent character and phoneme-based parametric TTS systems using deep learning have shown strong performance in natural speech generation. However, the choice between character or phoneme input can create serious limitations for practical deployment, as direct control of pronunciation is crucial in certain cases. We demonstrate a simple method for combining multiple types of linguistic information in a single encoder, named representation mixing, enabling flexible choice between character, phoneme, or mixed representations during inference. Experiments and user studies on a public audiobook corpus show the efficacy of our approach.

2019-05-11

ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (published)

doi.org

arxiv.org
