Publications

Modeling the Long Term Future in Model-Based Reinforcement Learning

Nan Rosemary Ke

Amanpreet Singh

Ahmed Touati

Anirudh Goyal

Yoshua Bengio

Devi Parikh

Dhruv Batra

2018-12-31

International Conference on Learning Representations (poster)

openreview.net

Multi-objective training of Generative Adversarial Networks with multiple discriminators

Isabela Albuquerque

Joao Monteiro

Thang Doan

Breandan Considine

Tiago Falk

Ioannis Mitliagkas

Recent literature has demonstrated promising results for training Generative Adversarial Networks by employing a set of discriminators, in c… (see more)ontrast to the traditional game involving one generator against a single adversary. Such methods perform single-objective optimization on some simple consolidation of the losses, e.g. an arithmetic average. In this work, we revisit the multiple-discriminator setting by framing the simultaneous minimization of losses provided by different models as a multi-objective optimization problem. Specifically, we evaluate the performance of multiple gradient descent and the hypervolume maximization algorithm on a number of different datasets. Moreover, we argue that the previously proposed methods and hypervolume maximization can all be seen as variations of multiple gradient descent in which the update direction can be computed efficiently. Our results indicate that hypervolume maximization presents a better compromise between sample quality and computational cost than previous methods.

2018-12-31

ICML (published)

proceedings.mlr.press

Do Neural Dialog Systems Use the Conversation History Effectively? An Empirical Study

Christopher Pal

Neural generative models have been become increasingly popular when building conversational agents. They offer flexibility, can be easily ad… (see more)apted to new domains, and require minimal domain engineering. A common criticism of these systems is that they seldom understand or use the available dialog history effectively. In this paper, we take an empirical approach to understanding how these models use the available dialog history by studying the sensitivity of the models to artificially introduced unnatural changes or perturbations to their context at test time. We experiment with 10 different types of perturbations on 4 multi-turn dialog datasets and find that commonly used neural dialog architectures like recurrent and transformer-based seq2seq models are rarely sensitive to most perturbations such as missing or reordering utterances, shuffling words, etc. Also, by open-sourcing our code, we believe that it will serve as a useful diagnostic tool for evaluating dialog systems in the future.

2018-12-31

Association for Computational Linguistics (published)

doi.org

arxiv.org

Neural Multisensory Scene Inference

Jae Hyun Lim

Pedro O. Pinheiro

Negar Rostamzadeh

Christopher Pal

Sungjin Ahn

For embodied agents to infer representations of the underlying 3D physical world they inhabit, they should efficiently combine multisensory … (see more)cues from numerous trials, e.g., by looking at and touching objects. Despite its importance, multisensory 3D scene representation learning has received less attention compared to the unimodal setting. In this paper, we propose the Generative Multisensory Network (GMN) for learning latent representations of 3D scenes which are partially observable through multiple sensory modalities. We also introduce a novel method, called the Amortized Product-of-Experts, to improve the computational efficiency and the robustness to unseen combinations of modalities at test time. Experimental results demonstrate that the proposed model can efficiently infer robust modality-invariant 3D-scene representations from arbitrary combinations of modalities and perform accurate cross-modal generation. To perform this exploration we have also developed a novel multi-sensory simulation environment for embodied agents.

2018-12-31

NeurIPS (published)

arxiv.org

Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks

Natural language is hierarchically structured: smaller units (e.g., phrases) are nested within larger units (e.g., clauses). When a larger c… (see more)onstituent ends, all of the smaller constituents that are nested within it must also be closed. While the standard LSTM architecture allows different neurons to track information at different time scales, it does not have an explicit bias towards modeling a hierarchy of constituents. This paper proposes to add such an inductive bias by ordering the neurons; a vector of master input and forget gates ensures that when a given neuron is updated, all the neurons that follow it in the ordering are also updated. Our novel recurrent architecture, ordered neurons LSTM (ON-LSTM), achieves good performance on four different tasks: language modeling, unsupervised parsing, targeted syntactic evaluation, and logical inference.

2018-12-31

ICLR.cc/2019/Conference (oral)

openreview.net

Prediction of Disease Progression in Multiple Sclerosis Patients using Deep Learning Analysis of MRI Data

Adrian Tousignant

Paul Lemaitre

Doina Precup

Douglas L. Arnold

Tal Arbel

2018-12-31

International Conference on Medical Imaging with Deep Learning (published)

proceedings.mlr.press

No Press Diplomacy: Modeling Multi-Agent Gameplay

Philip Paquette

Yuchen Lu

Steven Bocco

Max O. Smith

Satya Ortiz-Gagné

Jonathan K. Kummerfeld

Satinder Singh

Joelle Pineau

Aaron Courville

Diplomacy is a seven-player non-stochastic, non-cooperative game, where agents acquire resources through a mix of teamwork and betrayal. Rel… (see more)iance on trust and coordination makes Diplomacy the first non-cooperative multi-agent benchmark for complex sequential social dilemmas in a rich environment. In this work, we focus on training an agent that learns to play the No Press version of Diplomacy where there is no dedicated communication channel between players. We present DipNet, a neural-network-based policy model for No Press Diplomacy. The model was trained on a new dataset of more than 150,000 human games. Our model is trained by supervised learning (SL) from expert trajectories, which is then used to initialize a reinforcement learning (RL) agent trained through self-play. Both the SL and RL agents demonstrate state-of-the-art No Press performance by beating popular rule-based bots.

2018-12-31

Advances in Neural Information Processing Systems 32 (NeurIPS 2019) (published)

doi.org

arxiv.org

Probability Distillation: A Caveat and Alternatives

Alexandre Lacoste

Due to Van den Oord et al. (2018), probability distillation has recently been of interest to deep learning practitioners, where, as a practi… (see more)cal workaround for deploying autoregressive models in real-time applications, a student network is used to obtain quality samples in parallel. We identify a pathological optimization issue with the adopted stochastic minimization of the reverse-KL divergence: the curse of dimensionality results in a skewed gradient distribution that renders training inefﬁcient. This means that KL-based “evaluative” training can be susceptible to poor exploration if the target distribution is highly structured. We then explore alternative principles for distillation, including one with an “instructive” signal, and show that it is possible to achieve qualitatively better results than with KL minimization.

2018-12-31

Conference on Uncertainty in Artificial Intelligence (published)

proceedings.mlr.press

Quaternion Recurrent Neural Networks

Titouan Parcollet

Mirco Ravanaelli

Mohamed Morchid

Georges Linarès

Chiheb Trabelsi

Renato De Mori

Yoshua Bengio

Recurrent neural networks (RNNs) are powerful architectures to model sequential data, due to their capability to learn short and long-term d… (see more)ependencies between the basic elements of a sequence. Nonetheless, popular tasks such as speech or images recognition, involve multi-dimensional input features that are characterized by strong internal dependencies between the dimensions of the input vector. We propose a novel quaternion recurrent neural network (QRNN), alongside with a quaternion long-short term memory neural network (QLSTM), that take into account both the external relations and these internal structural dependencies with the quaternion algebra. Similarly to capsules, quaternions allow the QRNN to code internal dependencies by composing and processing multidimensional features as single entities, while the recurrent operation reveals correlations between the elements composing the sequence. We show that both QRNN and QLSTM achieve better performances than RNN and LSTM in a realistic application of automatic speech recognition. Finally, we show that QRNN and QLSTM reduce by a maximum factor of 3.3x the number of free parameters needed, compared to real-valued RNNs and LSTMs to reach better results, leading to a more compact representation of the relevant information.

2018-12-31

ICLR.cc/2019/Conference (poster)

openreview.net

Real-Time Reinforcement Learning

Simon Ramstedt

Christopher Pal

2018-12-31

Advances in Neural Information Processing Systems 32 (NeurIPS 2019) (published)

arxiv.org

Recall Traces: Backtracking Models for Efficient Reinforcement Learning

Anirudh Goyal

Philemon Brakel

William Fedus

Soumye Singhal

Timothy Lillicrap

Sergey Levine

Hugo Larochelle

Yoshua Bengio

In many environments only a tiny subset of all states yield high reward. In these cases, few of the interactions with the environment provid… (see more)e a relevant learning signal. Hence, we may want to preferentially train on those high-reward states and the probable trajectories leading to them. To this end, we advocate for the use of a backtracking model that predicts the preceding states that terminate at a given high-reward state. We can train a model which, starting from a high value state (or one that is estimated to have high value), predicts and sample for which the (state, action)-tuples may have led to that high value state. These traces of (state, action) pairs, which we refer to as Recall Traces, sampled from this backtracking model starting from a high value state, are informative as they terminate in good states, and hence we can use these traces to improve a policy. We provide a variational interpretation for this idea and a practical algorithm in which the backtracking model samples from an approximate posterior distribution over trajectories which lead to large rewards. Our method improves the sample efficiency of both on- and off-policy RL algorithms across several environments and tasks.

2018-12-31

International Conference on Learning Representations (poster)

doi.org

openreview.net

Recurrent Value Functions

Pierre Thodoroff

Nishanth Anand

Lucas Caccia

Doina Precup

Joelle Pineau

Despite recent successes in Reinforcement Learning, value-based methods often suffer from high variance hindering performance. In this paper… (see more), we illustrate this in a continuous control setting where state of the art methods perform poorly whenever sensor noise is introduced. To overcome this issue, we introduce Recurrent Value Functions (RVFs) as an alternative to estimate the value function of a state. We propose to estimate the value function of the current state using the value function of past states visited along the trajectory. Due to the nature of their formulation, RVFs have a natural way of learning an emphasis function that selectively emphasizes important states. First, we establish RVF's asymptotic convergence properties in tabular settings. We then demonstrate their robustness on a partially observable domain and continuous control tasks. Finally, we provide a qualitative interpretation of the learned emphasis function.

2018-12-31

arXiv (preprint)

doi.org

arxiv.org

Mila on Udemy

AI Policy Fellowship Publications

Mila Ventures Launchpad

Publications

Mila on Udemy

AI Policy Fellowship Publications

Mila Ventures Launchpad

Popular keywords:

Publications