Aaron Courville

Reza Bayat

PhD - UdeM

Co-supervisor:

Pascal Vincent

Anirudh Buvanesh

PhD - UdeM

Principal supervisor:

Laurent Charlin

Abhranil Chandra

Research collaborator - University of Waterloo

Research Master's - Université de Montréal

Juan Duque

PhD - UdeM

PhD - UdeM

PhD - UdeM

Amr Khalifa

PhD - UdeM

Samuel Lavoie

PhD - UdeM

Zhixuan Lin

PhD - UdeM

Ahmed Masry

Research collaborator - N/A

Andjela Mladenovic

PhD - UdeM

Principal supervisor:

PhD - UdeM

PhD - UdeM

Co-supervisor:

PhD - UdeM

Alumni collaborator - UdeM

Principal supervisor:

PhD - UdeM

Johan Samir Obando Ceron

PhD - UdeM

Co-supervisor:

Research collaborator - UdeM

Dereck Piché

Research Master's - UdeM

Khaled Rouissi

Research Master's - UdeM

Esra'a Saleh

PhD - UdeM

Principal supervisor:

Glen Berseth

Vedant Shah

PhD - UdeM

PhD - UdeM

Yusong Wu

PhD - UdeM

Principal supervisor:

Anna (Cheng-Zhi) Huang

Sujin yun

PhD - UdeM

PhD - UdeM

PhD - UdeM

Co-supervisor:

Yoshua Bengio

Hattie Zhou

PhD - UdeM

Principal supervisor:

Hugo Larochelle

Publications

Data-Efficient Reinforcement Learning with Momentum Predictive Representations

Rishab Goel

Philip Bachman

While deep reinforcement learning excels at solving tasks where large amounts of data can be collected through virtually unlimited interaction with the environment, learning from limited interaction remains a key challenge. We posit that an agent can learn more efficiently if we augment reward maximization with self-supervised objectives based on structure in its visual input and sequential interaction with the environment. Our method, Momentum Predictive Representations (MPR), trains an agent to predict its own latent state representations multiple steps into the future. We compute target representations for future states using an encoder which is an exponential moving average of the agent's parameters, and we make predictions using a learned transition model. On its own, this future prediction objective outperforms prior methods for sample-efficient deep RL from pixels. We further improve performance by adding data augmentation to the future prediction loss, which forces the agent's representations to be consistent across multiple views of an observation. Our full self-supervised objective, which combines future prediction and data augmentation, achieves a median human-normalized score of 0.444 on Atari in a setting limited to 100K steps of environment interaction, which is a 66% relative improvement over the previous state-of-the-art. Moreover, even in this limited data regime, MPR exceeds expert human scores on 6 out of 26 games.

2020-07-12

arXiv.org (preprint)

dblp.uni-trier.de
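The MPR abstract above describes a target encoder whose parameters are an exponential moving average of the agent's own encoder parameters. A minimal sketch of that update in plain Python follows, assuming parameters are held as flat lists of floats. The function name `ema_update` and the decay rate `tau` are illustrative assumptions; the abstract does not give a decay value.

```python
def ema_update(target_params, online_params, tau=0.99):
    """Move each target parameter a small step toward the online parameter.

    tau is a hypothetical decay rate (not stated in the abstract):
    tau close to 1 means the target encoder changes slowly.
    """
    if len(target_params) != len(online_params):
        raise ValueError("parameter lists must have the same length")
    return [
        tau * t + (1.0 - tau) * o
        for t, o in zip(target_params, online_params)
    ]


# Example: with tau=0.5 the target moves halfway toward the online values.
target = ema_update([0.0, 0.0], [1.0, 2.0], tau=0.5)
```

In a real agent the same rule would be applied to every weight tensor of the encoder after each optimizer step, with no gradient flowing through the target.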

A Large-Scale, Open-Domain, Mixed-Interface Dialogue-Based ITS for STEM

Iulian V. Serban

Varun Gupta

Ekaterina Kochmar

Dung D. Vu

Robert Belfer

2020-06-10

Artificial Intelligence in Education (published)

Gintare Karolina Dziugaite

Stochastic Neural Network with Kronecker Flow

Chin-wei Huang

Ahmed Touati

Pascal Vincent

Alexandre Lacoste

Recent advances in variational inference enable the modelling of highly structured joint distributions, but are limited in their capacity to scale to the high-dimensional setting of stochastic neural networks. This limitation motivates a need for scalable parameterizations of the noise generation process, in a manner that adequately captures the dependencies among the various parameters. In this work, we address this need and present the Kronecker Flow, a generalization of the Kronecker product to invertible mappings designed for stochastic neural networks. We apply our method to variational Bayesian neural networks on predictive tasks, PAC-Bayes generalization bound estimation, and approximate Thompson sampling in contextual bandits. In all setups, our methods prove to be competitive with existing methods and better than the baselines.

2020-06-03

Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics (published)

proceedings.mlr.press

Detecting semantic anomalies

Faruk Ahmed

We critically appraise the recent interest in out-of-distribution (OOD) detection and question the practical relevance of existing benchmarks. While the currently prevalent trend is to consider different datasets as OOD, we argue that out-distributions of practical interest are ones where the distinction is semantic in nature for a specified context, and that evaluative tasks should reflect this more closely. Assuming a context of object recognition, we recommend a set of benchmarks, motivated by practical applications. We make progress on these benchmarks by exploring a multi-task learning based approach, showing that auxiliary objectives for improved semantic awareness result in improved semantic anomaly detection, with accompanying generalization benefits.

2020-04-03

Proceedings of the AAAI Conference on Artificial Intelligence (published)

Pix2Shape: Towards Unsupervised Learning of 3D Scenes from Images Using a View-Based Representation

Sai Rajeswar

Fahim Mannan

Florian Golemo

Jérôme Parent-lévesque

David Vázquez

Derek Nowrouzezahrai

2020-03-20

International Journal of Computer Vision (published)

Out-of-Distribution Generalization via Risk Extrapolation (REx)

David Scott Krueger

Ethan Caballero

Joern-Henrik Jacobsen

Amy Zhang

Jonathan Binas

Rémi LE PRIOL

Generalizing outside of the training distribution is an open challenge for current machine learning systems. A weak form of out-of-distribution (OoD) generalization is the ability to successfully interpolate between multiple observed distributions. One way to achieve this is through robust optimization, which seeks to minimize the worst-case risk over convex combinations of the training distributions. However, a much stronger form of OoD generalization is the ability of models to extrapolate beyond the distributions observed during training. In pursuit of strong OoD generalization, we introduce the principle of Risk Extrapolation (REx). REx can be viewed as encouraging robustness over affine combinations of training risks, by encouraging strict equality between training risks. We show conceptually how this principle enables extrapolation, and demonstrate the effectiveness and scalability of instantiations of REx on various OoD generalization tasks. Our code can be found at this https URL.

2020-03-02

ArXiv (preprint)
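The REx abstract above speaks of encouraging equality between training risks. One simple way to express that idea is to add a penalty on the variance of the per-environment risks to their average; the sketch below assumes that form and should not be read as the paper's exact objective. The function name `rex_objective` and the penalty weight `beta` are illustrative assumptions.

```python
def rex_objective(risks, beta=1.0):
    """Average training risk plus a variance penalty across environments.

    risks: one scalar risk per training environment.
    beta:  hypothetical penalty weight; larger values push the
           per-environment risks toward equality.
    """
    n = len(risks)
    if n == 0:
        raise ValueError("at least one training risk is required")
    mean_risk = sum(risks) / n
    variance = sum((r - mean_risk) ** 2 for r in risks) / n
    return mean_risk + beta * variance


# Equal risks incur no penalty; unequal risks with the same mean are penalized.
equal = rex_objective([1.0, 1.0, 1.0], beta=10.0)
unequal = rex_objective([0.0, 2.0], beta=1.0)
```

With `beta=0` the objective reduces to ordinary average-risk minimization over the pooled environments.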

Solving ODE with Universal Flows: Approximation Theory for Flow-Based Models

Chin-wei Huang

Laurent Dinh

Normalizing flows are powerful invertible probabilistic models that can be used to translate two probability distributions, in a way that allows us to efficiently track the change of probability density. However, to trade for computational efficiency in sampling and in evaluating the log-density, special parameterization designs have been proposed at the cost of representational expressiveness. In this work, we propose to use ODEs as a framework to establish universal approximation theory for certain families of flow-based models.

2020-02-26

International Conference on Learning Representations (published)

openreview.net

Augmented Normalizing Flows: Bridging the Gap Between Generative Flows and Latent Variable Models

Chin-wei Huang

Laurent Dinh

In this work, we propose a new family of generative flows on an augmented data space, with an aim to improve expressivity without drastically increasing the computational cost of sampling and evaluation of a lower bound on the likelihood. Theoretically, we prove the proposed flow can approximate a Hamiltonian ODE as a universal transport map. Empirically, we demonstrate state-of-the-art performance on standard benchmarks of flow-based generative modeling.

2020-02-17

ArXiv (preprint)

On Bonus Based Exploration Methods In The Arcade Learning Environment

Adrien Ali Taiga

William Fedus

Marlos C. Machado

Marc Gendron-Bellemare

Research on exploration in reinforcement learning, as applied to Atari 2600 game-playing, has emphasized tackling difficult exploration problems such as Montezuma's Revenge (Bellemare et al., 2016). Recently, bonus-based exploration methods, which explore by augmenting the environment reward, have reached above-human average performance on such domains. In this paper we reassess popular bonus-based exploration methods within a common evaluation framework. We combine Rainbow (Hessel et al., 2018) with different exploration bonuses and evaluate its performance on Montezuma's Revenge, Bellemare et al.'s set of hard exploration games with sparse rewards, and the whole Atari 2600 suite. We find that while exploration bonuses lead to a higher score on Montezuma's Revenge, they do not provide meaningful gains over the simpler epsilon-greedy scheme. In fact, we find that methods that perform best on that game often underperform epsilon-greedy on easy exploration Atari 2600 games. We find that our conclusions remain valid even when hyperparameters are tuned for these easy-exploration games. Finally, we find that none of the methods surveyed benefit from additional training samples (1 billion frames, versus Rainbow's 200 million) on Bellemare et al.'s hard exploration games. Our results suggest that recent gains in Montezuma's Revenge may be better attributed to architecture change, rather than better exploration schemes; and that the real pace of progress in exploration research for Atari 2600 games may have been obfuscated by good results on a single domain.

2020-01-01

ICLR.cc/2020/Conference (poster)

openreview.net

AN ENSEMBLE APPROACH FOR DETECTING MACHINE FAILURE FROM SOUND Technical

Faruk Ahmed

Phong Cao Nguyen

We develop an ensemble-based approach for our submission to the anomaly detection challenge at DCASE 2020. The main members of our ensemble are auto-encoders (with reconstruction error as the signal), classifiers (with negative predictive confidence as the signal), mismatch of the time-shifted signal with its Fourier-phase-shifted version, and a Gaussian mixture model on a set of common short-term features extracted from the waveform. The scores are passed through an exponential non-linearity and weighted to provide the final score, where the weighting and scaling hyper-parameters are learned on the development set. Our ensemble improves over the baseline on the development set.
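The abstract above describes combining member scores by passing each through an exponential non-linearity and weighting the results. A minimal sketch of one such combination follows, assuming the form sum of `weight * exp(scale * score)`. The exact functional form, the function name `ensemble_score`, and the parameter names are assumptions for illustration; the abstract does not specify them.

```python
import math


def ensemble_score(scores, weights, scales):
    """Combine per-member anomaly scores into a single score.

    scores:  raw anomaly score from each ensemble member.
    weights: hypothetical per-member weights (tuned on a development set).
    scales:  hypothetical per-member scaling applied inside the exponential.
    """
    if not (len(scores) == len(weights) == len(scales)):
        raise ValueError("scores, weights and scales must have equal length")
    return sum(
        w * math.exp(a * s)
        for s, w, a in zip(scores, weights, scales)
    )


# Example: two members with zero raw scores contribute only their weights.
combined = ensemble_score([0.0, 0.0], [0.5, 0.5], [1.0, 1.0])
```

The exponential amplifies large member scores, so a single confident member can dominate the combined score when its scale is large.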

Graph Density-Aware Losses for Novel Compositions in Scene Graph Generation

Boris Knyazev

Harm de Vries

Cătălina Cangea

Graham W. Taylor

Eugene Belilovsky

2020-01-01

Proceedings of the British Machine Vision Conference 2020 (published)