Aaron Courville

alexandre.ganito@mila.quebec

adrien.taiga@mila.quebec

Alexandre Diz Ganito

Maîtrise recherche - Université de Montréal

Amr Khalifa

Doctorat - Université de Montréal

amr-khalifa.alshaykh@mila.quebec

andrei.nicolicioiu@mila.quebec

Andrei Nicolicioiu

Doctorat - Université de Montréal

ching-lam.choi@mila.quebec

Ankit Vani

Doctorat - Université de Montréal

Baccalauréat - Université de Montréal

Doctorat - Université de Montréal

Superviseur⋅e principal⋅e :

Doctorat - Université de Montréal

Co-superviseur⋅e :

Yoshua Bengio

dinghuai.zhang@mila.quebec

Esra'a Saleh

Doctorat - Université de Montréal

Superviseur⋅e principal⋅e :

Glen Berseth

esraa.saleh@mila.quebec

evgenii.nikishin@mila.quebec

Evgenii Nikishin

Doctorat - Université de Montréal

Superviseur⋅e principal⋅e :

Pierre-Luc Bacon

Faruk Ahmed

Doctorat - Université de Montréal

faruk.ahmed@mila.quebec

Johan Samir Obando Ceron

Doctorat - Université de Montréal

Co-superviseur⋅e :

Pablo Samuel Castro

johan.ceron@mila.quebec

Juan Duque

Doctorat - Université de Montréal

juan.duque@mila.quebec

manoosh.samiei@mila.quebec

Manoosh Samiei

Doctorat - Université de Montréal

Doctorat - Université de Montréal

Co-superviseur⋅e :

mohammed.muqeeth@mila.quebec

schwarzm@mila.quebec

Hattie Zhou

Doctorat - Université de Montréal

Superviseur⋅e principal⋅e :

Hugo Larochelle

mengfei.zhou@mila.quebec

Michael Noukhovitch

Doctorat - Université de Montréal

Collaborateur·rice de recherche

morgane.moss@mila.quebec

Muqeeth Mohammed

Collaborateur·rice de recherche - Université de Montréal

Stagiaire de recherche - Ghent University

pietro.mazzaglia@mila.quebec

Stagiaire de recherche - Université de Montréal

razvan.ciuca@mila.quebec

Rishabh Agarwal

Doctorat - Université de Montréal

Co-superviseur⋅e :

samuel.lavoie@mila.quebec

Samuel Lavoie Marchildon

Doctorat - Université de Montréal

Sarvjeet Ghotra

Maîtrise recherche - Université de Montréal

Superviseur⋅e principal⋅e :

Aishwarya Agrawal

sarvjeet-singh.ghotra@mila.quebec

xiaofeng.zhang@mila.quebec

Arian Hosseini

Doctorat - Université de Montréal

Shawn Tan

Doctorat - Université de Montréal

Doctorat - Université de Montréal

Doctorat - Université de Montréal

Yusong Wu

Doctorat - Université de Montréal

Superviseur⋅e principal⋅e :

Anna (Cheng-Zhi) Huang

wu.yusong@mila.quebec

Zhixuan Lin

Doctorat - Université de Montréal

zhixuan.lin@mila.quebec

Publications

Deep Reinforcement Learning at the Edge of the Statistical Precipice

Rishabh Agarwal

Max Schwarzer

Pablo Samuel Castro

Deep reinforcement learning (RL) algorithms are predominantly evaluated by comparing their relative performance on a large suite of tasks. M… (voir plus)ost published results on deep RL benchmarks compare point estimates of aggregate performance such as mean and median scores across tasks, ignoring the statistical uncertainty implied by the use of a finite number of training runs. Beginning with the Arcade Learning Environment (ALE), the shift towards computationally-demanding benchmarks has led to the practice of evaluating only a small number of runs per task, exacerbating the statistical uncertainty in point estimates. In this paper, we argue that reliable evaluation in the few run deep RL regime cannot ignore the uncertainty in results without running the risk of slowing down progress in the field. We illustrate this point using a case study on the Atari 100k benchmark, where we find substantial discrepancies between conclusions drawn from point estimates alone versus a more thorough statistical analysis. With the aim of increasing the field's confidence in reported results with a handful of runs, we advocate for reporting interval estimates of aggregate performance and propose performance profiles to account for the variability in results, as well as present more robust and efficient aggregate metrics, such as interquartile mean scores, to achieve small uncertainty in results. Using such statistical tools, we scrutinize performance evaluations of existing algorithms on other widely used RL benchmarks including the ALE, Procgen, and the DeepMind Control Suite, again revealing discrepancies in prior comparisons. Our findings call for a change in how we evaluate performance in deep RL, for which we present a more rigorous evaluation methodology, accompanied with an open-source library rliable, to prevent unreliable results from stagnating the field. This work received an outstanding paper award at NeurIPS 2021.

openreview.net

Pretraining Representations for Data-Efficient Reinforcement Learning

Max Schwarzer

Nitarshan Rajkumar

Michael Noukhovitch

Ankesh Anand

Laurent Charlin

(Rex) Devon Hjelm

Philip Bachman

Data efficiency is a key challenge for deep reinforcement learning. We address this problem by using unlabeled data to pretrain an encoder w… (voir plus)hich is then finetuned on a small amount of task-specific data. To encourage learning representations which capture diverse aspects of the underlying MDP, we employ a combination of latent dynamics modelling and unsupervised goal-conditioned RL. When limited to 100k steps of interaction on Atari games (equivalent to two hours of human experience), our approach significantly surpasses prior work combining offline representation pretraining with task-specific finetuning, and compares favourably with other pretraining methods that require orders of magnitude more data. Our approach shows particular promise when combined with larger models as well as more diverse, task-aligned observational data -- approaching human-level performance and data-efficiency on Atari in our best setting.

openreview.net

Explicitly Modeling Syntax in Language Model improves Generalization

Yikang Shen

Shawn Tan

Alessandro Sordoni

Siva Reddy

Syntax is fundamental to our thinking about language. Although neural networks are very successful in many tasks, they do not explicitly mod… (voir plus)el syntactic structure. Failing to capture the structure of inputs could lead to generalization problems and over-parametrization. In the present work, we propose a new syntax-aware language model: Syntactic Ordered Memory (SOM). The model explicitly models the structure with a one-step look-ahead parser and maintains the conditional probability setting of the standard language model. Experiments show that SOM can achieve strong results in language modeling and syntactic generalization tests, while using fewer parameters then other models.

2020-10-21

arXiv.org (prépublication)

dblp.uni-trier.de

A Large-Scale, Open-Domain, Mixed-Interface Dialogue-Based ITS for STEM

Iulian V. Serban

Varun Gupta

Ekaterina Kochmar

Dung D. Vu

Robert Belfer

2020-06-10

Artificial Intelligence in Education (publié)

doi.org

Stochastic Neural Network with Kronecker Flow

Chin-Wei Huang

Ahmed Touati

Gintare Karolina Dziugaite

Alexandre Lacoste

Recent advances in variational inference enable the modelling of highly structured joint distributions, but are limited in their capacity to… (voir plus) scale to the high-dimensional setting of stochastic neural networks. This limitation motivates a need for scalable parameterizations of the noise generation process, in a manner that adequately captures the dependencies among the various parameters. In this work, we address this need and present the Kronecker Flow, a generalization of the Kronecker product to invertible mappings designed for stochastic neural networks. We apply our method to variational Bayesian neural networks on predictive tasks, PAC-Bayes generalization bound estimation, and approximate Thompson sampling in contextual bandits. In all setups, our methods prove to be competitive with existing methods and better than the baselines.

2020-06-03

Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics (publié)

proceedings.mlr.press

Stochastic Neural Network with Kronecker Flow

Chin-Wei Huang

Ahmed Touati

Gintare Karolina Dziugaite

Alexandre Lacoste

2019-06-10

ArXiv (preprint)

Learnable Explicit Density for Continuous Latent Space and Variational Inference

Chin-Wei Huang

Ahmed Touati

Laurent Dinh

Michal Drozdzal

Mohammad Havaei

Laurent Charlin

In this paper, we study two aspects of the variational autoencoder (VAE): the prior distribution over the latent variables and its correspon… (voir plus)ding posterior. First, we decompose the learning of VAEs into layerwise density estimation, and argue that having a flexible prior is beneficial to both sample generation and inference. Second, we analyze the family of inverse autoregressive flows (inverse AF) and show that with further improvement, inverse AF could be used as universal approximation to any complicated posterior. Our analysis results in a unified approach to parameterizing a VAE, without the need to restrict ourselves to use factorial Gaussians in the latent real space.

2017-10-06

ArXiv (prépublication)

Theano: A Python framework for fast computation of mathematical expressions

Rami Al-rfou'

Guillaume Alain

Amjad Almahairi

Christof Angermüller

Dzmitry Bahdanau

Nicolas Ballas

Frédéric Bastien

Justin S. Bayer

A. Belikov

A. Belopolsky

Yoshua Bengio

Arnaud Bergeron

J. Bergstra

Valentin Bisson

Josh Bleecher Snyder

Nicolas Bouchard

Nicolas Boulanger-Lewandowski

Xavier Bouthillier

Alexandre De Brébisson

Olivier Breuleux … (voir 92 de plus)

pierre luc carrier

Kyunghyun Cho

Jan Chorowski

Paul F. Christiano

Tim Cooijmans

Marc-Alexandre Côté

Myriam Côté

Yann Dauphin

Olivier Delalleau

Julien Demouth

Guillaume Desjardins

Sander Dieleman

Laurent Dinh

M'elanie Ducoffe

Vincent Dumoulin

Samira Ebrahimi Kahou

Dumitru Erhan

Ziye Fan

Orhan Firat

Mathieu Germain

Xavier Glorot

Ian J. Goodfellow

Matthew Graham

Caglar Gulcehre

Philippe Hamel

Iban Harlouchet

Jean-philippe Heng

Balázs Hidasi

Sina Honari

Arjun Jain

S'ebastien Jean

Kai Jia

Mikhail V. Korobov

Vivek Kulkarni

Alex Lamb

Pascal Lamblin

Eric P. Larsen

César Laurent

S. Lee

Simon-mark Lefrancois

Simon Lemieux

Nicholas Léonard

Zhouhan Lin

J. Livezey

Cory R. Lorenz

Jeremiah L. Lowin

Qianli M. Ma

Pierre-Antoine Manzagol

Olivier Mastropietro

R. McGibbon

Roland Memisevic

Bart van Merriënboer

Vincent Michalski

Mehdi Mirza

Alberto Orlandi

Chris Pal

Razvan Pascanu

Mohammad Pezeshki

Colin Raffel

Daniel Renshaw

Matthew David Rocklin

Adriana Romero Soriano

Markus Dr. Roth

Peter Sadowski

John Salvatier

Francois Savard

Jan Schlüter

John D. Schulman

Gabriel Schwartz

Iulian V. Serban

Dmitriy Serdyuk

Samira Shabanian

Etienne Simon

Sigurd Spieckermann

S. Subramanyam

Jakub Sygnowski

Jérémie Tanguay

Gijs van Tulder

Joseph P. Turian

Sebastian Urban

Francesco Visin

Harm de Vries

David Warde-Farley

Dustin J. Webb

M. Willson

Kelvin Xu

Lijun Xue

Li Yao

Saizheng Zhang

Ying Zhang

Theano is a Python library that allows to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficie… (voir plus)ntly. Since its introduction, it has been one of the most used CPU and GPU mathematical compilers - especially in the machine learning community - and has shown steady performance improvements. Theano is being actively and continuously developed since 2008, multiple frameworks have been built on top of it and it has been used to produce many state-of-the-art machine learning models. The present article is structured as follows. Section I provides an overview of the Theano software and its community. Section II presents the principal features of Theano and how to use them, and compares them with other similar projects. Section III focuses on recently-introduced functionalities and improvements. Section IV compares the performance of Theano against Torch7 and TensorFlow on several machine learning models. Section V discusses current limitations of Theano and potential ways of improving it.

2016-05-09

ArXiv (preprint)

Theano: A Python framework for fast computation of mathematical expressions

Rami Al-rfou'

Guillaume Alain

Amjad Almahairi

Christof Angermüller

Dzmitry Bahdanau

Nicolas Ballas

Frédéric Bastien

Justin S. Bayer

A. Belikov

A. Belopolsky

Yoshua Bengio

Arnaud Bergeron

J. Bergstra

Valentin Bisson

Josh Bleecher Snyder

Nicolas Bouchard

Nicolas Boulanger-Lewandowski

Xavier Bouthillier

Alexandre De Brébisson

Olivier Breuleux … (voir 92 de plus)

pierre luc carrier

Kyunghyun Cho

Jan Chorowski

Paul F. Christiano

Tim Cooijmans

Marc-Alexandre Côté

Myriam Côté

Yann Dauphin

Olivier Delalleau

Julien Demouth

Guillaume Desjardins

Sander Dieleman

Laurent Dinh

M'elanie Ducoffe

Vincent Dumoulin

Samira Ebrahimi Kahou

Dumitru Erhan

Ziye Fan

Orhan Firat

Mathieu Germain

Xavier Glorot

Ian G Goodfellow

Matthew Graham

Caglar Gulcehre

Philippe Hamel

Iban Harlouchet

Jean-philippe Heng

Balázs Hidasi

Sina Honari

Arjun Jain

S'ebastien Jean

Kai Jia

Mikhail V. Korobov

Vivek Kulkarni

Alex Lamb

Pascal Lamblin

Eric Larsen

César Laurent

S. Lee

Simon-mark Lefrancois

Simon Lemieux

Nicholas Léonard

Zhouhan Lin

J. Livezey

Cory R. Lorenz

Jeremiah L. Lowin

Qianli M. Ma

Pierre-Antoine Manzagol

Olivier Mastropietro

R. McGibbon

Roland Memisevic

Bart van Merriënboer

Vincent Michalski

Mehdi Mirza

Alberto Orlandi

Chris Pal

Razvan Pascanu

Mohammad Pezeshki

Colin Raffel

Daniel Renshaw

Matthew David Rocklin

Adriana Romero Soriano

Markus Dr. Roth

Peter Sadowski

John Salvatier

Francois Savard

Jan Schlüter

John D. Schulman

Gabriel Schwartz

Iulian V. Serban

Dmitriy Serdyuk

Samira Shabanian

Etienne Simon

Sigurd Spieckermann

S. Subramanyam

Jakub Sygnowski

Jérémie Tanguay

Gijs van Tulder

Joseph P. Turian

Sebastian Urban

Francesco Visin

Harm de Vries

David Warde-Farley

Dustin J. Webb

M. Willson

Kelvin Xu

Lijun Xue

Li Yao

Saizheng Zhang

Ying Zhang

2016-05-09

ArXiv (prépublication)

A Hierarchical Latent Variable Encoder-Decoder Model for Generating Dialogues

Iulian V. Serban

Ryan Lowe

Sequential data often possesses hierarchical structures with complex dependencies between sub-sequences, such as found between the utterance… (voir plus)s in a dialogue. To model these dependencies in a generative framework, we propose a neural network-based generative architecture, with stochastic latent variables that span a variable number of time steps. We apply the proposed model to the task of dialogue response generation and compare it with other recent neural-network architectures. We evaluate the model performance through a human evaluation study. The experiments demonstrate that our model improves upon recently proposed models and that the latent variables facilitate both the generation of meaningful, long and diverse responses and maintaining dialogue state.

2016-05-01

ArXiv (prépublication)

doi.org