Aaron Courville

Alan Alan

PhD - Université de Montréal

Principal supervisor :

PhD - Université de Montréal

Principal supervisor :

Laurent Charlin

Juan Duque

PhD - Université de Montréal

PhD - Université de Montréal

Arian Hosseini

PhD - Université de Montréal

Uday Kapur

PhD - Université de Montréal

Amr Khalifa

PhD - Université de Montréal

Samuel Lavoie

PhD - Université de Montréal

Zhixuan Lin

PhD - Université de Montréal

PhD - Université de Montréal

Principal supervisor :

PhD - Université de Montréal

PhD - Université de Montréal

Andrei Nicolicioiu

PhD - Université de Montréal

Michael Noukhovitch

PhD - Université de Montréal

Johan Samir Obando Ceron

PhD - Université de Montréal

Co-supervisor :

Collaborating researcher - Université de Montréal

Dereck Piché

PhD - Université de Montréal

Khaled Rouissi

Master's Research - Université de Montréal

Esra'a Saleh

PhD - Université de Montréal

Principal supervisor :

Glen Berseth

Vedant Shah

PhD - Université de Montréal

PhD - Université de Montréal

Yusong Wu

PhD - Université de Montréal

Principal supervisor :

Anna (Cheng-Zhi) Huang

Sujin yun

PhD - Université de Montréal

Xiaofeng Zhang

PhD - Université de Montréal

Publications

On the Learning Dynamics of Deep Neural Networks

Remi Tachet des Combes

Mohammad Pezeshki

Samira Shabanian

Yoshua Bengio

While a lot of progress has been made in recent years, the dynamics of learning in deep nonlinear neural networks remain to this day largely… (see more) misunderstood. In this work, we study the case of binary classification and prove various properties of learning in such networks under strong assumptions such as linear separability of the data. Extending existing results from the linear case, we confirm empirical observations by proving that the classification error also follows a sigmoidal shape in nonlinear architectures. We show that given proper initialization, learning expounds parallel independent modes and that certain regions of parameter space might lead to failed training. We also demonstrate that input norm and features' frequency in the dataset lead to distinct convergence speeds which might shed some light on the generalization capabilities of deep neural networks. We provide a comparison between the dynamics of learning with cross-entropy and hinge losses, which could prove useful to understand recent progress in the training of generative adversarial networks. Finally, we identify a phenomenon that we baptize \textit{gradient starvation} where the most frequent features in a dataset prevent the learning of other less frequent but equally informative features.

2018-09-17

ArXiv (preprint)

Approximate Exploration through State Abstraction

Adrien Ali Taiga

Bellemare Marc-Emmanuel

Although exploration in reinforcement learning is well understood from a theoretical point of view, provably correct methods remain impracti… (see more)cal. In this paper we study the interplay between exploration and approximation, what we call approximate exploration. Our main goal is to further our theoretical understanding of pseudo-count based exploration bonuses (Bellemare et al., 2016), a practical exploration scheme based on density modelling. As a warm-up, we quantify the performance of an exploration algorithm, MBIE-EB (Strehl and Littman, 2008), when explicitly combined with state aggregation. This allows us to confirm that, as might be expected, approximation allows the agent to trade off between learning speed and quality of the learned policy. Next, we show how a given density model can be related to an abstraction and that the corresponding pseudo-count bonus can act as a substitute in MBIE-EB combined with this abstraction, but may lead to either under- or over-exploration. Then, we show that a given density model also defines an implicit abstraction, and find a surprising mismatch between pseudo-counts derived either implicitly or explicitly. Finally we derive a new pseudo-count bonus alleviating this issue.

2018-08-28

ArXiv (preprint)

Feature-wise transformations

Harm Vries

2018-07-08

Distill (published)

Augmented CycleGAN: Learning Many-to-Many Mappings from Unpaired Data

Amjad Almahairi

Sai Rajeswar

Alessandro Sordoni

Philip Bachman

Learning inter-domain mappings from unpaired data can improve performance in structured prediction tasks, such as image segmentation, by red… (see more)ucing the need for paired data. CycleGAN was recently proposed for this problem, but critically assumes the underlying inter-domain mapping is approximately deterministic and one-to-one. This assumption renders the model ineffective for tasks requiring flexible, many-to-many mappings. We propose a new model, called Augmented CycleGAN, which learns many-to-many mappings between domains. We examine Augmented CycleGAN qualitatively and quantitatively on several image datasets.

2018-07-02

Proceedings of the 35th International Conference on Machine Learning (published)

Mutual Information Neural Estimation

Mohamed Ishmael Belghazi

Sai Rajeswar

R Devon Hjelm

We argue that the estimation of mutual information between high dimensional continuous random variables can be achieved by gradient descent … (see more)over neural networks. We present a Mutual Information Neural Estimator (MINE) that is linearly scalable in dimensionality as well as in sample size, trainable through back-prop, and strongly consistent. We present a handful of applications on which MINE can be used to minimize or maximize mutual information. We apply MINE to improve adversarially trained generative models. We also use MINE to implement Information Bottleneck, applying it to supervised classification; our results demonstrate substantial improvement in flexibility and performance in these settings.

2018-07-02

International Conference on Machine Learning (unknown)

Neural Autoregressive Flows

Chin-wei Huang

David Krueger

Alexandre Lacoste

Normalizing flows and autoregressive models have been successfully combined to produce state-of-the-art results in density estimation, via M… (see more)asked Autoregressive Flows (MAF), and to accelerate state-of-the-art WaveNet-based speech synthesis to 20x faster than real-time, via Inverse Autoregressive Flows (IAF). We unify and generalize these approaches, replacing the (conditionally) affine univariate transformations of MAF/IAF with a more general class of invertible univariate transformations expressed as monotonic neural networks. We demonstrate that the proposed neural autoregressive flows (NAF) are universal approximators for continuous probability distributions, and their greater expressivity allows them to better capture multimodal target distributions. Experimentally, NAF yields state-of-the-art performance on a suite of density estimation tasks and outperforms IAF in variational autoencoders trained on binarized MNIST.

2018-07-02

Proceedings of the 35th International Conference on Machine Learning (published)

Straight to the Tree: Constituency Parsing with Neural Syntactic Distance

In this work, we propose a novel constituency parsing scheme. The model predicts a vector of real-valued scalars, named syntactic distances,… (see more) for each split position in the input sentence. The syntactic distances specify the order in which the split points will be selected, recursively partitioning the input, in a top-down fashion. Compared to traditional shift-reduce parsing schemes, our approach is free from the potential problem of compounding errors, while being faster and easier to parallelize. Our model achieves competitive performance amongst single model, discriminative parsers in the PTB dataset and outperforms previous models in the CTB dataset.

2018-06-30

Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (published)

On the Spectral Bias of Deep Neural Networks

Nasim Rahaman

Felix Draxler

Fred Hamprecht

It is well known that over-parametrized deep neural networks (DNNs) are an overly expressive class of functions that can memorize even rando… (see more)m data with

2018-06-21

arXiv.org (preprint)

dblp.uni-trier.de

Manifold Mixup: Better Representations by Interpolating Hidden States

Amir Najafi

David Lopez-Paz

Deep neural networks excel at learning the training data, but often provide incorrect and confident predictions when evaluated on slightly d… (see more)ifferent test examples. This includes distribution shifts, outliers, and adversarial examples. To address these issues, we propose Manifold Mixup, a simple regularizer that encourages neural networks to predict less confidently on interpolations of hidden representations. Manifold Mixup leverages semantic interpolations as additional training signal, obtaining neural networks with smoother decision boundaries at multiple levels of representation. As a result, neural networks trained with Manifold Mixup learn class-representations with fewer directions of variance. We prove theory on why this flattening happens under ideal conditions, validate it on practical situations, and connect it to previous works on information theory and generalization. In spite of incurring no significant computation and being implemented in a few lines of code, Manifold Mixup improves strong baselines in supervised learning, robustness to single-step adversarial attacks, and test log-likelihood.

2018-06-12

arXiv (preprint)

MINE: Mutual Information Neural Estimation

Ishmael Belghazi

Sai Rajeswar

Aristide Baratin

R Devon Hjelm

This paper presents a Mutual Information Neural Estimator (MINE) that is linearly scalable in dimensionality as well as in sample size. MINE… (see more) is back-propable and we prove that it is strongly consistent. We illustrate a handful of applications in which MINE is succesfully applied to enhance the property of generative models in both unsupervised and supervised settings. We apply our framework to estimate the information bottleneck, and apply it in tasks related to supervised classification problems. Our results demonstrate substantial added flexibility and improvement in these settings.

2018-05-03

ICLR.cc/2018/Conference (unknown)

openreview.net

Learning Visual Reasoning Without Strong Priors

We introduce a general-purpose conditioning method for neural networks called FiLM: Feature-wise Linear Modulation. FiLM layers influence ne… (see more)ural network computation via a simple, feature-wise affine transformation based on conditioning information. We show that FiLM layers are highly effective for visual reasoning - answering image-related questions which require a multi-step, high-level process - a task which has proven difficult for standard deep learning methods that do not explicitly model reasoning. Specifically, we show on visual reasoning tasks that FiLM layers 1) halve state-of-the-art error for the CLEVR benchmark, 2) modulate features in a coherent manner, 3) are robust to ablations and architectural modifications, and 4) generalize well to challenging, new data from few examples or even zero-shot.

2018-04-28

AAAI Conference on Artificial Intelligence (published)

Generating Contradictory, Neutral, and Entailing Sentences

Learning distributed sentence representations remains an interesting problem in the field of Natural Language Processing (NLP). We want to l… (see more)earn a model that approximates the conditional latent space over the representations of a logical antecedent of the given statement. In our paper, we propose an approach to generating sentences, conditioned on an input sentence and a logical inference label. We do this by modeling the different possibilities for the output sentence as a distribution over the latent representation, which we train using an adversarial objective. We evaluate the model using two state-of-the-art models for the Recognizing Textual Entailment (RTE) task, and measure the BLEU scores against the actual sentences as a probe for the diversity of sentences produced by our model. The experiment results show that, given our framework, we have clear ways to improve the quality and diversity of generated sentences.

2018-03-06

ArXiv (preprint)