
Aaron Courville

Core Academic Member
Canada CIFAR AI Chair
Full Professor, Université de Montréal, Department of Computer Science and Operations Research
Research Topics
Computer Vision
Deep Learning
Efficient Communication in General-Sum Games
Game Theory
Generative Models
Multi-Agent Systems
Natural Language Processing
Reinforcement Learning
Representation Learning

Biography

Aaron Courville is a professor in the Department of Computer Science and Operations Research (DIRO) at Université de Montréal and the Scientific Director of IVADO. He holds a PhD from the Robotics Institute at Carnegie Mellon University.

An early contributor to deep learning, Courville is a founding member of Mila – Quebec Artificial Intelligence Institute. Together with Ian Goodfellow and Yoshua Bengio, he co-wrote Deep Learning, the field's seminal textbook.

His current research focuses on the development of deep learning models and methods. He is particularly interested in reinforcement learning, multi-agent reinforcement learning, deep generative models and reasoning.

Courville holds a Canada CIFAR AI Chair and a Canada Research Chair in Systematic Generalization. His research has been supported by Microsoft Research, Samsung, Hitachi, Meta, Sony (Research Award) and Google (Focused Research Award).

Current Students

PhD - Université de Montréal (24 students)
Master's Research - Université de Montréal (3 students)
Collaborating researcher - Université de Montréal, University of Waterloo, N/A (3 researchers)
Collaborating Alumni - Université de Montréal (1)

Publications

FiLM: Visual Reasoning with a General Conditioning Layer
We introduce a general-purpose conditioning method for neural networks called FiLM: Feature-wise Linear Modulation. FiLM layers influence neural network computation via a simple, feature-wise affine transformation based on conditioning information. We show that FiLM layers are highly effective for visual reasoning - answering image-related questions which require a multi-step, high-level process - a task which has proven difficult for standard deep learning methods that do not explicitly model reasoning. Specifically, we show on visual reasoning tasks that FiLM layers 1) halve state-of-the-art error for the CLEVR benchmark, 2) modulate features in a coherent manner, 3) are robust to ablations and architectural modifications, and 4) generalize well to challenging, new data from few examples or even zero-shot.
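The conditioning mechanism itself is compact: each feature map is scaled and shifted by parameters predicted from the conditioning input, FiLM(x) = γ(c) · x + β(c). Below is a minimal PyTorch sketch of such a layer; module names and dimensions are illustrative, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class FiLM(nn.Module):
    """Feature-wise Linear Modulation: scale and shift each channel of
    `features` using parameters predicted from `conditioning`."""

    def __init__(self, num_features: int, cond_dim: int):
        super().__init__()
        # A single linear layer predicts both gamma and beta.
        self.to_gamma_beta = nn.Linear(cond_dim, 2 * num_features)

    def forward(self, features: torch.Tensor, conditioning: torch.Tensor) -> torch.Tensor:
        # features: (batch, num_features, H, W); conditioning: (batch, cond_dim)
        gamma, beta = self.to_gamma_beta(conditioning).chunk(2, dim=-1)
        gamma = gamma.unsqueeze(-1).unsqueeze(-1)  # broadcast over spatial dims
        beta = beta.unsqueeze(-1).unsqueeze(-1)
        return gamma * features + beta

# Example: modulate a convolutional feature map with a question embedding.
film = FiLM(num_features=64, cond_dim=128)
feature_map = torch.randn(8, 64, 14, 14)   # e.g. the output of a conv block
question_emb = torch.randn(8, 128)         # e.g. a GRU-encoded question
out = film(feature_map, question_emb)      # same shape as feature_map
```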
Neural Autoregressive Flows
Normalizing flows and autoregressive models have been successfully combined to produce state-of-the-art results in density estimation, via Masked Autoregressive Flows (MAF), and to accelerate state-of-the-art WaveNet-based speech synthesis to 20x faster than real-time, via Inverse Autoregressive Flows (IAF). We unify and generalize these approaches, replacing the (conditionally) affine univariate transformations of MAF/IAF with a more general class of invertible univariate transformations expressed as monotonic neural networks. We demonstrate that the proposed neural autoregressive flows (NAF) are universal approximators for continuous probability distributions, and their greater expressivity allows them to better capture multimodal target distributions. Experimentally, NAF yields state-of-the-art performance on a suite of density estimation tasks and outperforms IAF in variational autoencoders trained on binarized MNIST.
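The central building block is a strictly monotonic (hence invertible) univariate transform parameterized by a small neural network. The sketch below, in the spirit of the paper's sigmoidal flows, shows one such transform; the autoregressive conditioner that would produce the per-dimension parameters from preceding dimensions is omitted, and all names are illustrative rather than taken from the released code.

```python
import torch
import torch.nn.functional as F

def monotonic_sigmoidal_transform(x, a_raw, b, w_raw, eps=1e-6):
    """Strictly increasing univariate transform built from sigmoids:
        y = logit( sum_i w_i * sigmoid(a_i * x + b_i) )
    Monotonicity holds because a_i > 0 and the w_i are positive and sum to 1."""
    a = F.softplus(a_raw)                  # positive slopes
    w = torch.softmax(w_raw, dim=-1)       # positive convex weights
    s = torch.sigmoid(a * x.unsqueeze(-1) + b)         # (..., n_components)
    mixture = (w * s).sum(dim=-1).clamp(eps, 1 - eps)  # keep inside (0, 1)
    return torch.log(mixture) - torch.log1p(-mixture)  # inverse sigmoid (logit)

# In a full autoregressive flow, a_raw, b and w_raw for dimension d would be
# produced by a conditioner network from x_{<d}; here they are free tensors.
x = torch.randn(16)
a_raw, b, w_raw = (torch.randn(16, 8) for _ in range(3))
y = monotonic_sigmoidal_transform(x, a_raw, b, w_raw)
```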
Generating Contradictory, Neutral, and Entailing Sentences
Learning distributed sentence representations remains an interesting problem in the field of Natural Language Processing (NLP). We want to learn a model that approximates the conditional latent space over the representations of a logical antecedent of the given statement. In our paper, we propose an approach to generating sentences, conditioned on an input sentence and a logical inference label. We do this by modeling the different possibilities for the output sentence as a distribution over the latent representation, which we train using an adversarial objective. We evaluate the model using two state-of-the-art models for the Recognizing Textual Entailment (RTE) task, and measure the BLEU scores against the actual sentences as a probe for the diversity of sentences produced by our model. The experimental results show that, given our framework, we have clear ways to improve the quality and diversity of generated sentences.
Learning Generative Models with Locally Disentangled Latent Factors
One of the most successful techniques in generative models has been decomposing a complicated generation task into a series of simpler generation tasks. For example, generating an image at a low resolution and then learning to refine that into a high resolution image often improves results substantially. Here we explore a novel strategy for decomposing generation for complicated objects in which we first generate latent variables which describe a subset of the observed variables, and then map from these latent variables to the observed space. We show that this allows us to achieve decoupled training of complicated generative models and present both theoretical and experimental results supporting the benefit of such an approach.
Inferring Identity Factors for Grouped Examples
We propose a method for modelling groups of face images from the same identity. The model is trained to infer a distribution over the latent space for identity given a small set of “training data”. One can then sample images using that latent representation to produce images of the same identity. We demonstrate that the model extracts disentangled identity factors and image-specific vectors. We also perform generative classification over identities to assess its feasibility for few-shot face recognition.
Hierarchical Adversarially Learned Inference
We propose a novel hierarchical generative model with a simple Markovian structure and a corresponding inference model. Both the generative and inference models are trained using the adversarial learning paradigm. We demonstrate that the hierarchical structure supports the learning of progressively more abstract representations as well as providing semantically meaningful reconstructions with different levels of fidelity. Furthermore, we show that minimizing the Jensen-Shannon divergence between the generative and inference network is enough to minimize the reconstruction error. The resulting semantically meaningful hierarchical latent structure discovery is exemplified on the CelebA dataset. There, we show that the features learned by our model in an unsupervised way outperform the best handcrafted features. Furthermore, the extracted features remain competitive when compared to several recent deep supervised approaches on an attribute prediction task on CelebA. Finally, we leverage the model's inference network to achieve state-of-the-art performance on a semi-supervised variant of the MNIST digit classification task.
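The adversarial setup follows the ALI recipe extended to a hierarchy: a discriminator sees joint tuples (x, z1, z2) coming either from encoding real data or from decoding prior samples, and the encoder and decoder are trained so the two joint distributions become indistinguishable. The sketch below is a deliberate simplification (deterministic linear encoders and decoders, illustrative dimensions, no training loop) meant only to show how the two joints are assembled, not the paper's architecture.

```python
import torch
import torch.nn as nn

# Two-level hierarchy: z2 -> z1 -> x. All modules are placeholders.
enc1 = nn.Linear(784, 64)   # inference path, q(z1 | x)
enc2 = nn.Linear(64, 16)    # inference path, q(z2 | z1)
dec1 = nn.Linear(16, 64)    # generative path, p(z1 | z2)
dec0 = nn.Linear(64, 784)   # generative path, p(x | z1)
disc = nn.Sequential(nn.Linear(784 + 64 + 16, 256), nn.ReLU(), nn.Linear(256, 1))

x = torch.rand(32, 784)     # real data batch
# Inference path: encode real data up the hierarchy.
z1_q = enc1(x)
z2_q = enc2(z1_q)
# Generative path: sample the prior and decode down the hierarchy.
z2_p = torch.randn(32, 16)
z1_p = dec1(z2_p)
x_p = torch.sigmoid(dec0(z1_p))
# The discriminator scores full joint tuples from both paths; the encoder and
# decoder are optimized to make the two joints indistinguishable.
d_inference = disc(torch.cat([x, z1_q, z2_q], dim=1))
d_generative = disc(torch.cat([x_p, z1_p, z2_p], dim=1))
```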
HoME: a Household Multimodal Environment
We introduce HoME: a Household Multimodal Environment for artificial agents to learn from vision, audio, semantics, physics, and interaction with objects and other agents, all within a realistic context. HoME integrates over 45,000 diverse 3D house layouts based on the SUNCG dataset, a scale which may facilitate learning, generalization, and transfer. HoME is an open-source, OpenAI Gym-compatible platform extensible to tasks in reinforcement learning, language grounding, sound-based navigation, robotics, multi-agent learning, and more. We hope HoME better enables artificial agents to learn as humans do: in an interactive, multimodal, and richly contextualized setting.
Improving Explorability in Variational Inference with Annealed Variational Objectives
Despite the advances in the representational capacity of approximate distributions for variational inference, the optimization process can still limit the density that is ultimately learned. We demonstrate the drawbacks of biasing the true posterior to be unimodal, and introduce Annealed Variational Objectives (AVO) into the training of hierarchical variational methods. Inspired by Annealed Importance Sampling, the proposed method facilitates learning by incorporating energy tempering into the optimization objective. In our experiments, we demonstrate our method's robustness to deterministic warm up, and the benefits of encouraging exploration in the latent space.
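Concretely, each intermediate objective targets a geometric interpolation between a tractable initial distribution f_0 and the unnormalized true posterior f_T(z) = p(x, z), with an inverse temperature beta_t that moves from 0 to 1 across the hierarchy. The snippet below shows that annealed target on a toy one-dimensional model; the distributions and values are illustrative assumptions, not an experiment from the paper.

```python
import torch

def annealed_log_target(log_f0, log_fT, beta):
    """Unnormalized log density of the annealed intermediate target:
        log f_t(z) = (1 - beta) * log f_0(z) + beta * log f_T(z).
    beta = 0 recovers the initial distribution, beta = 1 the true posterior."""
    return (1.0 - beta) * log_f0 + beta * log_fT

# Toy model: prior N(0, 1), likelihood N(x; z, 0.5), observation x = 1.5,
# and a broad initial distribution f_0 = N(0, 2).
z = torch.linspace(-4.0, 4.0, steps=9)
log_f0 = torch.distributions.Normal(0.0, 2.0).log_prob(z)
log_prior = torch.distributions.Normal(0.0, 1.0).log_prob(z)
log_lik = torch.distributions.Normal(z, 0.5).log_prob(torch.tensor(1.5))
log_fT = log_prior + log_lik
for beta in (0.0, 0.5, 1.0):
    print(beta, annealed_log_target(log_f0, log_fT, beta))
```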
Investigating the viability of Generative Models for Novelty Detection
Neural Language Modeling by Jointly Learning Syntax and Lexicon
We propose a neural language model capable of unsupervised syntactic structure induction. The model leverages the structure information to form better semantic representations and better language modeling. Standard recurrent neural networks are limited by their structure and fail to efficiently use syntactic information. On the other hand, tree-structured recursive networks usually require additional structural supervision at the cost of human expert annotation. In this paper, we propose a novel neural language model, called the Parsing-Reading-Predict Networks (PRPN), that can simultaneously induce the syntactic structure from unannotated sentences and leverage the inferred structure to learn a better language model. In our model, the gradient can be directly back-propagated from the language model loss into the neural parsing network. Experiments show that the proposed model can discover the underlying syntactic structure and achieve state-of-the-art performance on word/character-level language model tasks.
Towards Text Generation with Adversarially Learned Neural Outlines
Recent progress in deep generative models has been fueled by two paradigms -- autoregressive and adversarial models. We propose a combination of both approaches with the goal of learning generative models of text. Our method first produces a high-level sentence outline and then generates words sequentially, conditioning on both the outline and the previous outputs. We generate outlines with an adversarial model trained to approximate the distribution of sentences in a latent space induced by general-purpose sentence encoders. This provides strong, informative conditioning for the autoregressive stage. Our quantitative evaluations suggest that conditioning information from generated outlines is able to guide the autoregressive model to produce realistic samples, comparable to maximum-likelihood trained language models, even at high temperatures with multinomial sampling. Qualitative results also demonstrate that this generative procedure yields natural-looking sentences and interpolations.
Bayesian Hypernetworks
We propose Bayesian hypernetworks: a framework for approximate Bayesian inference in neural networks. A Bayesian hypernetwork, h, is a neural network which learns to transform a simple noise distribution, p(e) = N(0,I), to a distribution q(t) := q(h(e)) over the parameters t of another neural network (the "primary network"). We train q with variational inference, using an invertible h to enable efficient estimation of the variational lower bound on the posterior p(t | D) via sampling. In contrast to most methods for Bayesian deep learning, Bayesian hypernets can represent a complex multimodal approximate posterior with correlations between parameters, while enabling cheap iid sampling of q(t). In practice, Bayesian hypernets provide a better defense against adversarial examples than dropout, and also exhibit competitive performance on a suite of tasks which evaluate model uncertainty, including regularization, active learning, and anomaly detection.
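The estimator hinges on the change of variables through the invertible hypernetwork: log q(t) = log p(e) - log|det dh/de|, which makes every term of the variational lower bound computable from samples. The sketch below uses the simplest possible invertible h, a diagonal affine map, which reduces to a factorized Gaussian posterior; the paper's point is to use richer invertible networks, but the objective is assembled the same way. All names, shapes, and the toy primary network are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AffineHypernet(nn.Module):
    """Minimal invertible hypernetwork: t = mu + softplus(rho) * e, e ~ N(0, I).
    Its log-determinant, sum(log softplus(rho)), gives the density correction
    needed for log q(t)."""

    def __init__(self, num_params: int):
        super().__init__()
        self.mu = nn.Parameter(torch.zeros(num_params))
        self.rho = nn.Parameter(torch.full((num_params,), -3.0))

    def sample(self):
        e = torch.randn_like(self.mu)
        scale = F.softplus(self.rho)
        t = self.mu + scale * e
        # log q(t) = log N(e; 0, I) - log|det dh/de|
        log_q = torch.distributions.Normal(0.0, 1.0).log_prob(e).sum() - torch.log(scale).sum()
        return t, log_q

def primary_forward(x, t):
    """A tiny 1-hidden-layer regressor whose 31 parameters all come from t."""
    w1, b1 = t[:10].view(10, 1), t[10:20]
    w2, b2 = t[20:30].view(1, 10), t[30:31]
    h = torch.tanh(x @ w1.t() + b1)
    return h @ w2.t() + b2

hyper = AffineHypernet(num_params=31)
x, y = torch.randn(64, 1), torch.randn(64, 1)
t, log_q = hyper.sample()
log_prior = torch.distributions.Normal(0.0, 1.0).log_prob(t).sum()
log_lik = torch.distributions.Normal(primary_forward(x, t), 0.1).log_prob(y).sum()
neg_elbo = -(log_lik + log_prior - log_q)   # minimize w.r.t. hyper's parameters
```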