Aaron Courville

Anirudh Buvanesh

Doctorat - UdeM

Superviseur⋅e principal⋅e :

Laurent Charlin

Razvan Ciuca

Maîtrise recherche - Université de Montréal

Juan Duque

Doctorat - UdeM

Doctorat - UdeM

Doctorat - UdeM

Uday Kapur

Doctorat - UdeM

Amr Khalifa

Doctorat - UdeM

Samuel Lavoie

Doctorat - UdeM

Zhixuan Lin

Doctorat - UdeM

Google Scholar

Ahmed Masry

Collaborateur·rice de recherche - N/A

Google Scholar

Alan Milligan

Doctorat - UdeM

Superviseur⋅e principal⋅e :

Doctorat - UdeM

Superviseur⋅e principal⋅e :

Doctorat - UdeM

Doctorat - UdeM

Co-superviseur⋅e :

Doctorat - UdeM

Doctorat - UdeM

Johan Samir Obando Ceron

Doctorat - UdeM

Co-superviseur⋅e :

Collaborateur·rice de recherche - UdeM

Dereck Piché

Maîtrise recherche - UdeM

Khaled Rouissi

Maîtrise recherche - UdeM

Esra'a Saleh

Doctorat - UdeM

Superviseur⋅e principal⋅e :

Glen Berseth

Google Scholar

Vedant Shah

Doctorat - UdeM

Doctorat - UdeM

Yusong Wu

Doctorat - UdeM

Superviseur⋅e principal⋅e :

Anna (Cheng-Zhi) Huang

Sujin yun

Doctorat - UdeM

Doctorat - UdeM

Doctorat - UdeM

Co-superviseur⋅e :

Yoshua Bengio

Hattie Zhou

Doctorat - UdeM

Superviseur⋅e principal⋅e :

Hugo Larochelle

Publications

A Closer Look at Memorization in Deep Networks

Devansh Arpit

Stanisław Jastrzębski

Maxinder S. Kanwal

We examine the role of memorization in deep learning, drawing connections to capacity, generalization, and adversarial robustness. While dee… (voir plus)p networks are capable of memorizing noise data, our results suggest that they tend to prioritize learning simple patterns first. In our experiments, we expose qualitative differences in gradient-based optimization of deep neural networks (DNNs) on noise vs. real data. We also demonstrate that for appropriately tuned explicit regularization (e.g., dropout) we can degrade DNN training performance on noise datasets without compromising generalization on real data. Our analysis suggests that the notions of effective capacity which are dataset independent are unlikely to explain the generalization performance of deep networks when trained with gradient based methods because training data itself plays an important role in determining the degree of memorization.

2017-07-17

Proceedings of the 34th International Conference on Machine Learning (publié)

proceedings.mlr.press

Learning Visual Reasoning Without Strong Priors

Achieving artificial visual reasoning - the ability to answer image-related questions which require a multi-step, high-level process - is an… (voir plus) important step towards artificial general intelligence. This multi-modal task requires learning a question-dependent, structured reasoning process over images from language. Standard deep learning approaches tend to exploit biases in the data rather than learn this underlying structure, while leading methods learn to visually reason successfully but are hand-crafted for reasoning. We show that a general-purpose, Conditional Batch Normalization approach achieves state-of-the-art results on the CLEVR Visual Reasoning benchmark with a 2.4% error rate. We outperform the next best end-to-end method (4.5%) and even methods that use extra supervision (3.1%). We probe our model to shed light on how it reasons, showing it has learned a question-dependent, multi-step process. Previous work has operated under the assumption that visual reasoning calls for a specialized architecture, but we show that a general architecture with proper conditioning can learn to visually reason effectively.

2017-07-10

ArXiv (prépublication)

Modulating early visual processing by language

Jérémie Mary

Olivier Pietquin

It is commonly assumed that language refers to high-level visual concepts while leaving low-level visual processing unaffected. This view do… (voir plus)minates the current literature in computational models for language-vision tasks, where visual and linguistic input are mostly processed independently before being fused into a single representation. In this paper, we deviate from this classic pipeline and propose to modulate the \emph{entire visual processing} by linguistic input. Specifically, we condition the batch normalization parameters of a pretrained residual network (ResNet) on a language embedding. This approach, which we call MOdulated RESnet (\MRN), significantly improves strong baselines on two visual question answering tasks. Our ablation study shows that modulating from the early stages of the visual processing is beneficial.

2017-07-02

ArXiv (prépublication)

A Closer Look at Memorization in Deep Networks

Devansh Arpit

Stanisław Jastrzębski

Maxinder S. Kanwal

2017-06-16

ArXiv (prépublication)

Multi-modal Variational Encoder-Decoders

Iulian V. Serban

Alexander G. Ororbia II

Joelle Pineau

Aaron Courville

Recent advances in neural variational inference have facilitated efficient training of powerful directed graphical models with continuous la… (voir plus)tent variables, such as variational autoencoders. However, these models usually assume simple, uni-modal priors — such as the multivariate Gaussian distribution — yet many real-world data distributions are highly complex and multi-modal. Examples of complex and multi-modal distributions range from topics in newswire text to conversational dialogue responses. When such latent variable models are applied to these domains, the restriction of the simple, uni-modal prior hinders the overall expressivity of the learned model as it cannot possibly capture more complex aspects of the data distribution. To overcome this critical restriction, we propose a flexible, simple prior distribution which can be learned efficiently and potentially capture an exponential number of modes of a target distribution. We develop the multi-modal variational encoder-decoder framework and investigate the effectiveness of the proposed prior in several natural language processing modeling tasks, including document modeling and dialogue modeling.

2017-04-24

arXiv.org (prépublication)

openreview.net

Improved Training of Wasserstein GANs

Generative Adversarial Networks (GANs) are powerful generative models, but suffer from training instability. The recently proposed Wasserste… (voir plus)in GAN (WGAN) makes progress toward stable training of GANs, but sometimes can still generate only low-quality samples or fail to converge. We find that these problems are often due to the use of weight clipping in WGAN to enforce a Lipschitz constraint on the critic, which can lead to undesired behavior. We propose an alternative to clipping weights: penalize the norm of gradient of the critic with respect to its input. Our proposed method performs better than standard WGAN and enables stable training of a wide variety of GAN architectures with almost no hyperparameter tuning, including 101-layer ResNets and language models over discrete data. We also achieve high quality generations on CIFAR-10 and LSUN bedrooms.

2017-03-31

ArXiv (prépublication)

Char2Wav: End-to-End Speech Synthesis

Jose Sotelo

We present Char2Wav, an end-to-end model for speech synthesis. Char2Wav has two components: a reader and a neural vocoder . The reader is an… (voir plus) encoder-decoder model with attention. The encoder is a bidirectional recurrent neural network that accepts text or phonemes as inputs, while the decoder is a recurrent neural network (RNN) with attention that produces vocoder acoustic features. Neural vocoder refers to a conditional extension of SampleRNN which generates raw waveform samples from intermediate representations. Unlike traditional models for speech synthesis, Char2Wav learns to produce audio directly from text

2017-02-17

International Conference on Learning Representations (publié)

openreview.net

Deep Nets Don't Learn via Memorization

David Scott Krueger

Nicolas Ballas

Stanisław Jastrzębski

Maxinder S. Kanwal

We use empirical methods to argue that deep neural networks (DNNs) do not achieve their performance by memorizing training data in spite of … (voir plus)overlyexpressive model architectures. Instead, they learn a simple available hypothesis that fits the finite data samples. In support of this view, we establish that there are qualitative differences when learning noise vs. natural datasets, showing: (1) more capacity is needed to fit noise, (2) time to convergence is longer for random labels, but shorter for random inputs, and (3) that DNNs trained on real data examples learn simpler functions than when trained with noise data, as measured by the sharpness of the loss function at convergence. Finally, we demonstrate that for appropriately tuned explicit regularization, e.g. dropout, we can degrade DNN training performance on noise datasets without compromising generalization on real data.

2017-02-17

International Conference on Learning Representations (publié)

dblp.uni-trier.de

A Hierarchical Latent Variable Encoder-Decoder Model for Generating Dialogues

Iulian V. Serban

Sequential data often possesses hierarchical structures with complex dependencies between sub-sequences, such as found between the utterance… (voir plus)s in a dialogue. To model these dependencies in a generative framework, we propose a neural network-based generative architecture, with stochastic latent variables that span a variable number of time steps. We apply the proposed model to the task of dialogue response generation and compare it with other recent neural-network architectures. We evaluate the model performance through a human evaluation study. The experiments demonstrate that our model improves upon recently proposed models and that the latent variables facilitate both the generation of meaningful, long and diverse responses and maintaining dialogue state.

2017-02-12

Proceedings of the AAAI Conference on Artificial Intelligence (publié)

doi.org

Multiresolution Recurrent Neural Networks: An Application to Dialogue Response Generation

Iulian V. Serban

Tim Klinger

Gerald Tesauro

Kartik Talamadupula

Bowen Zhou

Yoshua Bengio

Aaron Courville

We introduce a new class of models called multiresolution recurrent neural networks, which explicitly model natural language generation at m… (voir plus)ultiple levels of abstraction. The models extend the sequence-to-sequence framework to generate two parallel stochastic processes: a sequence of high-level coarse tokens, and a sequence of natural language words (e.g. sentences). The coarse sequences follow a latent stochastic process with a factorial representation, which helps the models generalize to new examples. The coarse sequences can also incorporate task-specific knowledge, when available. In our experiments, the coarse sequences are extracted using automatic procedures, which are designed to capture compositional structure and semantics. These procedures enable training the multiresolution recurrent neural networks by maximizing the exact joint log-likelihood over both sequences. We apply the models to dialogue response generation in the technical support domain and compare them with several competing models. The multiresolution recurrent neural networks outperform competing models by a substantial margin, achieving state-of-the-art results according to both a human evaluation study and automatic evaluation metrics. Furthermore, experiments show the proposed models generate more fluent, relevant and goal-oriented responses.

2017-02-12

Proceedings of the AAAI Conference on Artificial Intelligence (publié)

doi.org

Movie Description

Anna Rohrbach

Atousa Torabi

Marcus Rohrbach

Niket Tandon

Chris Pal

Hugo Larochelle

Aaron Courville

Bernt Schiele

2017-01-25

International Journal of Computer Vision (publié)

doi.org