Aaron Courville

Anirudh Buvanesh

PhD - UdeM

Principal supervisor:

Laurent Charlin

anirudb1102@gmail.com

Razvan Ciuca

Research master's - Université de Montréal

Alexandre Diz Ganito

Research master's - UdeM

Juan Duque

PhD - UdeM

PhD - UdeM

PhD - UdeM

Uday Kapur

Professional master's - UdeM

Amr Khalifa

PhD - UdeM

andrei.nicolicioiu@gmail.com

Samuel Lavoie

PhD - UdeM

Zhixuan Lin

PhD - UdeM

PhD - UdeM

Principal supervisor:

PhD - UdeM

PhD - UdeM

Co-supervisor:

Rishabh Agarwal

Andrei Nicolicioiu

PhD - UdeM

Michell Mercedes Payano Perez


Evgenii Nikishin

Alumni collaborator - UdeM

Principal supervisor:

PhD - UdeM

Johan Samir Obando Ceron

PhD - UdeM

Co-supervisor:

Research intern - UdeM

Dereck Piché

Research master's - UdeM

pichedereck@gmail.com

Esra'a Saleh

PhD - UdeM

Principal supervisor:

Research master's - UdeM

Principal supervisor:

Anna (Cheng-Zhi) Huang

PhD - UdeM

Principal supervisor:

(Rex) Devon Hjelm


Yusong Wu

PhD - UdeM

Principal supervisor:

Anna (Cheng-Zhi) Huang

Xiaofeng Zhang

PhD - UdeM

Dinghuai Zhang

PhD - UdeM

Co-supervisor:

Yoshua Bengio

Hattie Zhou

PhD - UdeM

Principal supervisor:

Hugo Larochelle

Publications

Bayesian Hypernetworks

David Scott Krueger

Chin-Wei Huang

Riashat Islam

Ryan Turner

Alexandre Lacoste

2017-10-13

ArXiv (preprint)

Learnable Explicit Density for Continuous Latent Space and Variational Inference

Chin-Wei Huang

Ahmed Touati

Laurent Dinh

Michal Drozdzal

Mohammad Havaei

Laurent Charlin

In this paper, we study two aspects of the variational autoencoder (VAE): the prior distribution over the latent variables and its corresponding posterior. First, we decompose the learning of VAEs into layerwise density estimation, and argue that having a flexible prior is beneficial to both sample generation and inference. Second, we analyze the family of inverse autoregressive flows (inverse AF) and show that, with further improvement, inverse AF could be used as a universal approximator of any complicated posterior. Our analysis results in a unified approach to parameterizing a VAE, without the need to restrict ourselves to factorial Gaussians in the latent real space.
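To make the inverse autoregressive flow idea concrete, here is a minimal pure-Python sketch of one affine IAF step. It is not the paper's parameterization. It assumes that the shift `mu_i` and log-scale `log(sigma_i)` for dimension i are linear functions of the earlier inputs `eps[0..i-1]`. The weights `W_mu`, `W_s` and biases `b_mu`, `b_s` are hypothetical. Because each `z_i` depends only on `eps_i` and earlier inputs, the Jacobian is triangular, and the log-determinant reduces to a sum of log-scales.

```python
import math


def iaf_step(eps, W_mu, b_mu, W_s, b_s):
    """One affine inverse autoregressive flow (IAF) step.

    Computes z_i = mu_i + sigma_i * eps_i, where mu_i and log(sigma_i) are
    linear functions of eps[0..i-1] only (hypothetical parameterization).
    Only entries W[i][j] with j < i are read, so W_mu and W_s act as strictly
    lower-triangular matrices and the transform stays autoregressive.
    Returns (z, log_det), with log_det = sum_i log(sigma_i).
    """
    d = len(eps)
    z = [0.0] * d
    log_det = 0.0
    for i in range(d):
        # Shift and log-scale for dimension i depend only on earlier inputs.
        mu = b_mu[i] + sum(W_mu[i][j] * eps[j] for j in range(i))
        log_sigma = b_s[i] + sum(W_s[i][j] * eps[j] for j in range(i))
        z[i] = mu + math.exp(log_sigma) * eps[i]
        # dz_i/deps_i = sigma_i; the Jacobian is triangular, so log|det| is a sum.
        log_det += log_sigma
    return z, log_det


def log_q(eps, log_det):
    """Change of variables: log q(z) = log N(eps; 0, I) - log|det dz/deps|."""
    log_base = sum(-0.5 * e * e - 0.5 * math.log(2.0 * math.pi) for e in eps)
    return log_base - log_det


# Toy 2-D example: z[0] is an affine map of eps[0]; z[1] is shifted by 2 * eps[0] and scaled by 2.
z, log_det = iaf_step([0.5, -1.0],
                      [[0.0, 0.0], [2.0, 0.0]], [1.0, 0.0],
                      [[0.0, 0.0], [0.0, 0.0]], [0.0, math.log(2.0)])
print(z, log_det)  # z is approximately [1.5, -1.0]; log_det is log 2
```

Sampling is parallel in this form: all of `eps` is drawn first, and each `z_i` is then a closed-form function of it. A practical flow would stack several such steps, each parameterized by a masked neural network.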

2017-10-06

ArXiv (preprint)

FiLM: Visual Reasoning with a General Conditioning Layer

Ethan Perez

Florian Strub

Harm de Vries

Vincent Dumoulin

We introduce a general-purpose conditioning method for neural networks called FiLM: Feature-wise Linear Modulation. FiLM layers influence neural network computation via a simple, feature-wise affine transformation based on conditioning information. We show that FiLM layers are highly effective for visual reasoning (answering image-related questions which require a multi-step, high-level process), a task which has proven difficult for standard deep learning methods that do not explicitly model reasoning. Specifically, we show on visual reasoning tasks that FiLM layers 1) halve the state-of-the-art error on the CLEVR benchmark, 2) modulate features in a coherent manner, 3) are robust to ablations and architectural modifications, and 4) generalize well to challenging new data from few examples or even zero-shot.
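The feature-wise affine transformation described in the FiLM abstract can be sketched in a few lines of pure Python. This is a minimal illustration, not the authors' implementation. It assumes a hypothetical linear "generator" (`film_generator`, with made-up weights `W_gamma`, `b_gamma`, `W_beta`, `b_beta`) that maps a conditioning vector to one scale `gamma[c]` and one shift `beta[c]` per channel. The same pair is then applied at every spatial position of that channel.

```python
def film_generator(cond, W_gamma, b_gamma, W_beta, b_beta):
    """Predict one (gamma, beta) pair per channel from a conditioning vector.

    A hypothetical linear generator: gamma[c] = W_gamma[c] . cond + b_gamma[c],
    and likewise for beta. In practice this could be any network.
    """
    gamma = [sum(w * x for w, x in zip(row, cond)) + b
             for row, b in zip(W_gamma, b_gamma)]
    beta = [sum(w * x for w, x in zip(row, cond)) + b
            for row, b in zip(W_beta, b_beta)]
    return gamma, beta


def film(features, gamma, beta):
    """Feature-wise linear modulation: out[c][k] = gamma[c] * features[c][k] + beta[c].

    features is a list of channels, each a flat list of spatial activations;
    the same (gamma[c], beta[c]) is applied at every position of channel c.
    """
    return [[g * v + b for v in channel]
            for channel, g, b in zip(features, gamma, beta)]


# Conditioning vector (e.g. an encoded question) for a 2-channel feature map.
gamma, beta = film_generator([1.0, -1.0],
                             [[1.0, 0.0], [0.0, 1.0]], [1.0, 1.0],
                             [[0.5, 0.0], [0.0, 0.0]], [0.0, 1.0])
out = film([[1.0, 2.0], [3.0, 4.0]], gamma, beta)
print(out)  # [[2.5, 4.5], [1.0, 1.0]]
```

In the toy example the conditioning input drives `gamma[1]` to zero, which switches off channel 1 and leaves only its shift. This illustrates how conditioning can select or suppress features.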

2017-09-01

ArXiv (preprint)

End-to-end optimization of goal-driven and visually grounded dialogue systems

Florian Strub

Harm de Vries

Jérémie Mary

Bilal Piot

Olivier Pietquin

End-to-end design of dialogue systems has recently become a popular research topic thanks to powerful tools such as encoder-decoder architectures for sequence-to-sequence learning. Yet, most current approaches cast human-machine dialogue management as a supervised learning problem, aiming at predicting the next utterance of a participant given the full history of the dialogue. This vision is too simplistic to capture the planning problem inherent to dialogue as well as its grounded nature, which makes the context of a dialogue larger than its history alone. This is why only chitchat and question answering tasks have been addressed so far using end-to-end architectures. In this paper, we introduce a Deep Reinforcement Learning method, based on the policy gradient algorithm, to optimize visually grounded task-oriented dialogues. This approach is tested on a dataset of 120k dialogues collected through Mechanical Turk and provides encouraging results at solving both the problem of generating natural dialogues and the task of discovering a specific object in a complex picture.

2017-08-19

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (published)

Adversarial Generation of Natural Language

Sandeep Subramanian

Sai Rajeswar

Francis Dutil

Chris Pal

Generative Adversarial Networks (GANs) have gathered a lot of attention from the computer vision community, yielding impressive results for image generation. Advances in the adversarial generation of natural language from noise, however, are not commensurate with the progress made in generating images, and still lag far behind likelihood-based methods. In this paper, we take a step towards generating natural language with a GAN objective alone. We introduce a simple baseline that addresses the discrete output space problem without relying on gradient estimators and show that it is able to achieve state-of-the-art results on a Chinese poem generation dataset. We present quantitative results on generating sentences from context-free and probabilistic context-free grammars, and qualitative language modeling results. A conditional version is also described that can generate sequences conditioned on sentence characteristics.

2017-08-01

Proceedings of the 2nd Workshop on Representation Learning for NLP (published)

A Benchmark for Endoluminal Scene Segmentation of Colonoscopy Images

David Vazquez

Jorge Bernal

F. Javier Sánchez

Gloria Fernández-Esparrach

Antonio M. López

Adriana Romero Soriano

Michal Drozdzal

Colorectal cancer (CRC) is the third leading cause of cancer death worldwide. Currently, the standard approach to reduce CRC-related mortality is to perform regular screening in search of polyps, and colonoscopy is the screening tool of choice. The main limitations of this screening procedure are the polyp miss rate and the inability to perform visual assessment of polyp malignancy. These drawbacks can be reduced by designing decision support systems (DSS) that help clinicians in the different stages of the procedure by providing endoluminal scene segmentation. Thus, in this paper, we introduce an extended benchmark of colonoscopy image segmentation, with the hope of establishing a new strong benchmark for colonoscopy image analysis research. The proposed dataset consists of 4 relevant classes to inspect the endoluminal scene, targeting different clinical needs. Together with the dataset and taking advantage of advances in the semantic segmentation literature, we provide new baselines by training standard fully convolutional networks (FCNs). We perform a comparative study to show that FCNs significantly outperform, without any further postprocessing, prior results in endoluminal scene segmentation, especially with respect to polyp segmentation and localization.

2017-07-26

Journal of Healthcare Engineering (published)

Self-organized Hierarchical Softmax

Yikang Shen

Shawn Tan

Chris Pal

We propose a new self-organizing hierarchical softmax formulation for neural-network-based language models over large vocabularies. Instead of using a predefined hierarchical structure, our approach is capable of learning word clusters with clear syntactic and semantic meaning during the language model training process. We provide experiments on standard benchmarks for language modeling and sentence compression tasks. We find that this approach is as fast as other efficient softmax approximations, while achieving comparable or even better performance relative to similar full softmax models.
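The speed-up behind a two-level hierarchical softmax comes from the factorization p(w) = p(c(w)) · p(w | c(w)), where c(w) is the cluster containing word w. Below is a minimal pure-Python sketch of that factorization only. It uses a fixed, hypothetical word-to-cluster assignment and toy logits; the self-organizing part of the paper (learning and updating the clusters during training) is not shown.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def two_level_prob(word, cluster_of, members, cluster_logits, word_logits):
    """p(word) = p(cluster) * p(word | cluster).

    cluster_of: dict word -> cluster id (hypothetical fixed assignment)
    members: dict cluster id -> ordered list of words in that cluster
    cluster_logits: list of scores, one per cluster
    word_logits: dict cluster id -> list of scores aligned with members[c]
    Only the cluster softmax and one within-cluster softmax are evaluated,
    so the cost is O(#clusters + cluster size) instead of O(vocabulary size).
    """
    c = cluster_of[word]
    p_cluster = softmax(cluster_logits)[c]
    p_within = softmax(word_logits[c])[members[c].index(word)]
    return p_cluster * p_within


# Toy vocabulary of three words split into two clusters.
members = {0: ["the", "a"], 1: ["cat"]}
cluster_of = {w: c for c, ws in members.items() for w in ws}
cluster_logits = [1.0, -0.5]
word_logits = {0: [2.0, 0.3], 1: [0.7]}
total = sum(two_level_prob(w, cluster_of, members, cluster_logits, word_logits)
            for w in cluster_of)
print(total)  # close to 1.0: the factorized model is still a proper distribution
```

With roughly √V clusters of √V words each, a single prediction touches about 2√V scores rather than V. This is why such factorizations are used for large vocabularies.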

2017-07-26

ArXiv (preprint)

A Dataset and Exploration of Models for Understanding Video Data through Fill-in-the-Blank Question-Answering

Tegan Maharaj

Nicolas Ballas

Anna Rohrbach

Chris Pal

While deep convolutional neural networks frequently approach or exceed human-level performance in benchmark tasks involving static images, extending this success to moving images is not straightforward. Video understanding is of interest for many applications, including content recommendation, prediction, summarization, event/object detection, and understanding human visual perception. However, many domains lack sufficient data to explore and perfect video models. In order to address the need for a simple, quantitative benchmark for developing and understanding video, we present MovieFIB, a fill-in-the-blank question-answering dataset with over 300,000 examples, based on descriptive video annotations for the visually impaired. In addition to presenting statistics and a description of the dataset, we perform a detailed analysis of the predictions of 5 different models, and compare these with human performance. We investigate the relative importance of language, static (2D) visual features, and moving (3D) visual features; the effects of increasing dataset size; the number of frames sampled; and vocabulary size. We show that this task is not solvable by a language model alone, that our model combining 2D and 3D visual information indeed provides the best result, and that all models perform significantly worse than humans. We provide human evaluation for responses given by different models and find that accuracy on the MovieFIB evaluation corresponds well with human judgment. We suggest avenues for improving video models, and hope that the MovieFIB challenge can be useful for measuring and encouraging progress in this very interesting field.

2017-07-21

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (published)

GuessWhat?! Visual Object Discovery through Multi-modal Dialogue

Harm de Vries

Florian Strub

Sarath Chandar

Olivier Pietquin

Hugo Larochelle

We introduce GuessWhat?!, a two-player guessing game as a testbed for research on the interplay of computer vision and dialogue systems. The goal of the game is to locate an unknown object in a rich image scene by asking a sequence of questions. Higher-level image understanding, like spatial reasoning and language grounding, is required to solve the proposed task. Our key contribution is the collection of a large-scale dataset consisting of 150K human-played games with a total of 800K visual question-answer pairs on 66K images. We explain our design decisions in collecting the dataset and introduce the oracle and questioner tasks that are associated with the two players of the game. We prototyped deep learning models to establish initial baselines of the introduced tasks.

2017-07-21

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (published)

A Closer Look at Memorization in Deep Networks

Devansh Arpit

Stanisław Jastrzębski

Nicolas Ballas

Maxinder S. Kanwal

Asja Fischer

We examine the role of memorization in deep learning, drawing connections to capacity, generalization, and adversarial robustness. While deep networks are capable of memorizing noise data, our results suggest that they tend to prioritize learning simple patterns first. In our experiments, we expose qualitative differences in gradient-based optimization of deep neural networks (DNNs) on noise vs. real data. We also demonstrate that, with appropriately tuned explicit regularization (e.g., dropout), we can degrade DNN training performance on noise datasets without compromising generalization on real data. Our analysis suggests that notions of effective capacity which are dataset-independent are unlikely to explain the generalization performance of deep networks when trained with gradient-based methods, because the training data itself plays an important role in determining the degree of memorization.

2017-07-17

Proceedings of the 34th International Conference on Machine Learning (published)


Learning Visual Reasoning Without Strong Priors

Ethan Perez

Harm de Vries

Florian Strub

Vincent Dumoulin

Achieving artificial visual reasoning (the ability to answer image-related questions which require a multi-step, high-level process) is an important step towards artificial general intelligence. This multi-modal task requires learning a question-dependent, structured reasoning process over images from language. Standard deep learning approaches tend to exploit biases in the data rather than learn this underlying structure, while leading methods learn to visually reason successfully but are hand-crafted for reasoning. We show that a general-purpose Conditional Batch Normalization approach achieves state-of-the-art results on the CLEVR Visual Reasoning benchmark with a 2.4% error rate. We outperform the next best end-to-end method (4.5%) and even methods that use extra supervision (3.1%). We probe our model to shed light on how it reasons, showing it has learned a question-dependent, multi-step process. Previous work has operated under the assumption that visual reasoning calls for a specialized architecture, but we show that a general architecture with proper conditioning can learn to visually reason effectively.
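Conditional Batch Normalization keeps ordinary batch-norm statistics but makes the scale and shift depend on the conditioning input (here, the question). The pure-Python sketch below shows one common formulation, gamma = base_gamma + Δgamma(question), under several simplifying assumptions. Spatial dimensions are dropped, so `x[n][c]` is one activation per example and channel. The linear maps `W_dg` and `W_db` that predict the offsets are hypothetical. Only training-mode statistics (computed on the current batch) are shown; running averages for inference are omitted.

```python
import math


def conditional_batch_norm(x, cond, base_gamma, base_beta, W_dg, W_db, eps=1e-5):
    """Batch norm whose per-channel scale/shift are predicted per example.

    x: batch of feature vectors, x[n][c]; cond: conditioning vectors, cond[n].
    Mean and variance are computed per channel over the batch, exactly as in
    ordinary batch norm. The affine parameters are example-specific:
        gamma[n][c] = base_gamma[c] + W_dg[c] . cond[n]
        beta[n][c]  = base_beta[c]  + W_db[c] . cond[n]
    """
    n_batch, n_ch = len(x), len(x[0])
    out = [[0.0] * n_ch for _ in range(n_batch)]
    for c in range(n_ch):
        col = [x[n][c] for n in range(n_batch)]
        mean = sum(col) / n_batch
        var = sum((v - mean) ** 2 for v in col) / n_batch
        for n in range(n_batch):
            x_hat = (x[n][c] - mean) / math.sqrt(var + eps)
            # Question-dependent offsets added to the shared base parameters.
            gamma = base_gamma[c] + sum(w * q for w, q in zip(W_dg[c], cond[n]))
            beta = base_beta[c] + sum(w * q for w, q in zip(W_db[c], cond[n]))
            out[n][c] = gamma * x_hat + beta
    return out


# Two examples, one channel; the second example's conditioning doubles its scale.
out = conditional_batch_norm([[1.0], [3.0]], [[0.0], [1.0]],
                             base_gamma=[1.0], base_beta=[0.0],
                             W_dg=[[1.0]], W_db=[[0.5]], eps=0.0)
print(out)  # [[-1.0], [2.5]]
```

Both examples are normalized to the same statistics (x_hat = -1 and +1). Only their conditioning vectors make the outputs diverge, which is the mechanism that lets the question modulate visual features.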

2017-07-10

ArXiv (preprint)