Publications

Towards Text Generation with Adversarially Learned Neural Outlines

Sandeep Subramanian

Sai Rajeswar

Adam Trischler

Recent progress in deep generative models has been fueled by two paradigms: autoregressive and adversarial models. We propose a combination of both approaches with the goal of learning generative models of text. Our method first produces a high-level sentence outline and then generates words sequentially, conditioning on both the outline and the previous outputs. We generate outlines with an adversarial model trained to approximate the distribution of sentences in a latent space induced by general-purpose sentence encoders. This provides strong, informative conditioning for the autoregressive stage. Our quantitative evaluations suggest that conditioning information from generated outlines can guide the autoregressive model to produce realistic samples, comparable to those of maximum-likelihood-trained language models, even at high temperatures with multinomial sampling. Qualitative results also demonstrate that this generative procedure yields natural-looking sentences and interpolations.

Trends and Applications in Knowledge Discovery and Data Mining

Lida Rashidi

Benjamin Fung

Can Wang

2018-01-01

Lecture Notes in Computer Science (published)

doi.org

Twin Networks: Matching the Future for Sequence Generation

Dmitriy Serdyuk

Nan Rosemary Ke

Alessandro Sordoni

Adam Trischler

Chris Pal

Yoshua Bengio

We propose a simple technique for encouraging generative RNNs to plan ahead. We train a "backward" recurrent network to generate a given sequence in reverse order, and we encourage states of the forward model to predict cotemporal states of the backward model. The backward network is used only during training and plays no role during sampling or inference. We hypothesize that our approach eases modeling of long-term dependencies by implicitly forcing the forward states to hold information about the longer-term future (as contained in the backward states). We show empirically that our approach achieves a 9% relative improvement on a speech recognition task and a significant improvement on a COCO caption generation task.

2018-01-01

ICLR (Poster) (published)

openreview.net
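The Twin Networks abstract describes an auxiliary loss in which each forward RNN state must predict the cotemporal state of a backward RNN. The following is a minimal sketch of that loss, assuming a hypothetical affine predictor g(h) = W h + b and a squared L2 distance; the exact predictor, distance, time-index alignment, and whether gradients flow into the backward network are assumptions here, not details taken from the abstract. Plain lists stand in for RNN hidden states.

```python
from typing import List, Sequence

Vector = List[float]


def affine(W: Sequence[Sequence[float]], b: Sequence[float], h: Sequence[float]) -> Vector:
    """Hypothetical predictor g(h) = W h + b mapping a forward state into backward-state space."""
    return [sum(w * x for w, x in zip(row, h)) + b_i for row, b_i in zip(W, b)]


def twin_loss(forward_states: Sequence[Vector],
              backward_run: Sequence[Vector],
              W: Sequence[Sequence[float]],
              b: Sequence[float]) -> float:
    """Sum over time steps of ||g(h_f[t]) - h_b[t]||^2.

    backward_run holds the backward RNN's states in the order it produced them
    (while reading the sequence reversed), so it is reversed to align each
    backward state with the forward state at the same time index. Backward
    states are treated as fixed targets (an assumption of this sketch).
    """
    backward_states = list(reversed(backward_run))
    if len(forward_states) != len(backward_states):
        raise ValueError("forward and backward runs must have the same length")
    total = 0.0
    for h_f, h_b in zip(forward_states, backward_states):
        prediction = affine(W, b, h_f)
        total += sum((p - t) ** 2 for p, t in zip(prediction, h_b))
    return total


# Toy usage: 2-dimensional states over 2 time steps.
identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [0.0, 0.0]
forward = [[1.0, 2.0], [0.0, 1.0]]
backward_run = [[0.0, 1.0], [0.0, 0.0]]  # produced at t = 2, then at t = 1
print(twin_loss(forward, backward_run, identity, zeros))  # prints 5.0
```

Because the backward network appears only inside this training-time penalty, it can be discarded at sampling time, which matches the abstract's statement that it plays no role during inference.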

Universal Successor Representations for Transfer Reinforcement Learning

Chen Ma

Junfeng Wen

Yoshua Bengio

The objective of transfer reinforcement learning is to generalize from a set of previous tasks to unseen new tasks. In this work, we focus on the transfer scenario where the dynamics among tasks are the same but their goals differ. Although general value functions (Sutton et al., 2011) have been shown to be useful for knowledge transfer, learning a universal value function can be challenging in practice. To address this, we propose (1) using universal successor representations (USR) to represent the transferable knowledge and (2) a USR approximator (USRA) that can be trained by interacting with the environment. Our experiments show that USR can be effectively applied to new tasks, and that an agent initialized with the trained USRA can achieve the goal considerably faster than one with random initialization.

2018-01-01

ICLR (Workshop) (published)

openreview.net
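The USR abstract rests on the successor-representation idea: a value is factored into successor features, which capture the shared dynamics, and goal-specific weights. This is a tabular sketch of the standard decomposition Q(s, a, g) ≈ ψ(s, a, g) · w(g) and a TD(0) update for ψ. It is not the paper's USRA, which is a learned approximator. The names `q_value` and `td_update_psi` and the feature vector `phi` are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def q_value(psi: Sequence[float], w_goal: Sequence[float]) -> float:
    """Q(s, a, g) ~= psi(s, a, g) . w(g): successor features times goal-specific weights."""
    return dot(psi, w_goal)


def td_update_psi(psi: Sequence[float], phi: Sequence[float], psi_next: Sequence[float],
                  gamma: float, alpha: float, terminal: bool = False) -> Vector:
    """One TD(0) step on successor features: psi <- psi + alpha * (phi + gamma * psi_next - psi)."""
    bootstrap = [0.0] * len(psi_next) if terminal else [gamma * p for p in psi_next]
    target = [f + b for f, b in zip(phi, bootstrap)]
    return [p + alpha * (t - p) for p, t in zip(psi, target)]


# Toy usage: one update, then evaluate the same successor features under two goals.
psi = td_update_psi([0.0, 0.0], phi=[1.0, 0.0], psi_next=[2.0, 2.0], gamma=0.5, alpha=1.0)
print(psi)                        # [2.0, 1.0]
print(q_value(psi, [1.0, 3.0]))   # 5.0 under goal A
print(q_value(psi, [0.0, 1.0]))   # 1.0 under goal B, reusing the same psi
```

This split is one reason the representation suits the scenario the abstract targets, where dynamics are shared and goals differ: only the weights `w_goal` change between tasks, while `psi` is reused. In the universal setting, ψ is additionally conditioned on the goal, which this single-entry sketch does not model.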

Dendritic error backpropagation in deep cortical microcircuits

João Sacramento

Rui Ponte Costa

Yoshua Bengio

Walter Senn

Animal behaviour depends on learning to associate sensory stimuli with the desired motor command. Understanding how the brain orchestrates the necessary synaptic modifications across different brain areas has remained a longstanding puzzle. Here, we introduce a multi-area neuronal network model in which synaptic plasticity continuously adapts the network towards a global desired output. In this model, synaptic learning is driven by a local dendritic prediction error that arises from a failure to predict the top-down input given the bottom-up activities. Such errors occur at apical dendrites of pyramidal neurons, where both long-range excitatory feedback and local inhibitory predictions are integrated. When local inhibition fails to match excitatory feedback, an error occurs that triggers plasticity at bottom-up synapses at basal dendrites of the same pyramidal neurons. We demonstrate the learning capabilities of the model in a number of tasks and show that it approximates the classical error backpropagation algorithm. Finally, complementing this cortical circuit with a disinhibitory mechanism enables attention-like stimulus denoising and generation. Our framework makes several experimental predictions on the function of dendritic integration and cortical microcircuits, is consistent with recent observations of cross-area learning, and suggests a biological implementation of deep learning.

2017-12-30

ArXiv (preprint)

arxiv.org

Tensor Regression Networks with various Low-Rank Tensor Approximations

Xingwei Cao

Guillaume Rabusseau

Joelle Pineau

Tensor regression networks achieve high compression rates for neural networks with only a slight impact on performance. They do so by imposing a low-rank tensor structure on the weight matrices of fully connected layers. In recent years, tensor regression networks have been investigated from the perspective of their compressive power; however, the regularization effect of enforcing low-rank tensor structure has been less thoroughly investigated. We study tensor regression networks using various low-rank tensor approximations, aiming to compare the compressive and regularization power of different low-rank constraints. We evaluate the compressive and regularization performance of the proposed model with both deep and shallow convolutional neural networks. Our experiments suggest that a global average pooling layer outperforms a tensor regression layer when applied to a deep convolutional neural network on the CIFAR-10 dataset. Conversely, shallow convolutional neural networks with a tensor regression layer and dropout achieved lower test error than both global average pooling and a fully connected layer with dropout when trained with a small number of samples.

2017-12-27

ArXiv (preprint)

arxiv.org
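The tensor regression abstract attributes the compression to a low-rank structure on the regression weights. This short sketch counts the weights of a dense regression layer and compares them with two common low-rank forms, CP and Tucker. The input shape, output size, and ranks are hypothetical. The paper compares "various" approximations, which may include forms not shown here, and biases are ignored.

```python
from math import prod
from typing import Sequence


def full_params(input_shape: Sequence[int], n_outputs: int) -> int:
    """Weights of a dense layer mapping the flattened input tensor to n_outputs."""
    return prod(input_shape) * n_outputs


def cp_params(input_shape: Sequence[int], n_outputs: int, rank: int) -> int:
    """Rank-R CP form of the (I1 x ... x IN x O) weight tensor:
    one factor matrix with R columns per mode, i.e. R * (I1 + ... + IN + O)."""
    return rank * (sum(input_shape) + n_outputs)


def tucker_params(input_shape: Sequence[int], n_outputs: int, ranks: Sequence[int]) -> int:
    """Tucker form: a core of size prod(ranks) plus one (dim x rank) factor
    matrix per mode (input modes first, then the output mode)."""
    dims = list(input_shape) + [n_outputs]
    if len(ranks) != len(dims):
        raise ValueError("need one rank per mode, including the output mode")
    return prod(ranks) + sum(d * r for d, r in zip(dims, ranks))


# Hypothetical activation tensor of shape 8 x 8 x 64 feeding a 10-class output.
shape, classes = (8, 8, 64), 10
print(full_params(shape, classes))                  # 40960
print(cp_params(shape, classes, 5))                 # 450
print(tucker_params(shape, classes, (4, 4, 8, 10))) # 1956
```

The rank is the knob that trades compression against capacity, which is also where the regularization effect studied in the paper comes from.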

Deep Learning @15 Petaflops/second: Semi-supervised pattern detection for 15 Terabytes of climate data

W. Collins

M. Wehner

M. Prabhat

Thorsten Kurth

Nadathur Satish

Ioannis Mitliagkas

Jian Zhang

Evan Racah

Md. Mostofa Ali Patwary

Narayanan Sundaram

Pradeep Dubey

Use machine learning to find energy materials.

Phil De Luna

Jennifer N. Wei

Yoshua Bengio

Alán Aspuru-Guzik

E. Sargent

2017-12-01

Nature (published)

doi.org

Design of a Recognition System Automatic Vehicle License Plate through a Convolution Neural Network

P. Rajendra

K. Sudheer

Rahul Boadh

TE Campos

BR Babu

M. Varma

Ian J Goodfellow

Yoshua Bengio

Aaron

The present work is a study of the practical application of deep learning to the development of an automatic vehicle license plate recognition system. These systems, commonly referred to as ALPR (Automatic License Plate Recognition), can recognize the content of vehicle license plates from images captured by a camera. The system proposed in this work is based on an image classifier built with supervised learning techniques using a convolutional neural network. These networks are among the main deep learning architectures and are specifically designed for computer vision problems such as pattern recognition and image classification. This paper also examines the basic image processing and segmentation techniques, such as smoothing filters and contour detection, that the proposed system needs in order to extract the contents of license plates for further analysis and classification. This paper demonstrates the feasibility of an ALPR system based on a convolutional neural network, noting the critical importance of designing a network architecture and training dataset appropriate to the problem to be solved.

2017-11-15

International Journal of Computer Applications (published)

doi.org

Variational Bi-LSTMs

Samira Shabanian

Devansh Arpit

Adam Trischler

Yoshua Bengio

2017-11-15

ArXiv (preprint)

arxiv.org

ACtuAL: Actor-Critic Under Adversarial Learning

Anirudh Goyal

Nan Rosemary Ke

Alex Lamb

Generative Adversarial Networks (GANs) are a powerful framework for deep generative modeling. Posed as a two-player minimax problem, GANs are typically trained end-to-end on real-valued data and can be used to train a generator of high-dimensional and realistic images. However, a major limitation of GANs is that training relies on passing gradients from the discriminator through the generator via back-propagation. This makes it fundamentally difficult to train GANs with discrete data, as generation in this case typically involves a non-differentiable function. These difficulties extend to the reinforcement learning setting when the action space is composed of discrete decisions. We address these issues by reframing the GAN framework so that the generator is no longer trained using gradients through the discriminator, but is instead trained using a learned critic in the actor-critic framework with a Temporal Difference (TD) objective. This is a natural fit for sequence modeling, and we use it to achieve improvements on language modeling tasks over standard teacher-forcing methods.

2017-11-13

ArXiv (preprint)

arxiv.org
